R Integration
From OpenI Wiki
Requirements
Feedback requirements rewritten, 2006-02-16
We have created some standards and limitations for R scripts. These will allow proper integration between OpenI and R. Large R scripts are very difficult to debug, particularly because our R API does not provide detailed error messages. Did the error occur in file permissions? The database connection? The database login? Was it a timeout? Is your R server on Linux with an ODBC DSN that is not set up properly? Good luck: if there is only one large R script, called remotely via the RServe API, the root cause will be very difficult to find.
See also a new use case below. Read through the use case, then look at the R Standards following. Hopefully these two things will fit together.
Use case
- User generates dataset
  - this will be done outside of OpenI
  - in a later iteration OpenI will create this dataset as well, but for now this is done outside of OpenI
  - query analyzer, db vis, or some separate process generates the dataset
  - dataset should be in tab-delimited format, with the header in the first row (.tab, .data, or .txt)
- User uploads the data to the server and puts it in a public or personal folder (use existing file upload functionality)
- User uploads an R script that conforms to the standards detailed below (use existing file upload functionality)
- User configures the R function call (current functionality called R Task):
  - function file (system-generated dropdown of .r files in folders accessible to that user)
  - function name
  - parameters
    - default values
    - one mandatory parameter is dataFrame
- User runs the R function call:
  - system asks the user to specify which dataset (system-generated dropdown of .tab, .data, and .txt files in folders accessible to that user)
  - user selects one and hits the run button
- System runs the function:
  - runs the script to create the function
  - takes the data file and sends it to the R server
  - OpenI generates and executes the following R code:
      data <- read.delim("uploadedFilename.tab", header=TRUE, sep="\t") # completely generated by OpenI, user does not need to write this repetitious code
      filenames <- rFunctionCall(dataFrame = data, graphicsDevice=jpeg, other=otherParam) # user configures this function call, OpenI relies on proper function call configuration by the user
  - Note that these commands must be executed in the same connection!
  - system takes the vector of filenames and downloads the files to the OpenI server
  - puts the files into the caller's personal folder (username/scriptname/yyyy_MM_dd/*)
- User sees PNGs, data frames, JPGs, PDFs, RData, .tab files, or whatever else was produced in their personal folder
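The two generated lines in the use case can be tried locally without OpenI or RServe. The following is a minimal sketch under stated assumptions: `rFunctionCall` is a hypothetical stand-in for a user-supplied function, the sample columns are made up, files are written to R's temporary directory, and `pdf` is used as the graphics device only because it needs no extra system libraries.

```r
# Hypothetical stand-in for a user-uploaded function (not part of OpenI).
# It follows the conventions on this page: data frame in, file names out.
rFunctionCall <- function(dataFrame, graphicsDevice = pdf, ext = ".pdf") {
  filename <- file.path(tempdir(), paste("usage-histogram", ext, sep = ""))
  graphicsDevice(filename)
  hist(dataFrame$usage, main = "Histogram of usage", xlab = "usage")
  dev.off()
  filename  # vector of created files (only one here)
}

# Simulate the uploaded dataset: tab delimited, header in the first row.
uploaded <- file.path(tempdir(), "uploadedFilename.tab")
sample_data <- data.frame(customer = c("a", "b", "c", "d"), usage = c(3, 7, 7, 12))
write.table(sample_data, uploaded, sep = "\t", quote = FALSE, row.names = FALSE)

# The two lines OpenI would generate, run in the same R session.
data <- read.delim(uploaded, header = TRUE, sep = "\t")
filenames <- rFunctionCall(dataFrame = data, graphicsDevice = pdf)
print(filenames)
```

Because `data` lives only in the session that ran `read.delim`, the function call has to happen in that same session, which is why the use case insists on a single connection.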
Why can't you just upload any R script?
- are you crazy?
- if your R script creates chart files or RData structures, the OpenI application does not know where these files are located and does not know where to download them from
- the R server could be on a different platform; this is particularly relevant for Windows metafiles, which cannot be generated on a Linux machine
- Some standards need to be imposed on R Scripts, see below.
- There could be considerable usability problems, if we do not create some standards and conventions.
- As mentioned in the intro, RServe does not provide detailed messages for errors that occur inside the R script.
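One way to narrow down a failure when only a generic error comes back is to run each stage separately and tag any error with the stage name. This is a minimal sketch and says nothing about how OpenI itself handles errors; `run_stage` and the stage names are hypothetical.

```r
# Hypothetical helper: run one stage and, on failure, re-raise the error
# with the stage name prepended so the caller can see where it happened.
run_stage <- function(stage, expr) {
  tryCatch(
    expr,
    error = function(e) stop(paste0("[", stage, "] ", conditionMessage(e)), call. = FALSE)
  )
}

# Example: reading a missing file fails, and the message names the stage.
result <- tryCatch({
  data <- run_stage("read dataset", read.delim(file.path(tempdir(), "no_such_file.tab")))
  run_stage("run function", summary(data))
}, error = function(e) conditionMessage(e))

print(result)
```

Small functions with one clear responsibility make this kind of tagging practical; a single large script gives only one stage to blame.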
R Script standards
- scripts should be in the form of a function
- functions should accept a dataframe as an argument, not a file location
- to support multiple platforms, graphics devices should be passed in:
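  library(lattice)  # provides histogram(), trellis.par.set() and col.whitebg()
  survival_curve <- function(graphicsDevice = jpeg, ext = ".jpg", dataSet, sIsActive, Heading, yLabel, UsageGroup, strataDescription = "all", cWidth = 9, cHeight = 5, cPointSize = 10, cCex = 0.75) {
    # the output file name carries the extension that matches the device
    filename <- paste(strataDescription, "-histogram", ext, sep = "")
    # width/height units depend on the device (pixels for jpeg, inches for pdf)
    graphicsDevice(filename, width = cWidth, height = cHeight, pointsize = cPointSize)
    trellis.par.set(col.whitebg())  ## white background
    print(histogram(UsageGroup, type = "count", main = paste("Histogram for ", Heading, sep = ""), ylab = yLabel))
    dev.off()
    # remaining plots follow; the function ends by returning the created file names
  }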
- scripts cannot query SQL on the R server: once again, platform setup, configuration, and connectivity are not guaranteed
- for files created on the R server:
  - create a vector of the files
  - function returns the vector
  filename = paste(strataDescription, "-histogram", ext, sep = "")
  returnFiles = c(returnFiles, filename)
  ...
  # last thing in the function, return the vector:
  returnFiles
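Putting the standards together, a conforming script might look like the sketch below. It is an illustration only: `usage_plots`, its parameters, the `outDir` argument, and the sample data are hypothetical, base graphics are used instead of lattice to keep the example dependency-free, and `pdf` is the default device because it needs no extra system libraries.

```r
# Hypothetical example of a script that follows the standards above:
# a function, a data frame argument, a passed-in device, a returned file vector.
usage_plots <- function(dataFrame, graphicsDevice = pdf, ext = ".pdf",
                        strataDescription = "all", outDir = tempdir()) {
  returnFiles <- character(0)

  # First output: histogram
  filename <- file.path(outDir, paste(strataDescription, "-histogram", ext, sep = ""))
  graphicsDevice(filename)
  hist(dataFrame$usage, main = paste("Histogram for ", strataDescription, sep = ""))
  dev.off()
  returnFiles <- c(returnFiles, filename)

  # Second output: box plot
  filename <- file.path(outDir, paste(strataDescription, "-boxplot", ext, sep = ""))
  graphicsDevice(filename)
  boxplot(dataFrame$usage, main = paste("Box plot for ", strataDescription, sep = ""))
  dev.off()
  returnFiles <- c(returnFiles, filename)

  # Last thing in the function: return the vector of created files
  returnFiles
}

files <- usage_plots(data.frame(usage = c(1, 4, 4, 6, 9)))
print(basename(files))
```

On a server where another device is available, the caller can pass a different device and matching extension without touching the function body.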
Implementation
Function execution
- Show pre-configured R functions in a dropdown list
- Show parameters for the selected function
- Execute the function on submit and store R result in personal folder
Function Configuration
- Add/edit/delete functions
- Add/edit/delete function parameters
Requirements
- Function file (R script containing function) must be uploaded before function configuration
- Dataset file must be uploaded before executing a function
Description
- project_root/rfunction.xml : contains entries for R functions and function parameters
- WEB-INF/stat/r/rfunction.xsl : XSLT that transforms the rfunction.xml file to generate the UI
- org.openi.stat.r.RClient : R client interface which declares the execute() method
- org.openi.stat.r.rserve.RServeClient : implements the RClient interface and connects to the R server using the RServe APIs
  - public void execute(RFunction function, String outputPath) : this method takes an RFunction and an output file path, generates the script for the function, sends the dataset to the R server, executes the script, and receives the output from the R server
- org.openi.stat.r.RFunction : defines an R function
- org.openi.stat.r.RFunctionParam : defines the parameters of a function
- org.openi.stat.r.RFunctions : contains R functions and defines helper methods for a function
- org.openi.stat.r.RFunctionUIBuilder : generates the UI by transforming the XML file with the XSL file
- Controllers :
  - org.openi.web.controller.RFunctionController : controller for displaying and executing R functions
  - org.openi.web.controller.ManageRFunctionController : controller for managing (add, edit, and delete) R functions and function parameters
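The script-generation step performed by execute() can be pictured as assembling one R call from the configured function name and parameters. The sketch below is written in R for illustration only; the actual OpenI code is Java and may work differently. `build_call`, the layout of the parameter list, the "Number" type label, and the sample values are all hypothetical. String-typed values are quoted, other values are passed through as typed.

```r
# Hypothetical sketch: build the text of a function call from configured parameters.
build_call <- function(functionName, params) {
  args <- vapply(params, function(p) {
    # String-typed values are quoted (and escaped); everything else is used verbatim
    value <- if (p$type == "String") encodeString(p$value, quote = '"') else p$value
    paste(p$name, "=", value)
  }, character(1))
  # dataFrame is the one mandatory parameter, so it always comes first
  paste0("filenames <- ", functionName, "(",
         paste(c("dataFrame = data", args), collapse = ", "), ")")
}

params <- list(
  list(name = "Heading", type = "String", value = "Q1 usage"),
  list(name = "cWidth",  type = "Number", value = "9")
)
call_text <- build_call("survival_curve", params)
print(call_text)
```

Generating the call as text keeps the user's script untouched: only the function name and parameter names have to match what is defined in the uploaded file.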
Quick Start
Sample R scripts are included with
- Download RServe (http://stats.math.uni-augsburg.de/Rserve/)
- R CMD INSTALL Rserve_0.3-17.tar.gz (for *nix people, you do not untar and ./configure, make, etc!)
- Start the RServe server. By default the R server runs in local mode and doesn't accept remote connections. To enable remote R server mode:
  - create the configuration file /etc/Rserv.conf if it does not already exist
  - enable remote mode in that file, for example:
      workdir /tmp/Rserv
      pwdfile
      remote enable
      auth disable
      plaintext disable
      fileio enable
      socket
      port 6311
      maxinbuf 262144
  - restart the R server:
      R CMD Rserve
- Upload the R function file and the dataset file using the upload file feature
- To configure an R function:
  - to add an R function:
    - select manage R function from the left navigation
    - select the Add function button
    - enter the function name (must match the function name in the R function file), display name, and R function file
  - to add parameters for an R function:
    - select the function from the list
    - select the Add param button
    - enter the param name (must match the param name of the R function defined in the file), display name, type (String values are enclosed in quotes when the function is executed), and default value
- To run an R function:
  - select R function from the left navigation
  - select a function from the dropdown list
  - enter values for the parameters and select the submit button
  - results are stored in the personal folder upon successful execution
RServe vs. RSJava
Installation
compile from source
- If you need to compile R from source, make sure to install xorg-x11-devel and build R as a shared library (configure with --enable-R-shlib)
- http://tolstoy.newcastle.edu.au/R/help/04/08/2248.html
Feedback
2006 03 02
- expand the standard function library to output mosaic plots, box plots, LTV curves, and at-risk curves
- problems remain in dataset creation
