R Integration

From OpenI Wiki

Requirements

Feedback requirements rewritten, 2006-02-16

We have created some standards and limitations for R scripts to allow proper integration between OpenI and R. Large R scripts are very difficult to debug, particularly because our R API does not provide detailed error messages. Did the error occur in file permissions? The database connection? The database login? A timeout? Is your R server on Linux with an ODBC DSN that is not set up properly? With only one large R script, the root cause is very difficult to find when the script is called remotely through the RServe API.

See also the new use case below. Read through the use case, then look at the R script standards that follow it; the two are meant to fit together.

Use case

  • User generates a dataset
    • this is done outside of OpenI
    • in a later iteration OpenI will create the dataset as well, but for now this happens outside of OpenI
    • query analyzer, db vis, or some separate process generates the dataset
    • the dataset should be in tab-delimited format, with the header in the first row (.tab, .data, or .txt)
  • User uploads the dataset to the server and puts it in a public or personal folder (uses existing file upload functionality)
  • User uploads an R script that conforms to the standards detailed below (uses existing file upload functionality)
  • User configures the R function call (current functionality called R Task):
    • function file (system-generated dropdown of .r files in folders accessible to that user)
    • function name
    • parameters
    • default values
    • one mandatory parameter is dataFrame
  • User runs the R function call:
    • system asks the user to specify which dataset to use (system-generated dropdown of .tab, .data, and .txt files in folders accessible to that user)
    • user selects one and hits the run button
  • System runs the function:
    • runs the script to create the function
    • sends the data file to the R server
    • OpenI generates and executes the following R code:
data <- read.delim("uploadedFilename.tab", header=TRUE, sep="\t")  # completely generated by OpenI, user does not need to write this repetitious code
filenames <- rFunctionCall(dataFrame = data, graphicsDevice=jpeg, other=otherParam) # user configures this function call, OpenI relies on proper function call configuration by the user
    • Note that these commands must be executed in the same connection!
    • system takes the vector of filenames and downloads the files to the OpenI server
    • system puts the files into the caller's personal folder (username/scriptname/yyyy_MM_dd/*)
  • User sees PNGs, data frames, JPGs, PDFs, RData, .tab files, or whatever else was produced in their personal folder
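
The "System runs the function" step can be sketched end to end in a single R session. This is a minimal illustration, not the code OpenI actually generates: the function summarize_columns, its outDir parameter, the column names, and the temporary paths are hypothetical stand-ins for a user's uploaded script and dataset.

```r
# Hypothetical user function that follows the standards: it takes a data
# frame, writes its output files, and returns their names as a vector.
summarize_columns <- function(dataFrame, outDir = tempdir()) {
  returnFiles <- character(0)
  numeric_cols <- dataFrame[sapply(dataFrame, is.numeric)]
  stats <- data.frame(column = names(numeric_cols),
                      mean = sapply(numeric_cols, mean),
                      row.names = NULL)
  filename <- file.path(outDir, "column-means.tab")
  write.table(stats, filename, sep = "\t", quote = FALSE, row.names = FALSE)
  returnFiles <- c(returnFiles, filename)
  returnFiles
}

# Stand-in for the uploaded dataset: tab delimited, header in the first row.
uploaded <- file.path(tempdir(), "uploadedFilename.tab")
writeLines(c("customer\tvisits\tspend", "a\t3\t10.5", "b\t5\t20.5"), uploaded)

# The two generated statements, run in the same session so that `data`
# still exists when the function is called.
data <- read.delim(uploaded, header = TRUE, sep = "\t")
filenames <- summarize_columns(dataFrame = data)
print(filenames)
```

If the two statements ran in separate connections, the second would fail because `data` would not exist in the new session, which is why the same-connection note matters.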

Why can't you just upload any R script?

  • are you crazy?
  • if your R script creates chart files or RData structures, the OpenI application does not know where these files are located and so cannot download them
  • the R server could be on a different platform; this is particularly relevant for Windows metafiles, which cannot be generated on a Linux machine
  • some standards need to be imposed on R scripts, see below
  • there could be considerable usability problems if we do not create some standards and conventions
  • as mentioned in the intro, RServe does not provide detailed messages for errors that occur inside the R script
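
One way to soften the error-reporting problem is to have the call catch R errors and hand the message back as an ordinary value. The sketch below is an assumption about how that could look, not something OpenI is known to do; safe_call and failing_function are hypothetical names.

```r
# Run a function call and capture any R error as text, instead of letting
# the remote call fail with an opaque message.
safe_call <- function(fun, ...) {
  tryCatch(
    list(ok = TRUE, value = fun(...), error = NA_character_),
    error = function(e) {
      list(ok = FALSE, value = NULL, error = conditionMessage(e))
    }
  )
}

# Hypothetical user function that fails when a column is missing.
failing_function <- function(dataFrame) {
  if (!"UsageGroup" %in% names(dataFrame)) {
    stop("column 'UsageGroup' not found in dataFrame")
  }
  nrow(dataFrame)
}

result <- safe_call(failing_function, dataFrame = data.frame(x = 1:3))
print(result$ok)
print(result$error)
```

The caller can then inspect result$ok and show result$error to the user. Small functions wrapped this way keep each failure close to its cause.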

R Script standards

  • scripts should be in the form of a function
  • functions should accept a dataframe as an argument, not a file location
  • to support multiple platforms, graphics devices should be passed in:
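library(lattice)    ## provides histogram(), trellis.par.set() and col.whitebg()
survival_curve <- function (graphicsDevice=jpeg, ext=".jpg", dataSet, sIsActive, Heading, yLabel, UsageGroup, strataDescription = "all", cWidth=9, cHeight=5, cPointSize=10, cCex = 0.75) {
       filename <- paste(strataDescription, "-histogram", ext, sep = "")    ## extension matches the device passed in
       graphicsDevice(filename, width = cWidth, height = cHeight, pointsize = cPointSize)
       trellis.par.set(col.whitebg())    ## white background
       print (histogram(UsageGroup, type = "count", main = paste ("Histogram for ", Heading, sep=""), ylab=yLabel))
       dev.off()
       filename    ## return the name of the file that was created
}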
  • scripts cannot run SQL queries on the R server: once again, platform setup, configuration, and connectivity are not guaranteed
  • for files created on the R server:
    • create a vector of the files
    • function returns the vector
returnFiles <- character(0)    # initialize once, at the top of the function
filename <- paste(strataDescription, "-histogram", ext, sep = "")
returnFiles <- c(returnFiles, filename)
# ... last thing in the function, return the vector:
returnFiles
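
Putting the standards together, a conforming script could look like the following sketch. It uses base graphics and the pdf device so that it runs without extra packages. The function usage_histogram, its outDir parameter, and the sample data are hypothetical and not part of the OpenI function library.

```r
# A function-style script: data frame in, graphics device passed in,
# vector of created file names out. No SQL and no hard-coded input paths.
usage_histogram <- function(dataFrame, graphicsDevice = pdf, ext = ".pdf",
                            Heading = "usage", strataDescription = "all",
                            cWidth = 9, cHeight = 5, cPointSize = 10,
                            outDir = tempdir()) {
  returnFiles <- character(0)

  filename <- file.path(outDir,
                        paste(strataDescription, "-histogram", ext, sep = ""))
  graphicsDevice(filename, width = cWidth, height = cHeight,
                 pointsize = cPointSize)
  hist(dataFrame$UsageGroup,
       main = paste("Histogram for ", Heading, sep = ""),
       xlab = "UsageGroup")
  dev.off()
  returnFiles <- c(returnFiles, filename)

  # Last expression in the function: the vector of files to download.
  returnFiles
}

sample_data <- data.frame(UsageGroup = c(1, 2, 2, 3, 3, 3))
files <- usage_histogram(dataFrame = sample_data)
print(basename(files))
```

On a server that supports it, the caller could pass graphicsDevice = jpeg and ext = ".jpg" instead. Note that jpeg() measures width and height in pixels by default while pdf() uses inches, so the size defaults would need to change with the device.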

Implementation

Function execution

  • Show pre-configured R functions in a dropdown list
  • Show parameters for the selected function
  • Execute the function on submit and store R result in personal folder

Function Configuration

  • Add/edit/delete functions
  • Add/edit/delete function parameters

Requirements

  • Function file (R script containing function) must be uploaded before function configuration
  • Dataset file must be uploaded before executing a function

Description

  • project_root/rfunction.xml : contains entries for R functions and function parameters
  • WEB-INF/stat/r/rfunction.xsl : XSLT that transforms the rfunction.xml file to generate the UI
  • org.openi.stat.r.RClient : R client interface which declares the execute() method
  • org.openi.stat.r.rserve.RServeClient : implements the RClient interface and connects to the R server using the RServe APIs
    • public void execute(RFunction function, String outputPath) : this method takes an RFunction and an output file path, generates the script for the function, sends the dataset to the R server, executes the script, and receives the output from the R server.
  • org.openi.stat.r.RFunction : defines an R function
  • org.openi.stat.r.RFunctionParam : defines parameters for a function
  • org.openi.stat.r.RFunctions : contains R functions and defines helper methods for a function
  • org.openi.stat.r.RFunctionUIBuilder : generates UI by transforming the xml file with the xsl file
  • Controllers :
    • org.openi.web.controller.RFunctionController: controller for displaying and executing R functions.
    • org.openi.web.controller.ManageRFunctionController: controller for managing (add, edit, and delete) R functions and function parameters.

Quick Start

Sample R scripts are included with

  • Download RServe (http://stats.math.uni-augsburg.de/Rserve/)
    • R CMD INSTALL Rserve_0.3-17.tar.gz (on *nix, install the tarball directly; do not untar it and run ./configure, make, etc.)
  • Start the RServe server. By default the R server runs in local mode and does not accept remote connections. To enable remote mode:
    • create the configuration file /etc/Rserv.conf if it does not already exist
    • enable remote mode in that file (the "remote enable" line), for example:
   workdir /tmp/Rserv
   pwdfile 
   remote enable
   auth disable
   plaintext disable
   fileio enable
   socket
   port 6311
   maxinbuf 262144
    • restart the R server:
    • R CMD Rserve
  • Upload R function file and dataset file using upload file feature
  • To configure R function:
    • to add R function
      • select manage R function from left navigation
      • select Add function button
      • enter the function name (must match the function name in the R function file), display name, and R function file
    • to add parameters for R function:
      • select function from the list
      • select Add param button
      • enter the param name (must match the parameter name of the R function defined in the file), display name, type (String values are enclosed in quotes when the function is executed), and default value
  • To run R function:
    • select R function from left navigation
    • select a function from dropdown list
    • enter value for parameters and select submit button
    • results are stored in the personal folder upon successful execution
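
The quoting rule for String parameters can be illustrated with a small sketch of how a configured call might be rendered into R code. The real generation happens inside OpenI's Java code; build_call, the "Number" type label, and the parameter list below are hypothetical and only mirror the name, type, and value fields described above.

```r
# Render a configured function call as a line of R code. Parameters of
# type "String" are wrapped in quotes; other types are inserted as written.
build_call <- function(functionName, params) {
  rendered <- vapply(params, function(p) {
    value <- if (p$type == "String") {
      paste("\"", p$value, "\"", sep = "")
    } else {
      as.character(p$value)
    }
    paste(p$name, "=", value)
  }, character(1))
  args <- paste(c("dataFrame = data", rendered), collapse = ", ")
  paste("filenames <- ", functionName, "(", args, ")", sep = "")
}

params <- list(
  list(name = "Heading", type = "String", value = "Active users"),
  list(name = "cWidth", type = "Number", value = "9")
)
call_text <- build_call("survival_curve", params)
print(call_text)
```

This sketch does not escape quote characters inside String values, so a value containing a double quote would produce invalid R code. It also shows why parameter names must match the names in the function file exactly: they are pasted into the call as written.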

RServe vs. RSJava

RServe_RSJava_Comparison

Installation

compile from source

Feedback


2006 03 02

  • increase std function library to output mosaic plots, box plots, LTV Curves, At Risk Curves
  • problems remain in dataset creation

Graph gallery