R Data Generator
The R Data Generator transform lets you generate data by writing scripts using the R statistical programming language. This is similar to the R Language Analysis transform except that it does not accept input from a preceding transform and generates its own output directly from R.
R is both a programming language and an environment for statistical computing, graphics, and predictive analysis. You can use the R Data Generator transform to generate data for prototyping or developing proofs of concept, or when you are using R to access a data source.
To learn more about the R language, see The R Project for Statistical Computing.
1. Setup
Before you can use the R Data Generator transform in Dundas BI, the R programming environment must be installed on a server.
See Install and configure R for more details.
2. Input
The R Data Generator transform does not have any inputs. It just generates output by running R scripts against the R server.
3. Add the transform
When creating a new data cube, you can add the R Data Generator transform to an empty canvas from the toolbar.
The R Language Data Generation transform is added to the data cube and connected to a Process Result transform automatically.
You can also add the R Data Generator transform from the toolbar to an existing data cube process. A typical example is to connect the R Language Data Generation instance to a Union transform which merges data from multiple inputs.
4. Configure the transform
Double-click the R Language Data Generation transform or select the Configure option from its right-click menu.
In the configuration dialog for the transform, the key task is to enter an R script that sets the output variable.
For example, a simple script for generating a column of numbers from 1 to 5 looks like this:
output=c(1,2,3,4,5)
In this dialog, you can set up Placeholders, which are inserted into the script to pass in parameter values, similar to using a manual select.
You can also set up Parameters to filter this transform's output directly, as with select transforms.
5. Output
The output of the R Data Generator depends on the R script it is configured with. It can be a single value, a column of values, or multiple columns.
In the case of the simple script for generating numbers from 1 to 5, you can see an output column named 'Data' by selecting the Process Result transform and then clicking on Data Preview.
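The three output shapes described above can be sketched in plain R as follows. This is an illustrative sketch only: the names single_value, one_column, and multi_column are hypothetical and are used so that all three shapes can be shown in one script. In the transform, only the value assigned to output matters.

```r
# Three shapes a script might assign to `output`. Separate (hypothetical)
# names are used here only so the sketch can run in one R session.

# A single value
single_value <- mean(c(10, 20, 30))

# A single column of values (a vector)
one_column <- seq(from = 1, to = 5)

# Multiple columns (a data frame with named columns)
multi_column <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# The script ends by assigning the shape it wants to return.
output <- multi_column
```

How the transform names and types the resulting columns is best confirmed in Data Preview.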
6. Example R scripts
Here are some example R scripts for generating data.
6.1. Random number generation
Generate 10 random numbers between 200.5 and 300.5:
output=runif(10, 200.5, 300.5)
Generate 5 random integers between 1 and 1000:
output=sample(1:1000, 5)
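Note that sample draws without replacement by default, so the five integers above are always distinct, and that both scripts return different values on each run. If repeatable values or duplicates are wanted, a variation along these lines may help. It is a sketch rather than a recommended configuration, and the variable names are illustrative:

```r
# Fix the random seed so repeated runs return the same values (optional).
set.seed(42)

# Five integers between 1 and 1000, with duplicates allowed.
with_repeats <- sample(1:1000, 5, replace = TRUE)

# Ten decimal values between 200.5 and 300.5, rounded to two decimal places.
rounded <- round(runif(10, min = 200.5, max = 300.5), 2)

# Return the rounded values as a single column.
output <- rounded
```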
Generate two columns of data. The first column contains integers from 1 to 5 in order. The second column contains 5 random integers between 50 and 100:
x=c(1,2,3,4,5)
y=sample(50:100, 5)
output=data.frame(x,y)
Generate two columns of data. The first column contains 12 random dates between 2017/01/01 and 2018/01/01. The second column contains 12 random integers between 1 and 1000:
x=sample(seq(as.Date('2017/01/01'), as.Date('2018/01/01'), by="day"), 12)
y=sample(1:1000, 12)
output=data.frame(x,y)
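For prototyping, it can be convenient to combine these ideas into a small dataset with descriptive column names. The following is a minimal sketch under assumed names: the month, region, and sales columns and the region labels are invented for illustration and do not come from any real data source.

```r
# Fix the seed so the prototype data is the same on every run.
set.seed(1)
n <- 12

# One date per month, starting in January 2017.
month <- seq(as.Date("2017-01-01"), by = "month", length.out = n)

# A random category and a random measure for each month (hypothetical values).
region <- sample(c("North", "South", "East", "West"), n, replace = TRUE)
sales <- sample(1:1000, n)

# Named columns make the output easier to recognize downstream.
output <- data.frame(month = month, region = region, sales = sales,
                     stringsAsFactors = FALSE)
```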
6.2. Pre-defined datasets
Load pre-defined data from the R Datasets Package. For example, Freeny's Revenue Data:
output=datasets::freeny
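The resulting Data Preview lists the rows of the freeny data frame, which in R has 39 rows and five columns: y, lag.quarterly.revenue, price.index, income.level, and market.potential.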
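In R, the freeny data frame keeps its quarter labels in the row names rather than in a column. Whether the transform carries row names through to its output is an assumption worth checking in Data Preview. If the labels are needed, one option is to copy them into an ordinary column first, as in this sketch (the column name quarter is chosen here for illustration):

```r
# Load the built-in dataset.
df <- datasets::freeny

# Copy the row names (quarter labels) into a regular column and
# reset the row names so the label appears only once.
output <- data.frame(quarter = rownames(df), df,
                     row.names = NULL, stringsAsFactors = FALSE)
```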