23 Random Selection from a Dataset
You can select random samples from an Rguroo dataset, with or without replacement, and replicate the sampling as many times as you need. You can also apply a statistic to each selected sample using R code.
Instructions to select random samples from a dataset
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.

Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Random Selection. The Dataset Random Selection dialog opens (see Figure 23.1).
Select your dataset from the Dataset dropdown.
Select your desired Sample Size, number of samples (Replications), and Seed.
Select one of With or Without replacement.
(Optional) If there is a numerical variable that consists of weights (probability of selection) for each case, select the variable from the Probability dropdown.
- If no variable is selected, all cases to be sampled will have the same probability of getting selected.
- The values of the probability variable must all be non-negative.
- If the values of the probability variable do not add up to one, they are normalized internally so that they do.
(Optional) In the Sample a Subset section, you can specify which rows and columns to sample from.
- You can select rows using the From –> To –> By textboxes, or by writing R code in the Add Rows field that evaluates to specific row numbers. You can use both From –> To –> By and Add Rows at the same time.
- You can select columns by writing R code in the Columns field that evaluates to specific column numbers.
- If left blank, all rows and columns will appear in the sample.
In our example, we select from the fourth (MPG) and fifth (HP) columns of the “cardata” dataset, and we only sample “Domestic” cars.
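The selection described in the steps above can be approximated in base R. This is a minimal sketch and not Rguroo's internal implementation: the data frame `cars_toy` and all of its values are invented for illustration (only the column names TYPE, MPG, and HP echo the example), and `draw_samples` is a hypothetical helper whose arguments stand in for the dialog fields.

```r
# Toy stand-in for a dataset; all values are invented for illustration only.
cars_toy <- data.frame(
  ID    = 1:8,
  MODEL = paste0("car", 1:8),
  TYPE  = factor(c("Domestic", "Import", "Domestic", "Domestic",
                   "Import", "Domestic", "Import", "Domestic")),
  MPG   = c(22, 31, 19, 27, 35, 24, 29, 18),
  HP    = c(150, 95, 180, 120, 90, 140, 110, 200)
)

# Hypothetical helper: draw `replications` samples of `size` rows each.
# `rows` and `cols` play the role of the Sample a Subset fields;
# `weights` plays the role of the Probability variable (one weight per case).
draw_samples <- function(data, size, replications, seed,
                         replace = FALSE, rows = seq_len(nrow(data)),
                         cols = seq_along(data), weights = NULL) {
  set.seed(seed)  # the same seed reproduces the same samples
  # sample.int() accepts weights that do not sum to one.
  prob <- if (is.null(weights)) NULL else weights[rows]
  lapply(seq_len(replications), function(r) {
    # Sample positions within `rows` so that a single eligible row is handled safely.
    picked <- rows[sample.int(length(rows), size, replace = replace, prob = prob)]
    data[picked, cols, drop = FALSE]
  })
}

# Rows: only Domestic cars; columns: the fourth (MPG) and fifth (HP).
domestic_rows <- which(cars_toy$TYPE == "Domestic")
samples <- draw_samples(cars_toy, size = 3, replications = 2, seed = 123,
                        rows = domestic_rows, cols = c(4, 5))
length(samples)     # number of replicates: 2
dim(samples[[1]])   # 3 rows and 2 columns per replicate
```

An expression such as `which(cars_toy$TYPE == "Domestic")` is the kind of code that evaluates to row numbers, and `seq(1, 7, by = 2)` corresponds to a From, To, By selection.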
Click the Preview icon to view the result.
(Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.

Figure 23.1: Random Selection Dialog

Figure 23.2: Output of random selection
You can apply functions to your selected random samples by writing R code. In the example below, we write a function that creates a variable called Efficiency. For each selected sample, we compute the mean of MPG, and depending on whether this mean is more than 30, between 20 and 30, or less than 20, the value of Efficiency is set to High, Average, or Low.
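A minimal sketch of such a statistic in plain R follows. It assumes the cutoffs are a mean above 30 for High, a mean from 20 to 30 for Average, and a mean below 20 for Low. The function name `efficiency_label` and the MPG values are hypothetical, and the way the code is entered in the Rguroo dialog may differ from this stand-alone form.

```r
# Hypothetical helper: classify one sample by its mean MPG.
# Returns a single character value, one per sample.
efficiency_label <- function(mpg) {
  m <- mean(mpg)
  if (m > 30) {
    "High"
  } else if (m >= 20) {
    "Average"
  } else {
    "Low"
  }
}

# Invented MPG values for three samples, for illustration only.
efficiency_label(c(32, 35, 31))   # mean 32.67 -> "High"
efficiency_label(c(22, 27, 24))   # mean 24.33 -> "Average"
efficiency_label(c(18, 19, 17))   # mean 18    -> "Low"
```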
Instructions to apply statistics to selected samples
Continue the Rguroo instructions of Section 23.1.
Click the Statistic button on the top right of the application. The Custom Statistic dialog opens (see Figure 23.3).
Click the plus icon on the Custom Statistic dialog. In the textbox that appears, type in a variable name.
Type your R code in the middle panel.
- You can double-click the names of the variables to include in your code or type them in.
- You can write multiple lines of code. However, the result of your code, when applied to each sample (replicate), must be a single number or character.
Click the Preview icon to view the result.
(Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.

Figure 23.3: Random Selection Dialog for Computing Statistics

Figure 23.4: Output of summary statistics
You can select a stratified random sample from an Rguroo dataset, with or without replacement, by providing a stratification variable. The stratification variable must be a factor. Once a sample is selected, you can also apply a statistic to each selected sample using R code, as explained in Section 23.2.
Instructions to select a stratified random sample proportional to the stratum size
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.

Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Random Selection. The Dataset Random Selection dialog opens (see Figure 23.5).
Select your dataset from the Dataset dropdown.
Select your desired Sample Size, number of samples (Replications), and Seed.
Select one of With or Without replacement.
(Optional) If there is a numerical variable that consists of weights (probability of selection) for each case, select the variable from the Probability dropdown.
- If no variable is selected, all cases to be sampled will have the same probability of getting selected.
- The values of the probability variable must all be non-negative.
- If the values of the probability variable do not add up to one, they are normalized internally so that they add up to one within each stratum.
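Normalizing weights within each stratum can be pictured with the short R sketch below. It illustrates the arithmetic only, under the assumption that each weight is divided by the total weight of its own stratum. The vectors `w` and `stratum` are invented.

```r
# Invented weights and stratum labels, for illustration only.
w       <- c(2, 6, 2, 1, 3)
stratum <- factor(c("A", "A", "A", "B", "B"))

# Divide each weight by the total weight of its own stratum.
p <- ave(w, stratum, FUN = function(x) x / sum(x))
p                         # 0.20 0.60 0.20 0.25 0.75
tapply(p, stratum, sum)   # each stratum now sums to 1
```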
In the Stratified Sample section, select the stratification variable from the Stratify by dropdown. The stratification variable must be a factor (categorical).
Two options, Equal and Proportional, are available for stratified random sampling. With the Equal option, the sample selected from each stratum has the size that you specify. With the Proportional option, the sample size for each stratum is proportional to the stratum size. Specifically, if \(n\) is the number specified in the Sample Size textbox, \(N\) is the total number of cases in the selected dataset, and \(N_i\) is the number of cases in stratum \(i\), then the number of cases selected from stratum \(i\) is \(\mathrm{round}(n \cdot N_i/N)\).
In this example, we select the variable TYPE as the stratification variable. This variable has two levels, Domestic (35 cases) and Import (47 cases). Since we set \(n=5\) and use the Proportional option, we get 2 cases from Domestic and 3 cases from Import.
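The allocation in this example can be checked with a few lines of R that apply the rounding formula to the stratum sizes given above. The variable names are chosen here for illustration.

```r
# Stratum sizes from the example: Domestic 35, Import 47 (N = 82).
N_i <- c(Domestic = 35, Import = 47)
n   <- 5

# Proportional allocation: round(n * N_i / N)
alloc <- round(n * N_i / sum(N_i))
alloc
# Domestic   Import
#        2        3
```

Here 5 × 35/82 ≈ 2.13 rounds to 2 and 5 × 47/82 ≈ 2.87 rounds to 3. Because each stratum is rounded separately, a formula of this form does not always return sizes that add up to \(n\). With three strata of equal size and \(n=5\), for instance, each stratum would get round(5/3) = 2 cases, 6 in total.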
When stratified sampling is used, row subsets cannot be selected. However, selecting a subset of columns is still possible. In our example, we select from the fourth (MPG) and fifth (HP) columns of the “cardata” dataset.
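For readers who want to see the idea in code, the sketch below draws a stratified sample in base R with either allocation rule. It is an illustration under stated assumptions and not Rguroo's implementation: the data frame `toy`, its values, and the helper `stratified_sample` are all hypothetical.

```r
# Toy data; all values are invented for illustration only.
toy <- data.frame(
  TYPE = factor(rep(c("Domestic", "Import"), times = c(4, 6))),
  MPG  = c(22, 19, 27, 24, 31, 35, 29, 33, 30, 28),
  HP   = c(150, 180, 120, 140, 95, 90, 110, 100, 105, 115)
)

# Hypothetical helper: stratified sample with Proportional or Equal allocation.
stratified_sample <- function(data, strata, size, seed,
                              allocation = c("Proportional", "Equal"),
                              replace = FALSE) {
  allocation <- match.arg(allocation)
  set.seed(seed)
  # Row numbers of the cases in each stratum.
  groups <- split(seq_len(nrow(data)), data[[strata]])
  picked <- lapply(groups, function(idx) {
    # Equal: `size` cases per stratum; Proportional: round(n * N_i / N).
    k <- if (allocation == "Equal") size else round(size * length(idx) / nrow(data))
    idx[sample.int(length(idx), k, replace = replace)]
  })
  data[unlist(picked), , drop = FALSE]
}

s <- stratified_sample(toy, "TYPE", size = 5, seed = 1)
table(s$TYPE)         # 2 Domestic and 3 Import cases
s[, c("MPG", "HP")]   # keep only the MPG and HP columns
```

With 4 Domestic and 6 Import cases in the toy data, the Proportional rule gives round(5 × 4/10) = 2 and round(5 × 6/10) = 3 cases.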
Click the Preview icon to view the result.
(Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.

Figure 23.5: Random Selection Dialog for stratified random sampling

Figure 23.6: Output of random selection