22 Random Selection from a Dataset
You can select a random sample from an Rguroo dataset, with or without replacement, and repeat the selection to obtain multiple samples (replications). Moreover, you can apply a statistic to each selected sample using R code.
Instructions to select random samples from a dataset
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Random Selection. The Dataset Random Selection dialog opens (see Figure 22.1).
Select your dataset from the Dataset dropdown.
Select your desired Sample Size, number of samples (Replications), and Seed.
Select either With or Without replacement.
(Optional) If there is a numerical variable that consists of weights (probability of selection) for each case, select the variable from the Probability dropdown.
- If no variable is selected, all cases to be sampled will have the same probability of getting selected.
- The values of the probability variable must all be non-negative.
- If the values of the probability variable don’t add up to one, they will be internally normalized to add up to one.
- (Optional) In the Sample a Subset section, you can specify which rows and columns to sample from.
- You can select rows using the textboxes From –> To –> By, or by writing R code in the Add Rows textbox that evaluates to specific row numbers. You can use both From –> To –> By and Add Rows at the same time.
- You can select columns by writing R code in the Columns textbox that evaluates to specific column numbers.
- If left blank, all rows and columns will appear in the sample.
In our example, we select from the fourth (MPG) and fifth (HP) columns of the “cardata” dataset, and we only sample “Domestic” cars.
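As a rough illustration of what such row and column expressions evaluate to in plain R, here is a minimal sketch. It does not use the real cardata dataset: the data frame `cars_demo`, its values, and the column name `Origin` are made up for illustration, and only the positions of MPG (fourth column) and HP (fifth column) follow the example above.

```r
# Toy stand-in for a dataset; values and the Origin column name are made up.
cars_demo <- data.frame(
  Model  = paste0("Car", 1:8),
  Origin = c("Domestic", "Foreign", "Domestic", "Domestic",
             "Foreign", "Domestic", "Foreign", "Domestic"),
  Weight = c(2.1, 2.8, 3.0, 2.5, 3.3, 2.2, 2.9, 3.1),
  MPG    = c(33, 25, 19, 28, 17, 31, 24, 18),
  HP     = c(90, 120, 160, 110, 200, 95, 130, 170),
  stringsAsFactors = FALSE
)

# From -> To -> By corresponds to a regular sequence of row numbers.
rows_seq <- seq(from = 1, to = 8, by = 2)

# An R expression that evaluates to row numbers, here the Domestic cars.
rows_expr <- which(cars_demo$Origin == "Domestic")

# Column numbers: the fourth (MPG) and fifth (HP) columns.
cols <- c(4, 5)

# The rows and columns that remain available for sampling.
frame <- cars_demo[rows_expr, cols]
```

Here `rows_seq` holds every second row number from 1 to 8, while `frame` contains only the MPG and HP values of the Domestic rows.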
Click the Preview icon to view the result.
(Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
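The choices made in the dialog (sample size, replications, seed, replacement, and optional weights) can be mimicked in base R. The sketch below is only an approximation under stated assumptions, not Rguroo's internal implementation: the data values, the weights `w`, and the layout with a `Replicate` column are all hypothetical.

```r
# Toy sampling frame; values are made up for illustration.
frame <- data.frame(
  MPG = c(33, 19, 28, 31, 18),
  HP  = c(90, 160, 110, 95, 170)
)

sample_size  <- 3      # Sample Size
replications <- 4      # Replications
seed         <- 123    # Seed
with_repl    <- FALSE  # With / Without replacement

# Optional weights; they must be non-negative but need not sum to one.
w <- c(2, 1, 1, 4, 2)
stopifnot(all(w >= 0))
p <- w / sum(w)        # normalized so the probabilities add up to one

set.seed(seed)         # fixing the seed makes the samples reproducible

# Draw one sample of row indices and label it with its replicate number.
draw_one <- function(r) {
  idx <- sample(nrow(frame), size = sample_size, replace = with_repl, prob = p)
  data.frame(Replicate = r, frame[idx, , drop = FALSE])
}

# Stack all replicates into one data frame (hypothetical output layout).
samples <- do.call(rbind, lapply(seq_len(replications), draw_one))
rownames(samples) <- NULL
```

Running the code twice with the same `seed` gives the same `samples`. Changing `with_repl` to `TRUE` allows the same row to appear more than once within a replicate.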
You can apply functions to your selected random samples by writing R code. In the example below, we write a function that creates a variable called Efficiency. For each sample selected, we compute the mean of MPG, and depending on whether this mean is more than 30, between 20 and 30, or less than 20, the value of Efficiency is set to High, Average, or Low.
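A minimal sketch of such a classification in plain R is shown here. It assumes that Low means a mean MPG below 20 and that the boundary values 20 and 30 count as Average; the function name `efficiency` is hypothetical, and the exact form in which code is entered in Rguroo may differ.

```r
# Classify a sample by its mean MPG.
# Assumed cut-offs: > 30 High, 20 to 30 (inclusive) Average, < 20 Low.
efficiency <- function(mpg) {
  m <- mean(mpg)
  if (m > 30) {
    "High"
  } else if (m >= 20) {
    "Average"
  } else {
    "Low"
  }
}

efficiency(c(33, 31, 35))  # mean is 33
efficiency(c(25, 28, 19))  # mean is 24
efficiency(c(18, 17, 19))  # mean is 18
```

Each call returns a single character value, so the function produces one Efficiency label per sample.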
Instructions to apply statistics to selected samples
Continue the Rguroo instructions of Section 22.1.
Click the Statistic button on the top right of the application. The Custom Statistic dialog opens (see Figure 22.3).
Click the plus icon on the Custom Statistic dialog. In the textbox that appears, type in a variable name.
Type your R code in the middle panel.
- You can double-click the names of the variables to include in your code or type them in.
- You can write multiple lines of code. However, the result of your code, when applied to each sample (replicate), must be a single number or a single character value.
Click the Preview icon to view the result.
(Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
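The requirement that a statistic return a single value per replicate can be illustrated in base R. The sketch below is an analogy rather than Rguroo's own mechanism: the stacked `samples` data frame, its values, and the statistic `mpg_range` are hypothetical.

```r
# Hypothetical stacked samples: a Replicate label plus the sampled MPG values.
samples <- data.frame(
  Replicate = rep(1:3, each = 3),
  MPG       = c(33, 31, 35,  25, 28, 19,  18, 17, 19)
)

# A multi-line statistic is fine as long as it returns ONE value per sample.
mpg_range <- function(mpg) {
  lo <- min(mpg)
  hi <- max(mpg)
  hi - lo
}

# Apply the statistic to each replicate; vapply() stops with an error
# if any result is not a single number.
by_rep <- split(samples$MPG, samples$Replicate)
result <- vapply(by_rep, mpg_range, numeric(1))

# One row per replicate, holding the value of the statistic.
stat_table <- data.frame(
  Replicate = as.integer(names(result)),
  Range     = unname(result)
)
```

By contrast, a function such as R's built-in `range()` returns two numbers (the minimum and the maximum), so it would not qualify as a statistic in this sense without being reduced to one value first.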