22 Random Selection from a Dataset


You can select a random sample from an Rguroo dataset, with or without replacement, and repeat the selection to obtain multiple samples (replications). You can also apply a statistic to each selected sample using R code.

22.1 Selecting Samples

Instructions to select random samples from a dataset

  1. Use a dataset in your Rguroo account, or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.

  2. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. From the Probability dropdown menu, choose Random Selection. The Dataset Random Selection dialog opens (see Figure 22.1).

  3. Select your dataset from the Dataset dropdown.

  4. Specify your desired Sample Size, number of samples (Replications), and Seed.

  5. Select either With or Without replacement.

  6. (Optional) If there is a numerical variable that consists of weights (probability of selection) for each case, select the variable from the Probability dropdown.

  • If no variable is selected, all cases to be sampled will have the same probability of being selected.
  • The values of the probability variable must all be non-negative.
  • If the values of the probability variable don’t add up to one, they will be internally normalized to add up to one.

  7. (Optional) In the Sample a Subset section, you can specify which rows and columns to sample from.

  • You can select rows using the textboxes From –> To –> By, or by writing R code in the Add Rows textbox that evaluates to specific row numbers. You can use both From –> To –> By and Add Rows at the same time.
  • You can select columns by writing R code in the Columns textbox that evaluates to specific column numbers.
  • If left blank, all rows and columns will appear in the sample.

In our example, we select from the fourth (MPG) and fifth (HP) columns of the “cardata” dataset, and we only sample “Domestic” cars.

  8. Click the Preview icon to view the result.

  9. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
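The sampling options in the steps above (sample size, replications, seed, replacement, and selection weights) can be sketched in base R. This is only an illustration of the idea, not Rguroo's internal implementation; the data frame `cars`, its `Weight` column, and all values are hypothetical.

```r
# Hypothetical stand-in for a dataset; names and values are illustrative only.
cars <- data.frame(
  Model  = paste0("car", 1:8),
  MPG    = c(18, 22, 25, 31, 35, 28, 19, 40),
  Weight = c(2, 1, 1, 3, 1, 1, 0.5, 0.5)  # selection weights; they sum to 10, not 1
)

set.seed(123)        # Seed: makes the draws reproducible
sample_size  <- 3    # Sample Size
replications <- 4    # number of samples (Replications)

# Normalize the weights so they add up to one. Base R's sample() also
# accepts unnormalized weights; the explicit step mirrors the dialog's note.
prob <- cars$Weight / sum(cars$Weight)

# Draw each replicate as a set of row numbers, here without replacement.
samples <- lapply(seq_len(replications), function(i) {
  rows <- sample(nrow(cars), size = sample_size, replace = FALSE, prob = prob)
  data.frame(Replicate = i, cars[rows, ], row.names = NULL)
})

# Stack the replicates into one data frame, labeled by replicate number.
result <- do.call(rbind, samples)
print(result)
```

Setting `replace = TRUE` would allow the same case to appear more than once within a replicate.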


Figure 22.1: Random Selection Dialog


Figure 22.2: Output of random selection
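As a rough illustration of the kind of R expressions that evaluate to row and column numbers for the Sample a Subset section, consider the sketch below. The data frame is a hypothetical stand-in for cardata: only the positions of MPG (fourth) and HP (fifth) come from the example above, while the column name `Origin`, the other columns, and all values are assumptions. How the dialog combines From –> To –> By with Add Rows is not shown here; the sketch samples from the "Domestic" rows only.

```r
# Hypothetical stand-in for cardata; only the positions of MPG (4th) and
# HP (5th) follow the text. Everything else is made up for illustration.
cars <- data.frame(
  Make   = c("A", "B", "C", "D", "E", "F"),
  Model  = paste0("m", 1:6),
  Origin = c("Domestic", "Foreign", "Domestic", "Domestic", "Foreign", "Domestic"),
  MPG    = c(21, 33, 18, 27, 36, 24),
  HP     = c(150, 95, 200, 130, 90, 160)
)

# From -> To -> By style selection: every second row from row 1 to row 5.
rows_seq <- seq(from = 1, to = 5, by = 2)

# An expression that evaluates to specific row numbers: the "Domestic" cars.
rows_domestic <- which(cars$Origin == "Domestic")

# An expression that evaluates to specific column numbers: MPG and HP.
cols <- c(4, 5)

# Sample three of the Domestic rows and keep only the chosen columns.
set.seed(1)
picked <- sample(rows_domestic, size = 3, replace = FALSE)
subset_sample <- cars[picked, cols]
print(subset_sample)
```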

22.2 Applying Statistics to Selected Samples

You can apply functions to your selected random samples by writing R code. In the example below, we write a function that creates a variable called Efficiency. For each sample selected, we compute the mean of MPG, and depending on whether this mean is more than 30, between 20 and 30, or less than 20, the value of Efficiency is set to High, Average, or Low.
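A minimal sketch of such a classification in R is given here, assuming the cutoffs are 30 and 20 and that a mean of exactly 20 or 30 counts as Average (the text does not say how ties are handled). The function name `efficiency` is hypothetical, and the code actually entered in the Rguroo dialog may look different.

```r
# Classify one sample by its mean MPG.
# Assumption: means of exactly 20 or 30 are treated as "Average".
efficiency <- function(MPG) {
  m <- mean(MPG)
  if (m > 30) {
    "High"
  } else if (m >= 20) {
    "Average"
  } else {
    "Low"
  }
}

efficiency(c(32, 35, 31))  # "High"
efficiency(c(22, 28, 25))  # "Average"
efficiency(c(15, 18, 19))  # "Low"
```

Each call returns a single character value, which matches the requirement that a statistic produce one value per sample.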

Instructions to apply statistics to selected samples

Continue from the Rguroo instructions in Section 22.1.

  1. Click the Statistic button on the top right of the application. The Custom Statistic dialog opens (see Figure 22.3).

  2. Click the plus icon on the Custom Statistic dialog. In the textbox that appears, type in a variable name.

  3. Type your R code in the middle panel.

  • You can double-click the names of the variables to include them in your code, or type them in.
  • You can write multiple lines of code. However, the result of your code, when applied to each sample (replicate), must be a single number or character string.

  4. Click the Preview icon to view the result.

  5. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
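The one-value-per-sample requirement can be illustrated outside Rguroo with base R. The sketch below is an assumption-laden illustration and not the dialog's own mechanism: the MPG values, the helper name `stat`, and the choice of statistic are all hypothetical.

```r
set.seed(42)
mpg <- c(18, 22, 25, 31, 35, 28, 19, 40)  # hypothetical MPG values

# Five replicates, each a sample of size 4 drawn without replacement.
replicates <- replicate(5, sample(mpg, size = 4), simplify = FALSE)

# The statistic may span several lines, but it must return one value.
stat <- function(x) {
  centered <- x - median(x)
  mean(abs(centered))  # mean absolute deviation from the median
}

# vapply() enforces the single-number requirement: it raises an error if
# stat() returns anything other than exactly one numeric value.
values <- vapply(replicates, stat, FUN.VALUE = numeric(1))

# One row per replicate, similar in spirit to a table of summary statistics.
stat_table <- data.frame(Replicate = seq_along(values), Statistic = values)
print(stat_table)
```

Using `vapply()` rather than `sapply()` is a deliberate choice here, since it fails loudly when a statistic returns a vector instead of a single value.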


Figure 22.3: Random Selection Dialog for Computing Statistics


Figure 22.4: Output of summary statistics