5 Functions for Data Manipulation

This section describes the Rguroo data functions for subsetting, sorting, reshaping, transforming, merging, and appending datasets.

5.1 Subsetting (Filtering) Data

You can obtain subsets of a dataset by filtering rows, columns, or a combination of rows and columns. You filter rows by selecting row numbers or using logical expressions, and you can obtain a subset of columns by column number or by selecting specific columns. See below for examples.

5.1.1 Subsetting Rows


To obtain a subset of the rows of a dataset, follow the steps below.

Steps for Obtaining a Subset of Rows:

  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.

Figure 5.1: Screenshot of the first 5 rows of the Cardata dataset.

  2. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.


Figure 5.2: Data subset dialog.

  4. You can subset rows either with the Sequence option, where you select rows by row number, or with the Logical Expression option, where you select rows by specifying logical conditions. Below is an example of selecting every other row from rows 1 to 82.

Figure 5.3: Example of subsetting rows using a sequence of values.

The following example uses the Logical Expression option to choose only cars whose TYPE is Domestic. You can add logical expressions by clicking the plus icon, and you can combine the logical expressions in the Logical Expression Calculator at the bottom.


Figure 5.4: Example of subsetting rows using logical expressions.

You can also create multiple logical expressions and combine them. The example below takes the intersection of the expressions S4 and L4, defined above.


Figure 5.5: Example of subsetting rows using multiple logical expressions.

  5. When you are done with your selection, click the Preview icon to see the output.
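
For readers who know R, the two row-selection options described above correspond roughly to the following base R operations. This is only a sketch on a small made-up data frame: the values are invented, only the column names Make.Model, TYPE, and MPG come from the cardata example, and the MPG condition is a hypothetical second expression. It is not the code that Rguroo runs internally.

```r
# Toy stand-in for cardata; all values are invented for illustration.
cars <- data.frame(
  Make.Model = c("A", "B", "C", "D", "E", "F"),
  TYPE       = c("Domestic", "Domestic", "Foreign", "Foreign", "Domestic", "Foreign"),
  MPG        = c(21, 19, 18, 27, 24, 33)
)

# Sequence option: every other row, from row 1 to the last row.
every_other <- cars[seq(from = 1, to = nrow(cars), by = 2), ]

# Logical Expression option: keep only the domestic cars.
domestic <- cars[cars$TYPE == "Domestic", ]

# Intersection of two logical expressions (hypothetical second condition).
domestic_efficient <- cars[cars$TYPE == "Domestic" & cars$MPG > 20, ]
```

The `&` operator plays the role of an intersection: a row is kept only if both conditions are true for it.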

5.1.2 Subsetting Columns


To obtain a subset of the columns of a dataset, follow the steps below.

Steps for Obtaining a Subset of Columns:

  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.


Figure 5.6: Data subset dialog.

  4. Click the Select Columns button. A dialog opens. Use the arrows (or drag-and-drop, or double-click) to move the variables you want to the right column. In the example below, we have selected the variables Make.Model, CLASS, TYPE, MPG, and HP.

Figure 5.7: Data subset dialog for column selection.

  5. When you are done with your selection, click the Preview icon to see the output.
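
In R terms, selecting columns amounts to indexing a data frame by column name or by column number. The sketch below assumes a toy data frame with invented values; the column WEIGHT is a hypothetical extra column added only so that there is something to leave out.

```r
# Toy stand-in for cardata; values and the WEIGHT column are invented.
cars <- data.frame(
  Make.Model = c("A", "B", "C"),
  CLASS      = c("Compact", "Midsize", "SUV"),
  TYPE       = c("Domestic", "Foreign", "Domestic"),
  MPG        = c(30, 25, 18),
  HP         = c(110, 160, 240),
  WEIGHT     = c(2500, 3200, 4300)
)

# Select specific columns by name, in the order listed.
selected <- cars[, c("Make.Model", "CLASS", "TYPE", "MPG", "HP")]

# Select columns by column number (here the 1st and 4th).
by_number <- cars[, c(1, 4)]
```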

5.2 Sorting Data


You can sort the cases in a dataset on one or more columns. Below is an example.

Instructions for sorting data:

  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Sort function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.

  4. Click the plus sign icon to add a sorting option, which consists of a variable and a choice of order, ascending or descending.

  5. When you are done, click the Preview icon.


Figure 5.8: Data Sort dialog.
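
As a rough R analogue of sorting on more than one column, the sketch below orders a made-up data frame by TYPE in ascending order and, within each TYPE, by MPG in descending order. The data values are invented, and the choice of these two sorting variables is an assumption made for illustration.

```r
# Toy stand-in for cardata; values are invented for illustration.
cars <- data.frame(
  Make.Model = c("A", "B", "C", "D"),
  TYPE       = c("Foreign", "Domestic", "Foreign", "Domestic"),
  MPG        = c(30, 21, 27, 24)
)

# order() returns the row positions in sorted order; negating a numeric
# column sorts it in descending order.
sorted <- cars[order(cars$TYPE, -cars$MPG), ]
```

Each additional argument to `order()` acts like an additional sorting option: it only breaks ties left by the arguments before it.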


5.3 Reshape (Transposing, Stacking and Unstacking Variables)

If you want to transpose data, that is, switch rows and columns, you can use the Transpose function in the Dataset editor. For instructions on how to use the Transpose function, see Section 2.3.4.

If you are looking to stack or unstack data, you can use the Reshape function in the Dataset editor. Below are examples of stacking and unstacking data.

5.3.1 Wide to long (stacking columns)

You can change the format of numerical data from wide to long, so that the values in the selected columns are stacked into a single column.

Instructions for stacking data (wide to long):

  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.

  3. Select your Dataset. In this example, we select MooreBP.

  4. In the Variables section, select the variables that you want to stack and move them to the Selected column. In this example, we selected the variables beginning_bp and end_bp.

  5. (Optional) In the ID Variable Label textbox, type a name for the stacked variable. In this example, we typed Blood_Pressure.

  6. Click the Preview icon to see the stacked data.


Figure 5.9: Reshape dialog to stack data.
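
To make the idea of stacking concrete, here is a minimal base R sketch using `stack()`. It assumes a toy data frame that borrows only the column names beginning_bp and end_bp from MooreBP; the numbers are invented, and the layout of the result may differ from what Rguroo produces.

```r
# Toy stand-in for MooreBP; values are invented for illustration.
bp <- data.frame(
  beginning_bp = c(107, 110, 123),
  end_bp       = c(100, 114, 105)
)

# stack() puts the values of the chosen columns into one column.
# 'values' holds the stacked numbers; 'ind' records the source column.
stacked <- stack(bp[, c("beginning_bp", "end_bp")])
```

The three beginning_bp values come first, followed by the three end_bp values, so the long dataset has six rows.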


5.3.2 Long to wide (unstacking columns)

You can unstack data held in a single column into separate columns, one for each level of a categorical variable.

  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.

  3. Select your Dataset. In this example, we select MooreBP.

  4. In the Variables section, move the variables that you want to include to the Selected column. In this example, we selected the variables beginning_bp and end_bp.

  5. In the ID Variable dropdown, select the categorical (factor) variable whose levels will be used for unstacking. In this example, we used the variable group.

  6. Click the Preview icon to see the unstacked data.


Figure 5.10: Reshape dialog to unstack data.
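
Unstacking can be sketched in base R with `unstack()`, which splits one column into several according to the levels of a grouping variable. The example below is an illustration under assumptions: the data values and the group labels Calcium and Placebo are invented, only one variable is unstacked, and the groups are the same size so that the result is a data frame.

```r
# Toy stand-in for MooreBP; values and group labels are invented.
bp <- data.frame(
  group  = c("Calcium", "Calcium", "Placebo", "Placebo"),
  end_bp = c(100, 114, 124, 97)
)

# Split end_bp into one column per level of group.
wide <- unstack(bp, end_bp ~ group)
```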


5.4 Transform (Transform and Create New Variables)


Rguroo’s Data Transform function enables you to construct new variables and transform existing variables using the R language and R functions.

  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Transform function. The Data Transform dialog opens.

  3. Select a Dataset. In this example, we select MooreBP. All of its variable names are listed under the column labeled Returned Variable.

  4. Click on the plus sign icon in the Variable column and type a name for the new variable you are creating. The variable's name appears in the list labeled Returned Variable on the right side. You can add more than one variable by clicking on the plus sign icon again.

  5. In the middle text field, type your R function (transformation). The function can span multiple lines. The value obtained in the last line of the code is assigned to the variable. To write a variable name, you can either type it or double-click on the variable name in the Returned Variable list. In this example, we create a new variable Average_BP that is the mean of the variables, using two lines of R code.

  6. (Optional) In the Returned Variable column, you can rearrange the order of the variables. You can also remove variables by dropping their names in the Excluded Variable list. In this example, we dropped the variable decrease_bp and moved the newly created variable Average_BP to the last column.

  7. Click the Preview icon to see the newly created dataset.


Figure 5.11: Transform dialog to create new variables.
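
The rule that the value of the last line becomes the new variable can be imitated in plain R with a braced block. The sketch below is an assumption about what a two-line transformation might look like: it averages beginning_bp and end_bp on a toy data frame with invented values, and the intermediate name total is hypothetical. It may differ from the code shown in the figure.

```r
# Toy stand-in for MooreBP; values are invented for illustration.
bp <- data.frame(
  beginning_bp = c(107, 110, 123),
  end_bp       = c(100, 114, 105)
)

# A braced block evaluates to the value of its last line, which is then
# stored as the new variable Average_BP.
bp$Average_BP <- with(bp, {
  total <- beginning_bp + end_bp  # line 1: intermediate result
  total / 2                       # line 2: value assigned to Average_BP
})
```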


5.5 Merge Datasets


Rguroo’s Merge function enables you to merge two datasets, with various options. Below is a simple example.

Instructions for merging datasets:

  1. Use datasets in your Rguroo account or recreate the example below by importing the datasets MergeSet1 and MergeSet3 from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Merge function. The Data Merge dialog opens.

  3. Select the two datasets that you want to merge from the Primary Dataset and Secondary Dataset dropdowns. In this example, we selected MergeSet1 and MergeSet3.

  4. (Optional) Click on the plus sign icon to select the variables on which the datasets are to be merged. In this example, we selected the variable Physician from the primary dataset and the variable Doctor from the secondary dataset; the two variables have the same content but different names.

  5. Click the Preview icon to see the newly created dataset.


Figure 5.12: Merge dialog to merge datasets.
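
Matching a key that has different names in the two datasets can be sketched in base R with `merge()` and its `by.x` and `by.y` arguments. The two data frames below are invented stand-ins: only the key names Physician and Doctor come from the example, while the other columns and all values are hypothetical. Note that `merge()` keeps only matching rows by default, which may or may not be the option chosen in the dialog.

```r
# Toy stand-ins for the two datasets; all values are invented.
set1 <- data.frame(
  Physician = c("Lee", "Patel", "Gomez"),
  Patients  = c(12, 8, 15)
)
set3 <- data.frame(
  Doctor = c("Patel", "Gomez", "Lee"),
  Clinic = c("North", "East", "West")
)

# Match rows where Physician (primary) equals Doctor (secondary).
merged <- merge(set1, set3, by.x = "Physician", by.y = "Doctor")
```

The key column keeps its name from the first data frame, so the result has the columns Physician, Patients, and Clinic.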


5.6 Append Datasets


Rguroo’s Append function enables you to append one dataset to another, with various options. Below is a simple example.

Instructions for appending datasets:

  1. Use datasets in your Rguroo account or recreate the example below by importing the datasets DataSetA and DataSetB from the Rguroo dataset repository called Rguroo Users Guide into your account.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Append function. The Data Append dialog opens.

  3. Select the two datasets that you want to append from the Top and Bottom dropdowns. In this example, we selected DataSetA and DataSetB.

  4. Click the Preview icon to see the newly created dataset.


Figure 5.13: Append dialog to append datasets.
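
In the simplest case, where both datasets have the same columns, appending corresponds to `rbind()` in base R. The sketch below uses two invented data frames with hypothetical columns id and score; it does not cover the other options that the Append dialog may offer, such as datasets whose columns differ.

```r
# Toy stand-ins for the top and bottom datasets; values are invented.
top    <- data.frame(id = 1:2, score = c(90, 85))
bottom <- data.frame(id = 3:4, score = c(70, 95))

# Place the rows of 'bottom' underneath the rows of 'top'.
appended <- rbind(top, bottom)
```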
