5 Functions for Data Manipulation
This section describes the functions in the Data toolbox for manipulating datasets: subsetting, sorting, reshaping, transforming, merging, and appending.
5.1 Subsetting (Filtering) Data
You can obtain subsets of a dataset by filtering rows, columns, or a combination of rows and columns. You filter rows by selecting row numbers or using logical expressions, and you can obtain a subset of columns by column number or by selecting specific columns. See below for examples.
To obtain a subset of rows of a dataset use the following instructions:
Steps for Obtaining a Subset of Rows:
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.
- Select your dataset from the Dataset dropdown menu. In this example, we select cardata.
- You can subset rows either with the Sequence option, where you select rows by row number, or with the Logical Expression option, where you subset rows by specifying logical conditions. For example, with the Sequence option you can select every other row from rows 1 to 82.
- With the Logical Expression option you can, for example, choose only cars whose TYPE is Domestic. You can add logical expressions by clicking the plus icon, and you can combine them in the Logical Expression Calculator at the bottom.
- You can also create multiple expressions and combine them. In this example, we take the intersection of the expressions S4 and L4 defined above.
- When you are done with your selection, click the Preview icon.
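For readers who want to see what these selections amount to in plain R, the following is a minimal sketch in base R. It is not the code that Rguroo runs. The data frame cars and all of its values are made up as a small stand-in for cardata; only the column names Make.Model, TYPE, and MPG are taken from the examples in this section.

```r
# Hypothetical stand-in for cardata (values are made up for illustration).
cars <- data.frame(
  Make.Model = c("A1", "B2", "C3", "D4", "E5", "F6"),
  TYPE = c("Domestic", "Foreign", "Foreign", "Domestic", "Domestic", "Domestic"),
  MPG = c(24, 31, 19, 28, 22, 35),
  stringsAsFactors = FALSE
)

# Sequence-style selection: every other row, from row 1 to the last row.
in_sequence <- seq_len(nrow(cars)) %in% seq(from = 1, to = nrow(cars), by = 2)
every_other <- cars[in_sequence, ]

# Logical-expression selection: keep only cars whose TYPE is Domestic.
is_domestic <- cars$TYPE == "Domestic"
domestic <- cars[is_domestic, ]

# Intersection of the two selections: rows that satisfy both conditions.
both <- cars[in_sequence & is_domestic, ]
```

Using | instead of & would give the union of the two selections rather than their intersection.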
To obtain a subset of columns of a dataset use the following instructions:
Steps for Obtaining a Subset of Columns:
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.
- Select your dataset from the Dataset dropdown menu. In this example, we select cardata.
- Click the Select Columns button. A dialog opens. Use the arrows (or drag-and-drop, or double-click) to move your desired variables to the right column. In this example, we selected the variables Make.Model, CLASS, TYPE, MPG, and HP.
- When you are done with your selection, click the Preview icon.
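As a rough base R counterpart to the column selection above, the sketch below picks columns by name and by column number. The data frame and its values are made up, and the WEIGHT column is a hypothetical extra column added only so that there is something to leave out.

```r
# Hypothetical stand-in for cardata (values are made up for illustration).
cars <- data.frame(
  Make.Model = c("A1", "B2"),
  CLASS = c("Compact", "SUV"),
  TYPE = c("Domestic", "Foreign"),
  MPG = c(24, 31),
  HP = c(140, 210),
  WEIGHT = c(2800, 4100),  # hypothetical column, not selected below
  stringsAsFactors = FALSE
)

# Select columns by name.
by_name <- cars[, c("Make.Model", "CLASS", "TYPE", "MPG", "HP")]

# Select columns by column number (here the 1st, 4th, and 5th columns).
by_number <- cars[, c(1, 4, 5)]
```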
5.2 Sorting Data
You can sort cases in a dataset by sorting on one or more columns. Below is an example.
Instructions for sorting data:
- Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Sort function.
- Select your dataset from the Dataset dropdown menu. In this example, we select cardata.
- Click the plus sign icon to add a sorting option, which consists of a variable and a sort order, either ascending or descending.
- When you are done, click the Preview icon.
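Sorting on more than one column means that the second column only breaks ties in the first. A minimal base R sketch of that idea, using a made-up stand-in for cardata, sorts by TYPE in ascending order and then by MPG in descending order. The choice of these two variables is an assumption made for illustration.

```r
# Hypothetical stand-in for cardata (values are made up for illustration).
cars <- data.frame(
  Make.Model = c("A1", "B2", "C3", "D4"),
  TYPE = c("Domestic", "Foreign", "Domestic", "Foreign"),
  MPG = c(24, 31, 19, 28),
  stringsAsFactors = FALSE
)

# order() returns the row permutation; negating a numeric column
# sorts that column in descending order.
sorted <- cars[order(cars$TYPE, -cars$MPG), ]
```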
5.3 Reshape (Transposing, Stacking and Unstacking Variables)
To transpose data, that is, to switch rows and columns, use the Transpose function in the Dataset editor. For instructions on how to use the Transpose function, see Section 2.3.4.
To stack or unstack data, use the Reshape function in the Dataset editor. Below are examples of stacking and unstacking data.
5.3.1 Wide to long (stacking columns)
You can change the format of numerical data from wide to long, so that the data values in the selected columns are stacked in one column.
Instructions for stacking data (wide to long):
- Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.
- Select your Dataset. In this example, we select MooreBP.
- In the Variables section, select the variables that you want to be stacked and move them to the Selected column. In this example, we selected the variables beginning_bp and end_bp.
- (Optional) In the ID Variable Label textbox, type a name for the stacked variable. In this example, we typed Blood_Pressure.
- Click the Preview icon to see the stacked data.
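To make the wide-to-long idea concrete, here is a minimal sketch using the base R function stack(). It is only an illustration of stacking, not Rguroo's implementation. The values are made up, the column name Measurement is hypothetical, and giving the name Blood_Pressure to the column of stacked values is an assumption about how the label is used.

```r
# Hypothetical stand-in for MooreBP (values are made up for illustration).
bp <- data.frame(
  beginning_bp = c(107, 110, 123),
  end_bp = c(100, 114, 124)
)

# stack() places the selected columns on top of each other and adds
# an indicator column recording which column each value came from.
long <- stack(bp[, c("beginning_bp", "end_bp")])
names(long) <- c("Blood_Pressure", "Measurement")
```

The result has six rows: the three beginning_bp values followed by the three end_bp values.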
5.3.2 Long to wide (unstacking columns)
You can unstack data that are stored in one column into separate columns, one for each level of a categorical variable.
Instructions for unstacking data (long to wide):
- Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.
- Select your Dataset. In this example, we select MooreBP.
- In the Variables section, move the variables that you want to be included to the Selected column. In this example, we selected the variables beginning_bp and end_bp.
- In the ID Variable dropdown, select the categorical (factor) variable whose levels will be used for unstacking. In this example, we used the variable group.
- Click the Preview icon to see the unstacked data.
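The long-to-wide direction can be sketched with the base R function unstack(). The example below unstacks a single variable, end_bp, by the levels of group. The values and the level names Treatment and Control are made up, and the layout of Rguroo's own output may differ from what this sketch produces.

```r
# Hypothetical stand-in for MooreBP (values and level names are made up).
bp <- data.frame(
  group = factor(c("Treatment", "Treatment", "Control", "Control")),
  end_bp = c(100, 114, 124, 97)
)

# One column per level of group; with equal group sizes the result
# is a data frame (with unequal sizes unstack() returns a list).
wide <- unstack(bp, end_bp ~ group)
```

To unstack several variables, the same call can be repeated for each variable in turn.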
5.4 Transforming Data
Rguroo’s Data Transform function enables you to construct new variables and transform existing variables using the R language and R functions.
Instructions for transforming data:
- Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Transform function. The Data Transform dialog opens.
- Select a Dataset. In this example, we select MooreBP. All variable names are listed under the column labeled Returned Variable.
- Click on the plus sign icon in the Variable column and type a name for the new variable you are creating. The variable’s name appears in the list labeled Returned Variable on the right side. You can add more than one variable by clicking on the plus sign icon again.
- In the middle text field, type your R function (transformation). The function can span multiple lines. The value obtained in the last line of the code is assigned to the variable. To write a variable name, you can either type it or double-click on the variable name in the Returned Variable list. In this example, we create a new variable Average_BP that is the mean of the variables, using two lines of R code.
- (Optional) In the Returned Variable column, you can rearrange the order of variables. You can also remove variables by dropping their names in the Excluded Variable list. In this example, we dropped the variable decrease_bp and moved the newly created variable Average_BP to the last column.
- Click the Preview icon to see the newly created dataset.
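The transformation in this example can be sketched in plain R as follows. This is an illustration on an ordinary data frame, not the text typed into the Data Transform dialog. The values are made up, the helper name total is hypothetical, and taking the mean of beginning_bp and end_bp is an assumption about which variables are averaged.

```r
# Hypothetical stand-in for MooreBP (values are made up for illustration).
bp <- data.frame(
  beginning_bp = c(107, 110, 123),
  end_bp = c(100, 114, 124),
  decrease_bp = c(7, -4, -1)
)

# Two-line transformation: the value of the last line becomes Average_BP.
total <- bp$beginning_bp + bp$end_bp
bp$Average_BP <- total / 2

# Drop decrease_bp and keep the new variable as the last column.
bp <- bp[, c("beginning_bp", "end_bp", "Average_BP")]
```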
5.5 Merging Datasets
Rguroo’s Merge function enables you to merge two datasets with various options. Below is a simple example.
Instructions for merging datasets:
- Use datasets in your Rguroo account or recreate the example below by importing the datasets MergeSet1 and MergeSet3 from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Merge function. The Data Merge dialog opens.
- Select the two datasets that you want to merge from the Primary Dataset and Secondary Dataset dropdowns. In this example, we selected MergeSet1 and MergeSet3.
- (Optional) Click on the plus sign icon to select the variables on which the datasets are to be merged. In this example, we selected the variable Physician from the primary dataset and the variable Doctor from the secondary dataset, which have the same content under different variable names.
- Click the Preview icon to see the newly created dataset.
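Merging on key variables that have different names in the two datasets can be sketched with the base R function merge(). The two data frames below are made-up stand-ins for MergeSet1 and MergeSet3; only the key names Physician and Doctor come from the example, and the columns Patients and Clinic are hypothetical. Keeping only the rows whose key appears in both datasets is the default of merge(), and it is an assumption that this matches the option used in the Rguroo example.

```r
# Hypothetical stand-ins for MergeSet1 and MergeSet3 (values are made up).
set1 <- data.frame(
  Physician = c("Lee", "Patel", "Kim"),
  Patients = c(12, 8, 15),
  stringsAsFactors = FALSE
)
set3 <- data.frame(
  Doctor = c("Kim", "Lee", "Ortiz"),
  Clinic = c("North", "East", "West"),
  stringsAsFactors = FALSE
)

# by.x and by.y name the key in the primary and secondary dataset.
# Only keys present in both datasets are kept, sorted by the key.
merged <- merge(set1, set3, by.x = "Physician", by.y = "Doctor")
```

Passing all = TRUE to merge() would instead keep unmatched rows from both datasets and fill the missing values with NA.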
5.6 Appending Datasets
Rguroo’s Append function enables you to append two datasets with various options. Below is a simple example.
Instructions for appending datasets:
- Use datasets in your Rguroo account or recreate the example below by importing the datasets DataSetA and DataSetB from the Rguroo dataset repository called Rguroo Users Guide into your account.
- Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Append function. The Data Append dialog opens.
- Select the two datasets that you want to append from the Top and Bottom dropdowns. In this example, we selected DataSetA and DataSetB.
- Click the Preview icon to see the newly created dataset.
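Appending places the rows of one dataset below the rows of another. A minimal base R sketch of this uses rbind(). The data frames top and bottom are made-up stand-ins for DataSetA and DataSetB, and the column names ID and Score are hypothetical.

```r
# Hypothetical stand-ins for DataSetA and DataSetB (values are made up).
top <- data.frame(ID = 1:2, Score = c(90, 85))
bottom <- data.frame(ID = 3:4, Score = c(70, 95))

# rbind() stacks the rows of bottom under the rows of top.
# Both data frames must have the same column names.
appended <- rbind(top, bottom)
```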