1 About This Guide

This quick guide provides step-by-step Rguroo instructions and examples for common statistical analyses and graphs. Throughout the guide, clickable video icons link to the Rguroo Video Tutorials Library for the topic being presented. You can also get more technical details on most topics by clicking on the User’s Guide icon.

Most examples use data available in the Rguroo dataset repositories, so you can reproduce them by importing the corresponding datasets. See the section Rguroo’s Dataset Repository to learn how to import a dataset from a repository into your Rguroo account.

2 Rguroo Dataset Editor

You can use Rguroo’s dataset editor to perform the following tasks:

  • Create new datasets and edit existing datasets
    • Copy-paste data within the editor
    • Copy-paste data to and from an external source
    • Search and replace text
    • Add, remove, and rename variables
    • Add and remove rows
    • Reorder variables
    • Filter data
  • Define variable properties
    • Define variable types (numerical, nominal, ordinal, label/ID)
    • Edit levels of categorical variables (reorder, relabel)
  • Additional utilities
    • Obtain a quick data summary
    • Sort data
    • Pin columns and resize columns
    • Export data as CSV or Excel

2.1 Components of the Dataset Editor

The following are the main components of the dataset editor:

  • Search and Replace icon (magnifier), used for searching and replacing text.

  • Variable and Row Management icon, used for tasks such as renaming variables, managing variable types, relabeling and reordering levels of Factor (categorical) variables, and adding rows and variables.

  • Summary Data icon, used for getting a quick summary of the dataset that is being viewed.

  • Deleted Rows icon (trash), used for retaining deleted rows and restoring them.

  • Variable Context Menu icon (three horizontal bars), which appears when your cursor is placed over a variable name on the header. It provides options for pinning and auto-sizing columns and renaming the corresponding variable.

  • Row context menu, opened by right-clicking on a cell. It provides options for copying and cutting selected cells; to paste them, use Ctrl+V. It also allows for adding and deleting rows, and exporting data to a CSV or Excel file.

  • Quick Search textbox, used for forward and backward text search.

  • Quick Filter textbox, used for filtering rows that contain specific text anywhere in the dataset.

  • Save As textbox, used for saving datasets.

  • Columns Sidebar menu, located on the right-hand side of the editor window, includes a list of all the variables in the dataset. It is used for reordering variables (columns), removing them, and restoring removed ones.

  • Filters Sidebar menu, located on the right-hand side of the editor window, includes a list of all the variables in the dataset. It is used for filtering specific values within each variable.

2.2 Creating and Editing Datasets

2.2.1 Creating a new dataset

You can create and save a new dataset using the following instructions:

Creating a Dataset
  1. Open the Data toolbox located on the left-hand side of the Rguroo application window.

  2. Click on the Data Import dropdown and select the Create New Dataset option. The Create New Dataset dialog opens. Specify the number of rows and columns for your new dataset.

 Create New Dataset Dialog


  3. The default variable names will be Var1, Var2, etc. Refer to the section Changing Variable Names to see how to change them.

  4. Enter your data. See the Notes below on how to navigate between cells when entering data.

  5. Enter a name for your dataset in the Save As textbox at the top, and click the Save As button to save your dataset.


Notes

  • By default, the number of rows is set to 500 and the number of columns is set to 30. You can modify these values. If you are uncertain about how many rows and columns you will need, specify larger numbers so that the grid has enough space for your data. After you save the dataset and reopen it, the grid automatically adjusts to match the actual number of rows and columns used by your data.

  • You can add more rows and columns if needed (see Section 2.3).

  • When you log in, Rguroo’s dataset editor opens in a tab without any data. You can close the dataset editor tab at any time.

  • To enter values into cells and navigate through them using the keyboard, follow these instructions:

    • Click on a cell to select it.
    • Type in the desired value.
    • To move horizontally between cells:
      • Press the Tab key to move forward.
      • Press the Shift+Tab keys to move backward.
    • To move up and down between cells:
      • Type in a value.
      • Press the Enter key.
    • Use the up, down, right, and left arrow keys to move to the adjacent cell in the respective direction.

2.2.2 Editing an existing dataset

To edit an existing dataset use the following instructions:

Editing an Existing Dataset
  1. Open the Data toolbox located on the left-hand side of the Rguroo application window to see the names of the existing datasets.

  2. Use one of the following three options to edit a selected dataset:

    • Double-click on the dataset name
    • Right-click on the name and select the option Edit from the context menu
    • Select the name and press the Enter key.
  3. Use the dataset editor features to modify your dataset. See Notes in the Section Creating a new dataset on how to enter data and navigate between cells.

  4. Click the Save As button to save your changes.

2.3 Variable and Row Management Dialog

The Variable and Row Management dialog can be accessed by clicking on its icon. This dialog is used for changing variable names, changing variable types, reordering and relabeling levels of factors (categorical variables), and adding rows and columns (variables) to your dataset.


Note: Categorical variables, and specifically Nominal and Ordinal variables, are referred to as Factors in Rguroo.

2.3.1 Changing Variable Names

In the dataset editor, the name of each variable is displayed on the column header. Default variable names are Var1, Var2, etc. You have two options, shown below, for changing variable names. The first option is useful when you need to change the names of multiple variables at the same time, while the second option is useful when you want to change the name of a single variable quickly.

Changing Variable Names: Option 1
  1. Click on the Variable and Row Management icon. This will open the Variable and Row Management dialog.

  2. From the list of variables displayed, select the variable that you want to rename.

  3. Locate the Name textbox within the dialog, and type your desired new variable name in the textbox.

  4. Press the Enter key on your keyboard. The new variable name will now appear in the header, replacing the old name.

Note: Variable names cannot contain spaces. Make sure to avoid using spaces when entering the new variable name.


Variable and Row Management Dialog: Changing Variable Names



Changing Variable Names: Option 2
  1. Move your cursor over the variable name on the header. You will see the Variable Context Menu icon (three horizontal bars).

  2. Click on the icon to open the variable context menu (shown on the left panel in the figure below).

  3. From the menu select the option Rename. This will bring up the Rename Variable dialog box (shown on the right panel in the figure below).

  4. In the dialog box, type in the new name for your variable, and press the Enter key. The new name will now appear on the header. Note that variable names cannot contain spaces.


Variable Context Menu



2.3.2 Adding new variables

Use the following instructions to add new variables to your dataset:

Adding Variables
  1. Click on the Variable and Row Management icon to open its dialog.

  2. Select the Add Variables option. The dialog shown in the figure below opens.

  3. In the No. of Variables textbox, type the number of variables that you want to add.

  4. Select either Beginning or End to indicate where the variables will be placed.

  5. Type a prefix for the names of the variables in the Name textbox. If you are adding a single variable, simply type in the name of the variable. (See the notes below regarding variable names when adding multiple variables.)

  6. Click on the Add variables button.


Add Variables Dialog




Notes

When adding variables to your dataset, there are two possible scenarios to consider for naming the variables:

  • Adding a Single Variable:

    • If you are adding only one variable, simply type its name directly.
    • For example, if you want to add a variable named “age”, just type “age” as the name.
  • Adding Multiple Variables:

    • If you are adding multiple variables simultaneously, you need to specify a prefix that will be used for all the variable names.
    • For instance, if you type in “foo” as the prefix, the first variable will be named “foo”, the second variable will be named “foo1”, the third variable will be named “foo2”, and so on.
    • To change the resulting variable names, see Section 2.3.1.
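
The naming pattern described in the notes above can be sketched in a few lines of Python. This is only an illustration of the pattern stated in this guide (bare prefix first, then the prefix followed by 1, 2, and so on); the function name new_variable_names is hypothetical and is not part of Rguroo.

```python
def new_variable_names(prefix, count):
    """Return names for `count` new variables, following the pattern in the notes."""
    # The first variable gets the bare prefix; later ones get prefix1, prefix2, ...
    return [prefix if i == 0 else f"{prefix}{i}" for i in range(count)]


print(new_variable_names("foo", 3))
print(new_variable_names("age", 1))
```

With the prefix "foo" and three variables, the sketch produces foo, foo1, and foo2, matching the example in the notes.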

To change the location of variables in your dataset, you have two options:

  • Click and hold:

    • On the header, locate the variable name you want to move.
    • Click and hold the variable name.
    • Drag the variable name to your desired location in the header.
    • Release the mouse button to place the variable in the new location.
  • Columns Sidebar:

    • Open the Columns Sidebar menu.
    • Click and hold the column handle associated with the variable you want to move.
    • Drag the variable to your desired location within the sidebar.
    • Release the mouse button to position the variable in the new location.

2.3.3 Adding new rows

There are two options for adding rows to a dataset. The first option, described below, is useful for adding multiple rows simultaneously, while the second option is useful when you want to add one row at a time.

Adding Rows: Option 1
  1. Click on the Variable and Row Management icon to open the Variable and Row Management dialog.

  2. Select the Add Rows option. The dialog box shown in the figure below will appear.

  3. Type in the number of rows that you want to add in the # of Rows textbox.

  4. Select one of the options Beginning, End, or Below Selected Row to indicate where you want the rows to be added.

  5. Click on the Add rows button.


Add Rows Dialog




Notes:

  • To move the added rows, you can use the cut and paste option within the editor.

  • If a column is in sort mode and you click on the Add rows button repeatedly, the newly added rows are placed in their appropriate sort position after each click.


Adding a Single Row: Option 2

To add a new row above or below a selected row, follow these steps:

  1. Select the row above or below which you want to add a new row.

  2. Right-click on the selected row to open the context menu.

  3. Click on the Add Row option in the context menu.

  4. Select either Below or Above to indicate where you want the new row to be added. The new row will be added either below or above the selected row, depending on your selection.


2.3.4 Modifying variable types and factor levels

Each variable in Rguroo is classified into one of the following types: Numerical, Nominal, Ordinal, or Label/ID. In Rguroo’s application dialog boxes, Nominal and Ordinal variables are identified as Factors. For detailed information on how Rguroo determines default variable types and how to convert one variable type to another using the Variable and Row Management dialog, refer to the section Variable Types in Rguroo. This section also provides instructions for relabeling factor levels and reordering them.

2.4 Additional Dataset Editor Utilities

2.4.1 Deleting single or multiple rows

To remove rows from your dataset, you can use the following methods:

Deleting Rows
  • Removing a Single Row:

    • Select the row you want to remove by clicking on it.
    • Right-click to open the row context menu.
    • Select the option Delete Row.
    • The selected row will be deleted from the dataset.
  • Removing Multiple Rows:

    • Select the rows that you want to delete.
    • Right-click to open the row context menu.
    • Select the option Delete Selected Rows.
    • The selected rows will be deleted from the dataset.


2.4.2 Variable context menu

When you move your cursor over a variable name, a three-horizontal-bar icon appears. Clicking on it opens the Variable Context Menu, shown below, where you can perform the following:

  • Pin the selected column to the left or right, and if the column is pinned you can unpin it.

  • Autosize the selected column to fit the contents of its cells, or autosize all columns in the dataset.

  • Reset the columns to their original order, in case you have moved columns around.

  • Rename the variable shown in the selected column.

Variable Context Menu



Note that the Variable Context Menu also has a filter icon and a four-vertical-bar icon. Click on the filter icon to filter values for the selected variable. Click on the four-vertical-bar icon to filter variables in the dataset. This option is also available in the Columns Sidebar menu.

2.4.3 Sort and multicolumn sort

When you click on a variable name (column label) for the first time, the values in that column get sorted in ascending order, and an up-arrow appears next to the variable name. If you click on the variable name for a second time, the values in the column will be sorted in descending order, and a down-arrow appears next to the variable name. Clicking on the variable name for a third time will return the values of the variable to their original order.

You can repeat this process by clicking on the variable name again to toggle the sorting between original, ascending, and descending orders. Each click on the variable name will cycle through these three orders.

To perform a multicolumn sort, start by sorting the first column. Then, to sort based on additional variables, hold down the Shift key and click on the new column. As long as you hold the Shift key, the previously sorted columns will maintain their sort status.

Caution: When you apply sort to a column, you will lose your previous actions in the grid. This means that you cannot use the Ctrl+Z command to undo previous actions. However, you can undo the sort by repeatedly clicking on the sorted column to get back to the initial state.


2.4.4 Columns Sidebar menu

To open and close the Columns Sidebar menu, click on the Columns tab. Within the sidebar, you can select or deselect variables to include them in or exclude them from the dataset, respectively. You can also use the search textbox to search for specific variables.

2.4.5 Filters Sidebar menu

To open and close the Filters Sidebar menu, click on the Filters tab. Variable names are listed in this tab. When you click on a variable name, you will see all of its unique values. You can filter the values of a selected variable by deselecting them. Note that when you filter a value from a variable, all rows containing that value will be removed from the dataset. To restore a filtered value, simply select the value again.

2.4.6 Row context menu

You can open the row context menu by right-clicking anywhere in the dataset editor. The context menu allows you to Cut, Copy, and Copy with Header your selected items. However, note that to paste copied contents, you must use Ctrl+V. Alternatively, you can cut and copy selected items using the keyboard shortcuts Ctrl+X and Ctrl+C, respectively.

To add a new row, use the Add Row option, which allows you to add a single row either above or below the currently selected row. To delete a row, use the Delete Row option, which deletes a single selected row, or use Delete Selected Rows to remove multiple selected rows.

Use the Export option on the row context menu to export your data as a CSV or Excel file.

Row Context Menu



Notes and Caution

  • When you edit cells within the editor, you can use the Ctrl+Z shortcut to undo changes. However, if you sort a column or apply a function in the Variable and Row Management dialog, the Ctrl+Z undo option will no longer be available.

  • When you delete rows, they are retained in the editor’s recycle bin. You can restore these rows by clicking on the trash (recycle bin) icon. You can use the Restore option in the row recycle bin or copy and paste items from there to restore the deleted rows.

2.5 Search and Replace text

To quickly search for specific text in your dataset, type the text into the Quick Search textbox, and click on the down-arrow to search forward or the up-arrow to search backward from the current cursor location. The first instance of the searched text will be highlighted in green, and the cursor will move to that cell. All other instances will be highlighted in yellow. You can navigate to each instance using the forward and backward arrows within the Quick Search textbox. To cancel the search, click the “X” in the Quick Search textbox.

To perform a refined search in your dataset, click on the Search and Replace icon. This will open the Search and Replace dialog box, shown in the figure below. In the Find textbox, type the text that you want to search for. Optionally, refine your search by selecting one of the following: Match case, Whole word, or Starts with. Then click the Find next button. The first instance of the searched text from the current cursor position will be highlighted in green, and the cursor will move to that cell. All other instances of the searched text will be highlighted in yellow. To move to the next instance of the searched text (if one exists), click the Find next button again. To search backward, select the Backwards checkbox. You can also use the down-arrow and up-arrow buttons within the Quick Search textbox to move forward and backward. To cancel the search, close the Search and Replace dialog box, and click the “X” within the Quick Search textbox.

Search and Replace Dialog




Once you find the text that you are looking for, you can replace it by typing the replacement text in the Replace textbox. Clicking the Replace button will replace the first instance of the searched text from the current cursor position and move the cursor to the next instance of the searched text. If you need to replace text backward, select the Backwards checkbox. To replace all instances of the searched text, click the Replace all button.

Caution: You cannot undo replacements made using the Search and Replace dialog.
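
For readers who think in code, one plausible reading of the three refinement options (Match case, Whole word, Starts with) is sketched below in Python. The exact matching rules used by Rguroo are not documented in this guide, so everything here is an assumption: the function name cell_matches is hypothetical, an unrefined search is assumed to be a case-insensitive substring match, and the options are allowed to be combined.

```python
import re


def cell_matches(cell, text, match_case=False, whole_word=False, starts_with=False):
    """Return True if `text` is found in `cell` under the selected options."""
    # Assumption: without "Match case", the search ignores letter case.
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.escape(text)  # treat the search text literally
    if whole_word:
        # Assumption: "Whole word" means the text is bounded by non-word characters.
        pattern = rf"\b{pattern}\b"
    if starts_with:
        # Assumption: "Starts with" means the cell content begins with the text.
        pattern = "^" + pattern
    return re.search(pattern, str(cell), flags) is not None


print(cell_matches("Domestic", "dom"))
print(cell_matches("cargo", "car", whole_word=True))
```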


3 Variable Types in Rguroo

A dataset in Rguroo consists of rows (cases) and columns (variables). Each variable in Rguroo is classified as one of Numerical, Nominal, Ordinal, or Label/ID. Within the application dialog boxes in Rguroo, Nominal and Ordinal variables are identified as Factors.

3.1 Default Variable Types

When you upload a dataset to Rguroo, the variables are initially classified as one of Numerical, Nominal, or Label/ID. However, as we will explain shortly, it is possible to change the variable types within the application. The default classifications are as follows:

  • Variables with only numerical values are classified as Numerical Variables.
  • If a variable doesn’t consist entirely of numerical values, it will be classified as Nominal, except in the following cases where it will be classified as Label/ID:
    • If a variable has more than 1000 levels.
    • If a dataset has more than 50 cases, and the variable has the same number of levels as the number of cases (i.e., there is a unique ID for each case).

When you create a new dataset using Rguroo’s dataset editor, by default, a column with no values entered is classified as Numerical. If you only enter numerical values or “NA” (which represents missing data) in that column, the variable type will remain Numerical. However, as soon as you enter a non-numeric character (such as a letter or symbol) in any cell within that column, the variable type will automatically change to Nominal.
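
The default classification rules above can be restated as a short Python sketch. It is not Rguroo’s implementation; the function names is_number and default_type are hypothetical, cell values are assumed to arrive as text, and ignoring “NA” cells when checking for numbers and counting levels is an assumption.

```python
def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def default_type(values):
    """Guess the default variable type for one column of text values."""
    observed = [v for v in values if v != "NA"]  # assumption: "NA" cells are ignored
    if all(is_number(v) for v in observed):
        return "Numerical"  # also covers a column with no values entered
    levels = len(set(observed))
    # Label/ID: more than 1000 levels, or a unique value for each of more than 50 cases.
    if levels > 1000 or (len(values) > 50 and levels == len(values)):
        return "Label/ID"
    return "Nominal"


print(default_type(["1", "2.5", "NA"]))
print(default_type(["red", "blue", "red"]))
print(default_type([f"id{i}" for i in range(60)]))
```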

3.2 Changing Variable Types

To change variable types, you can use either Rguroo’s dataset editor or Rguroo’s Variable Type Editor.

Double-click on the name of the dataset to open it in Rguroo’s dataset editor. Then, click on the Variable and Row Management icon to open the Variable and Row Management dialog. Select the variable whose type you want to change, and choose one of Numerical, Nominal, Ordinal, or Label/ID.

“Variable and Row Management” Dialog



To change the variable type using the Variable Type Editor, begin by right-clicking on the name of your dataset to open the Variable Type Editor dialog. From there, select the variable whose type you want to change and drag it to one of three options: Numerical, Label/ID, or Factor/categorical (which can be seen in the dialog box shown below). If you want to classify a categorical variable as ordinal, select the checkbox labeled Ordinal. If you do not select this checkbox, the categorical variable will be classified as Nominal. When done with your changes, click the Update button.

Variable Type Editor



When changing variable types in Rguroo, there are a few rules to keep in mind:

  • If a dataset has dependencies (i.e., has been used in another Rguroo function), you cannot change the variable types. This ensures that the functions that you have applied to the dataset are reproducible.

  • Any Numerical variable can be changed to Nominal, Ordinal, or Label/ID.

  • Any Numerical, Nominal, or Ordinal variable can be changed to Label/ID.

  • Any Nominal, Ordinal, or Label/ID variable that consists of all numerical values can be changed to Numerical.

  • A variable that contains non-numeric characters cannot be changed to Numerical.
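
The rules above can be summarized as a small check, sketched here in Python. This is an illustration only, not Rguroo’s code: the names is_number and can_change_type are hypothetical, values are assumed to be text, “NA” cells are assumed not to block a change to Numerical, and conversions that the rules do not mention (for example, Label/ID to Nominal) are assumed to be allowed.

```python
def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def can_change_type(target, values, has_dependencies=False):
    """Return True if a variable holding `values` may be changed to `target`."""
    if has_dependencies:
        # A dataset already used in another function keeps its variable types.
        return False
    if target == "Numerical":
        # Allowed only when every non-missing value is numeric.
        return all(is_number(v) for v in values if v != "NA")
    # Assumption: changes to Nominal, Ordinal, or Label/ID are always allowed.
    return True


print(can_change_type("Numerical", ["1", "2", "NA"]))
print(can_change_type("Numerical", ["1", "x"]))
```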

3.3 Managing Levels of a Factor (Categorical) Variable

You can relabel and reorder levels of a factor variable. This can be done either in Rguroo’s Dataset Editor or in the Variable Type Editor (see images in the previous subsection).

To reorder levels in the Dataset Editor use the following steps:

Option 1: Relabeling and Reordering Levels
  1. Double-click on the name of the dataset to open the dataset in the Rguroo editor.

  2. Click on the Variable and Row Management icon to open the Variable and Row Management dialog.

  3. Select Variable Properties.

  4. Select a nominal or ordinal variable from the Variable list.

  5. Click on the Level Editor button to open the Level Editor dialog. See the image below for an example.

  6. To reorder the levels of the variable, select a level and drag it up or down.

  7. To relabel a level, type the new label in the label column. The name of a level is the default for its label.


Level Editor



3.4 Missing Data

In Rguroo, the letters “NA” (capitalized) are reserved for representing missing data, which is consistent with the notation used in the R computing language. If you want to remove cases with missing values from variables that are classified as Nominal or Ordinal, you can select the Exclude NA checkbox either in the Variable Type Editor dialog or in the Level Editor dialog within the Variable and Row Management dialog. See the previous subsections for how to access these dialog boxes.
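
As a rough illustration of what excluding missing values means, the Python sketch below drops the cases whose value for a chosen factor is “NA”. The data, the variable names, and the function exclude_na are hypothetical; the only point taken from this guide is that the capitalized letters “NA” mark missing data.

```python
# Hypothetical cases; "NA" (capitalized) marks a missing value.
cases = [
    {"group": "A", "score": "12"},
    {"group": "NA", "score": "15"},
    {"group": "na", "score": "18"},  # lowercase, so this is an ordinary level
]


def exclude_na(rows, variable):
    """Keep only the cases whose value for `variable` is not missing."""
    return [row for row in rows if row[variable] != "NA"]


kept = exclude_na(cases, "group")
print([row["group"] for row in kept])
```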


4 Importing Datasets

4.1 Import a Dataset from a File or URL


You can import files in various formats such as Excel, CSV, text, SPSS, SAS, etc. using the following instructions:

Rguroo Instructions
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Data Import dropdown menu and choose the Import Dataset function. The File Import dialog opens.

  2. Choose a file or enter a URL where your file resides. Select or fill in the other optional parts of the File Import dialog appropriately.

  3. Click Upload.


Rguroo will upload your data to your account and display summary statistics for the uploaded data.

Rguroo Dialog

File Import Dialog


Additional options:


  • You can import specific rows and columns from a file using the options in the Rows/Columns section.
  • You can define missing data and decimal characters using the options in the Characters section.
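
To show why the missing-data and decimal characters matter, here is a minimal Python sketch that reads a small semicolon-separated file in which the decimal character is a comma and the value -999 marks missing data. It only mimics the idea behind the Characters section and is not how Rguroo reads files; the function read_table, its parameters, and the sample data are hypothetical.

```python
import csv
import io


def read_table(text, delimiter=",", decimal=".", missing=("NA",)):
    """Parse delimited text, converting numbers and marking missing cells as None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        clean = {}
        for name, cell in record.items():
            if cell is None or cell in missing:
                clean[name] = None  # missing data
                continue
            try:
                # Swap the file's decimal character for "." before converting.
                clean[name] = float(cell.replace(decimal, "."))
            except ValueError:
                clean[name] = cell  # keep non-numeric text unchanged
        rows.append(clean)
    return rows


sample = "city;temp\nParis;21,5\nOslo;-999\n"
rows = read_table(sample, delimiter=";", decimal=",", missing=("-999",))
print(rows)
```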

4.2 Rguroo’s Dataset Repository

Rguroo’s Dataset Repository consists of many data repositories, including datasets from R and R packages, datasets associated with textbooks, and more. Faculty can create their own private dataset repository to share data with students. For more information on creating your private dataset repository, see the Rguroo Dashboard Quick Guide.


Rguroo Instructions
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Data Import dropdown menu and choose the Dataset Repository function. The Repository Dataset Import dialog opens.

  2. In the Search Repository textbox, type in the name of the repository, or locate the repository name in the repository list. Select your desired repository name. The names of the datasets in the selected repository are shown in the bottom panel.

  3. In the Search Dataset textbox at the top of the bottom panel, type in the name of the dataset that you want to import to your account, or locate the dataset name in the list. Select the dataset name in the bottom panel.

  4. Click the Import button. The dataset will be imported to your Rguroo account.

  5. Click the Close button to close the Repository Dataset Import dialog.


Rguroo Dialog

Screenshot of the Repository Dataset Import dialog


Additional options:


  • You can use the Rguroo Dashboard to create your own private repositories and share your private repositories with others.

  • To access your private repositories, select the option My Repositories in the Repository Dataset Import dialog.

5 Data Functions

5.1 Subsetting (Filtering) Data

You can obtain subsets of a dataset by filtering rows, columns, or a combination of rows and columns. You filter rows by selecting row numbers or using logical expressions, and you can obtain a subset of columns by column number or by selecting specific columns. See below for examples.

5.1.1 Subsetting Rows


To obtain a subset of rows of a dataset use the following instructions:

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Screenshot of the first 5 rows of the cardata dataset.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.

Rguroo Dialog

Data subset dialog

  4. You can subset rows either with the Sequence option, where you select rows by row number, or with the Logical Expression option, where you specify logical conditions. Below is an example of selecting every other row from rows 1 to 82.
Rguroo Dialog

subset sequence dialog


The following is an example where we use the Logical Expression option to choose only cars whose TYPE is Domestic. You can add logical expressions by clicking the plus icon, and you can combine the logical expressions in the Logical Expression Calculator at the bottom.

Rguroo Dialog

subset Logical expression dialog showing selection of domestic cars

You can also create multiple logical expressions and combine them. The example below shows the intersection of the expressions S4 and L4, defined above.

Rguroo Dialog

subset Logical expression dialog showing combining expressions

  5. When you are done with your selection, click the Preview icon.


Output of the subset of rows.
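
The two ways of subsetting rows described above can be pictured with a short Python sketch. It runs on made-up data rather than the real cardata dataset: the rows, the TYPE level "Foreign", the MPG values, and the helper rows_by_sequence are all hypothetical, and the combined condition only stands in for an intersection of two logical expressions.

```python
# Made-up stand-in for a dataset with 100 cases.
cars = [
    {"Make.Model": f"car{i}",
     "TYPE": "Domestic" if i % 3 == 0 else "Foreign",
     "MPG": 15 + i % 20}
    for i in range(1, 101)
]


def rows_by_sequence(rows, start, stop, step):
    """Select rows by 1-based row number, from start to stop inclusive."""
    return [rows[n - 1] for n in range(start, stop + 1, step) if n <= len(rows)]


# Sequence: every other row from rows 1 to 82, that is rows 1, 3, 5, ..., 81.
every_other = rows_by_sequence(cars, 1, 82, 2)

# Logical expression: only the cases whose TYPE is Domestic.
domestic = [row for row in cars if row["TYPE"] == "Domestic"]

# Intersection of two logical expressions: both conditions must hold.
domestic_high_mpg = [row for row in cars
                     if row["TYPE"] == "Domestic" and row["MPG"] > 25]

print(len(every_other), len(domestic), len(domestic_high_mpg))
```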

5.1.2 Subsetting Columns


To obtain a subset of columns of a dataset use the following instructions:

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Screenshot of the first 5 rows of the cardata dataset.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Use the Functions dropdown menu and choose the Subset function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.

Rguroo Dialog

Data subset dialog

  4. Click the Select Columns button. A dialog opens. Use the arrows (or drag-and-drop, or double-click) to move your desired variables to the right column. The example below shows that we have selected the variables Make.Model, CLASS, TYPE, MPG, and HP.
Rguroo Dialog

Data subset dialog for column selection

  5. When you are done with your selection, click the Preview icon.


A portion of the output of the subset of columns.
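
Selecting columns by name or by column number amounts to the following Python sketch. The single row of data, the extra WEIGHT column, and the two helper functions are hypothetical; only the variable names Make.Model, CLASS, TYPE, MPG, and HP come from the example above.

```python
# One made-up case; WEIGHT is a hypothetical extra column.
cars = [
    {"Make.Model": "car1", "CLASS": "Compact", "TYPE": "Domestic",
     "MPG": 27, "HP": 110, "WEIGHT": 2500},
]


def select_columns(rows, names):
    """Keep only the named columns, in the order given."""
    return [{name: row[name] for name in names} for row in rows]


def select_columns_by_number(rows, numbers):
    """Keep columns by 1-based position in the dataset."""
    all_names = list(rows[0])  # dictionaries keep their column order
    return select_columns(rows, [all_names[n - 1] for n in numbers])


print(select_columns(cars, ["Make.Model", "CLASS", "TYPE", "MPG", "HP"]))
print(select_columns_by_number(cars, [1, 4]))
```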


5.2 Sorting Data


You can sort cases in a dataset by sorting on one or more columns. Below is an example.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Screenshot of the first 5 rows of the cardata dataset.
  2. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Sort function.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select cardata.

  4. Click the plus sign icon to add a sorting option, which includes selecting a variable and a choice of order, ascending or descending.

  5. When you are done, click the Preview icon.


Rguroo Dialog

Screenshot of Data Sort dialog


Screenshot of the Sort output.
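
Sorting on several columns, each with its own order, can be sketched in Python as shown below. This is one common way to do it (repeated stable sorts, last sorting option first) and is not necessarily how Rguroo sorts internally; the data and the function sort_rows are hypothetical.

```python
def sort_rows(rows, keys):
    """Sort by a list of (variable, order) pairs; order is 'ascending' or 'descending'."""
    result = list(rows)
    # Python's sort is stable, so applying the last sorting option first and
    # the first option last gives the first option the highest priority.
    for variable, order in reversed(keys):
        result.sort(key=lambda row: row[variable], reverse=(order == "descending"))
    return result


cars = [
    {"TYPE": "Domestic", "MPG": 20},
    {"TYPE": "Foreign", "MPG": 30},
    {"TYPE": "Domestic", "MPG": 25},
    {"TYPE": "Foreign", "MPG": 22},
]
by_type_then_mpg = sort_rows(cars, [("TYPE", "ascending"), ("MPG", "descending")])
print(by_type_then_mpg)
```

In this made-up example the Domestic cars come first, and within each TYPE the cars are listed from highest to lowest MPG.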


5.3 Reshape (Stacking and Unstacking Variables)

5.3.1 Wide to long (stacking columns)

You can change the format of numerical data from wide to long, where the data values in selected columns are stacked in one column.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see the data. Screenshot of the MooreBP data.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.

  2. Select your Dataset. In this example, we select MooreBP.

  3. In the Variables section, select the variables that you want to be stacked and move them to the Selected column. In this example, we selected variables beginning_bp and end_bp.

  4. (Optional) In the ID Variable Label textbox, type a name for the stacked variable. In this example, we typed Blood_Pressure.

  5. Click the Preview icon to see the stacked data.


Rguroo Dialog

Screenshot of the Reshape dialog


Click here to see the Rguroo output (stacked data). Screenshot of the MooreBP data.
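
The stacking operation itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are invented values shaped like MooreBP, the helper name stack is hypothetical, and the output column names (Time and Blood_Pressure) are our choice, not necessarily the layout Rguroo produces.

```python
# Hypothetical rows shaped like MooreBP; the values are invented for illustration.
wide = [
    {"group": "Calcium", "beginning_bp": 107, "end_bp": 100},
    {"group": "Placebo", "beginning_bp": 123, "end_bp": 124},
]

def stack(rows, stack_vars, id_label, value_label):
    """Stack the columns in stack_vars into one value column plus an ID column."""
    long_rows = []
    for var in stack_vars:
        for row in rows:
            # Keep every column that is not being stacked.
            new_row = {k: v for k, v in row.items() if k not in stack_vars}
            new_row[id_label] = var          # which original column the value came from
            new_row[value_label] = row[var]  # the stacked value itself
            long_rows.append(new_row)
    return long_rows

long_data = stack(wide, ["beginning_bp", "end_bp"], "Time", "Blood_Pressure")
for row in long_data:
    print(row)
```

Two rows with two stacked columns become four rows, one per original value.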


5.3.2 Long to wide (unstacking columns)

You can unstack data that are stored in one column into separate columns, one for each level of a categorical variable.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see the data. Screenshot of the MooreBP data.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Reshape function. The Reshape dialog opens.

  2. Select your Dataset. In this example, we select MooreBP.

  3. In the Variables section, move the variables that you want to be included to the Selected column. In this example, we selected variables beginning_bp and end_bp.

  4. In the ID Variable dropdown, select the categorical (factor) variable whose levels will be used for unstacking. In this example, we used the variable group.

  5. Click the Preview icon to see the unstacked data.


Rguroo Dialog

Screenshot of the Reshape dialog


Click here to see the Rguroo output (unstacked data). Screenshot of the MooreBP data.
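
As a rough illustration of unstacking, the Python sketch below splits each selected variable into one column per level of the ID variable. The data values are invented, the helper name unstack is hypothetical, and the naming scheme for the new columns (variable name, underscore, level) is an assumption rather than Rguroo's actual output format.

```python
from collections import defaultdict

# Hypothetical rows shaped like MooreBP; the values are invented for illustration.
long_rows = [
    {"group": "Calcium", "beginning_bp": 107, "end_bp": 100},
    {"group": "Placebo", "beginning_bp": 123, "end_bp": 124},
    {"group": "Calcium", "beginning_bp": 110, "end_bp": 114},
    {"group": "Placebo", "beginning_bp": 109, "end_bp": 97},
]

def unstack(rows, value_vars, id_var):
    """Return one column (list of values) per combination of variable and level."""
    columns = defaultdict(list)
    for row in rows:
        for var in value_vars:
            # Assumed naming scheme: <variable>_<level>.
            columns[f"{var}_{row[id_var]}"].append(row[var])
    return dict(columns)

wide_data = unstack(long_rows, ["beginning_bp", "end_bp"], "group")
for name, values in wide_data.items():
    print(name, values)
```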


5.4 Transform (Transform and Create New Variables)


Rguroo’s Data Transform function enables you to construct new variables and transform existing variables using the R language and R functions.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the MooreBP dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see the data. Screenshot of the MooreBP data.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Transform function. The Data Transform dialog opens.

  2. Select a Dataset. All variable names will get listed under the column labeled Returned Variable. In this example, we select MooreBP.

  3. Click on the plus sign icon in the Variable column and type a name for the new variable you are creating. The variable’s name appears in the list labeled Returned Variable on the right side. You can add more than one variable by clicking on the plus sign icon again.

  4. In the middle text field, type your R function (transformation). The function can be multiline. The value obtained in the last line of the code will be assigned to the variable. To write a variable name, you can either type it or double-click on the variable name in the Returned Variable list. In this example, we create a new variable Average_BP that is the mean of the variables, using two lines of R code.

  5. (Optional) In the Returned Variable column, you can rearrange the order of variables. Also, you can remove variables by dropping their names in the Excluded Variable list. In this example, we dropped the variable decrease_bp and moved the newly created variable Average_BP to the last column.

  6. Click the Preview icon to see the newly created dataset.


Rguroo Dialog

Screenshot of the Transform dialog


Click here to see the Rguroo output. Screenshot of the MooreBP data after transformation.
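
In Rguroo the transformation is written in R. For readers who want to see the logic spelled out, the Python sketch below mimics the same idea: a multiline function whose final value becomes the new variable, with one variable excluded from the result. The data values are invented, the helper names are hypothetical, and we assume that Average_BP is the mean of beginning_bp and end_bp.

```python
# Hypothetical rows shaped like MooreBP; the values are invented for illustration.
rows = [
    {"beginning_bp": 107, "end_bp": 100, "decrease_bp": 7},
    {"beginning_bp": 123, "end_bp": 124, "decrease_bp": -1},
]

def transform(rows, new_name, func, excluded=()):
    """Add a computed variable as the last column and drop the excluded variables."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in excluded}
        new_row[new_name] = func(row)  # the new variable lands in the last column
        result.append(new_row)
    return result

def average_bp(row):
    # Two-line transformation; the value of the last line is what gets assigned.
    total = row["beginning_bp"] + row["end_bp"]
    return total / 2

transformed = transform(rows, "Average_BP", average_bp, excluded=("decrease_bp",))
for row in transformed:
    print(row)
```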


5.5 Merge Datasets


Rguroo’s Merge function enables you to merge two datasets with various options. Below is a simple example.

Rguroo Instructions
  1. Use datasets in your Rguroo account or recreate the example below by importing the datasets MergeSet1 and MergeSet3 from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see the datasets. Screenshot of the MergeSet1 and MergeSet3 data.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Merge function. The Data Merge dialog opens.

  2. Select the two datasets that you want to merge from the Primary Dataset and Secondary Dataset dropdowns. In this example, we selected MergeSet1 and MergeSet3.

  3. (Optional) Click on the plus sign icon to select the variables on which the datasets are to be merged. In this example, we selected the variable Physician from the primary dataset and the variable Doctor from the secondary dataset, both of which have the same content with different variable names.

  4. Click the Preview icon to see the newly created dataset.


Rguroo Dialog

Screenshot of the Merge Data dialog


Click here to see the Rguroo output. Screenshot of the merge output.
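
The following Python sketch shows one way a merge on differently named key variables can work. It is an illustration only: the rows are invented, the helper name merge is hypothetical, and it implements an inner join (rows are kept only when the key value appears in both datasets), which is just one of the merge types a tool may offer.

```python
# Hypothetical datasets; names and values are invented for illustration.
primary = [
    {"Physician": "Lee", "Patients": 12},
    {"Physician": "Diaz", "Patients": 8},
]
secondary = [
    {"Doctor": "Diaz", "Clinic": "North"},
    {"Doctor": "Lee", "Clinic": "South"},
    {"Doctor": "Khan", "Clinic": "East"},
]

def merge(primary, secondary, primary_key, secondary_key):
    """Inner join: keep rows whose key values appear in both datasets."""
    lookup = {}
    for row in secondary:
        lookup.setdefault(row[secondary_key], []).append(row)
    merged = []
    for row in primary:
        for match in lookup.get(row[primary_key], []):
            combined = dict(row)
            # Copy the secondary columns, except the duplicate key column.
            combined.update({k: v for k, v in match.items() if k != secondary_key})
            merged.append(combined)
    return merged

merged = merge(primary, secondary, "Physician", "Doctor")
for row in merged:
    print(row)
```

The secondary row for Khan has no match in the primary dataset, so it does not appear in this inner-join result.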


5.6 Append Datasets


Rguroo’s Append function enables you to append two datasets with various options. Below is a simple example.

Rguroo Instructions
  1. Use datasets in your Rguroo account or recreate the example below by importing the datasets DataSetA and DataSetB from the Rguroo dataset repository Rguroo Users Guide into your account.
Click here to see the datasets. Screenshot of the DataSetA and DataSetB data.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Append function. The Data Append dialog opens.

  2. Select the two datasets that you want to append from the Top and Bottom dropdowns. In this example, we selected DataSetA and DataSetB.

  3. Click the Preview icon to see the newly created dataset.


Rguroo Dialog

Screenshot of the Append Data dialog


Click here to see the Rguroo output. Screenshot of the append output.
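
Appending places the rows of one dataset below the rows of another. The Python sketch below illustrates the idea with invented data and a hypothetical helper named append. It assumes that columns present in only one dataset are kept and filled with missing values (None) elsewhere; Rguroo's own options for handling mismatched columns may differ.

```python
# Hypothetical datasets; names and values are invented for illustration.
top = [{"ID": 1, "Score": 90}, {"ID": 2, "Score": 85}]
bottom = [{"ID": 3, "Score": 70, "Grade": "C"}]

def append(top, bottom):
    """Stack the rows of bottom under top, using the union of all columns."""
    columns = []
    for row in top + bottom:
        for name in row:
            if name not in columns:
                columns.append(name)  # preserve first-seen column order
    # row.get returns None for a column that a row does not have.
    return [{name: row.get(name) for name in columns} for row in top + bottom]

appended = append(top, bottom)
for row in appended:
    print(row)
```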


6 Summaries of Numerical Variables

A quick summary of variables in a dataset can be obtained using the instructions given in the Quick Data Summary section. To get specific summaries of numerical variables, you can use the Summary Statistic function under the Data toolbox or the Numerical Summaries function under the Analytics toolbox. The Numerical Summaries function computes both univariate and multivariate summary statistics.

6.1 Quick Data Summary

You can obtain a quick dataset summary by using the following steps:

Rguroo Instructions
  1. Under the Data toolbox, select the dataset that you want to get summaries for.

  2. Right-click (Shift-F10) on a dataset name and select the option Dataset Summary.


Rguroo Output
Screenshot of the Cardata dataset summary.

Figure 6.1: An example of a quick dataset summary output


6.2 The Summary Statistic Function


To obtain specific univariate summary statistics, use the Summary Statistic function. This function also has the option of obtaining summary statistics for each level of a categorical variable.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.
  1. Open the Data toolbox on the left-hand side of the Rguroo window. Click on the Functions dropdown menu and choose the Summary Statistic function. The Basic Summary Statistic dialog opens.

  2. Select your Dataset.

  3. From the Numerical dropdown, select the numerical variable that you would like to summarize. (Optional) Select a factor variable from the Factor 1 dropdown menu, as we have done in this example.

  4. In the Statistics section of the dialog, select the checkboxes for your desired summary statistics.

  5. Click the Preview icon to see the summary statistics.


Rguroo Dialog

Screenshot of Summary Statistics dialog


Screenshot of the first 5 rows of the Cardata dataset.


Click here to see additional options.


  • If your dataset includes categorical (factor) variables, you can get numerical summaries for each level of a factor by selecting variables from the Factor 1 and Factor 2 dropdowns.
  • You can also get weighted numerical summaries by selecting a weight variable from the Frequency dropdown.
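
To show what "summary statistics for each level of a categorical variable" means computationally, here is a small Python sketch. The rows are invented stand-ins for the cardata variables TYPE and MPG, the helper name summarize_by_level is hypothetical, and the set of statistics is only a sample of what the dialog offers.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical rows standing in for two cardata columns; values are invented.
cars = [
    {"TYPE": "Domestic", "MPG": 24}, {"TYPE": "Import", "MPG": 30},
    {"TYPE": "Domestic", "MPG": 20}, {"TYPE": "Import", "MPG": 34},
    {"TYPE": "Domestic", "MPG": 22}, {"TYPE": "Import", "MPG": 32},
]

def summarize_by_level(rows, numerical, factor):
    """Compute n, mean, median, and sample standard deviation per factor level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[numerical])
    return {
        level: {"n": len(v), "mean": mean(v), "median": median(v), "sd": stdev(v)}
        for level, v in groups.items()
    }

summary = summarize_by_level(cars, "MPG", "TYPE")
for level, stats in summary.items():
    print(level, stats)
```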

6.3 The Numerical Summaries Function

The Numerical Summaries function computes various univariate and multivariate summary statistics. Multivariate statistics include correlation, covariance, and Cronbach's alpha. You can also obtain these statistics for each level of a factor, if your dataset includes factor variables.

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Numerical Summaries function.

  2. Select the cardata from the Dataset dropdown menu.

  3. Select your variables. In this example, we selected HP and MPG.

  4. Click the Univariate or Multivariate button to select your desired statistics. In this example, we have selected the five-number summary, the 2.5% and 97.5% quantiles, and the multivariate statistic correlation.

  5. Click the Preview icon.


Rguroo Dialog

Screenshot of Numerical Summaries dialog.


Output of numerical summaries.

Figure 6.2: An example of a Numerical Summary function output


Click here to see additional options.
  • You can add summary tables and name the summary tables by clicking the plus sign icon.

  • If your dataset includes categorical (factor) variables, you can get numerical summaries for each level of a factor variable by selecting the Factor dropdown.

  • You can get a weighted mean and, more generally, weighted numerical summaries by selecting a weight variable from the Frequency dropdown.

  • By clicking on the Multivariate button, you can obtain correlation, variance-covariance, and Cronbach's alpha.
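
For reference, the Python sketch below computes a five-number summary and a Pearson correlation for two short, invented lists standing in for HP and MPG. The helper names are hypothetical. Quantile definitions vary between software packages, so the quartiles here (Python's "inclusive" method) may not match Rguroo's output exactly.

```python
from math import sqrt
from statistics import mean, quantiles

# Invented values standing in for the HP and MPG variables.
hp = [100, 150, 200, 250]
mpg = [40, 30, 25, 20]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quantile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

hp_summary = five_number_summary(hp)
r = pearson_correlation(hp, mpg)
print(hp_summary)
print(round(r, 4))
```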

7 Tabulating Data

You can tabulate categorical or numerical variables in Rguroo and create frequency or relative frequency tables. Use the Tabulation function in the Analytics toolbox to do so. To tabulate numerical data, you can use one of the Freedman-Diaconis, Sturges, or Scott algorithms, or use various customization options to set the classes (bins).
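
The three binning algorithms named above are commonly defined as follows: Sturges suggests ceil(log2(n)) + 1 bins, Scott suggests a bin width of about 3.49 · s · n^(-1/3), and Freedman-Diaconis suggests a bin width of 2 · IQR · n^(-1/3). The Python sketch below implements these textbook forms with hypothetical helper names and invented data; Rguroo's exact implementation (for example, its quartile definition or rounding) may differ.

```python
from math import ceil, log2
from statistics import quantiles, stdev

def sturges_bins(values):
    """Sturges' rule: number of bins based only on the sample size."""
    return ceil(log2(len(values))) + 1

def scott_width(values):
    """Scott's rule: bin width based on the sample standard deviation."""
    return 3.49 * stdev(values) * len(values) ** (-1 / 3)

def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width based on the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)

# Invented horsepower-like values for illustration.
hp = [92, 95, 101, 110, 118, 125, 125, 139]
print(sturges_bins(hp))
print(scott_width(hp))
print(freedman_diaconis_width(hp))
```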

7.1 Tabulating Categorical Variables

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Tabulation –> Categorical function. The Data Tabulation dialog opens.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Click the plus icon at the top of the first column. Type a name for your table.

  4. Select the qualitative variable that you want to tabulate from the Factor 1 dropdown.

  5. Select one or more of the options Counts, Proportions, or Percentage.

  6. Click the Preview icon to see the result.


Rguroo Dialog

Screenshot of Categorical Summaries dialog.


Screenshot of the tabulation output.


Click here to see additional options.

  • Click the plus icon to tabulate additional variables.

  • You can tabulate up to three categorical variables simultaneously using the Factor 1, Factor 2, and Factor 3 dropdowns.

  • You can compute conditional proportions by selecting the Cond checkbox.

  • You can add frequencies (weights) by selecting a frequency variable from the Frequency dropdown.

  • You can use the Level Editor button to reorder the levels of the categorical variables.

  • You can save the tables as an Rguroo dataset by using the Save Dataset option on the top right.

  • For one-way tables, you can order categories (levels) in ascending or descending order of frequencies. By default, categories (levels) are ordered alphabetically based on their names.
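
A one-way table of counts, proportions, and percentages can be sketched in Python as follows. The class labels are invented, the helper name tabulate is hypothetical, and the two orderings shown (alphabetical and descending frequency) mirror the ordering options described above.

```python
from collections import Counter

# Invented class labels standing in for the CLASS variable.
classes = ["Compact", "SUV", "Compact", "Midsize", "SUV", "Compact", "Midsize", "Compact"]

def tabulate(values, order="alphabetical"):
    """Return rows of (level, count, proportion, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    if order == "descending":
        # Largest count first; ties are broken alphabetically.
        levels = sorted(counts, key=lambda level: (-counts[level], level))
    else:
        levels = sorted(counts)
    return [
        (level, counts[level], counts[level] / total, 100 * counts[level] / total)
        for level in levels
    ]

table = tabulate(classes)
by_frequency = tabulate(classes, order="descending")
for row in table:
    print(row)
```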

7.2 Tabulating (Binning) Numerical Variables

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Tabulation –> Numerical (Binning) function. The Binning dialog opens.

  2. By default, a table name is shown in the Table column. You can overwrite the name with a title for your table.

  3. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  4. From the Variable dropdown, select the numerical variable for which you would like to create a frequency table.

  5. In the Bins section, select a method for binning. For the example shown, the bins start at 90 and have a width of 10.

  6. From the Report section, select one or more options of Counts, Proportions %, or Cumulative %.

  7. Click the Preview icon to see the result.


Rguroo Dialog

Screenshot of Binning dialog.


Screenshot of the binning output.


Click here for additional details.
  • Click the plus icon to tabulate additional variables.

  • You can give a title to your table under the Table column.

  • Click the help icon to see descriptions of various methods in the Bins section.

  • You can compute mean, standard deviation, and variance based on the tabulated data.

  • On the top row of the Rguroo environment, there are three save options. Save as ... saves your work as an Rguroo object so you can reproduce it. Save Detail Dataset adds a column to the raw dataset indicating the bin (class) to which each datum is assigned. Save Frequency Dataset saves the frequency table as a dataset.
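
The example in this section uses bins that start at 90 and have a width of 10. The Python sketch below shows how such a frequency table, with counts, percentages, and cumulative percentages, can be built. The data values are invented and the helper name bin_table is hypothetical. It assumes every value is at least the starting value and that each bin includes its lower limit and excludes its upper limit; Rguroo's convention for bin boundaries may be different.

```python
def bin_table(values, start, width):
    """Return rows of (bin label, count, percent, cumulative percent)."""
    counts = {}
    for x in values:
        # Assumes x >= start; bins are [lower, upper).
        index = int((x - start) // width)
        counts[index] = counts.get(index, 0) + 1
    total = len(values)
    table, cumulative = [], 0
    for index in range(max(counts) + 1):
        count = counts.get(index, 0)  # empty bins are reported with a count of 0
        cumulative += count
        lower = start + index * width
        label = f"[{lower}, {lower + width})"
        table.append((label, count, 100 * count / total, 100 * cumulative / total))
    return table

# Invented horsepower-like values for illustration.
hp = [92, 95, 101, 110, 118, 125, 125, 139]
table = bin_table(hp, start=90, width=10)
for row in table:
    print(row)
```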

8 Plots for Qualitative Data

This section provides instructions for creating plots for qualitative (categorical) data. All plots in Rguroo are highly customizable. In addition to the options in the Basics button dialog, you can customize your plots using the options in the Details button dialog.

8.1 Barplot for Categorical Variables


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Barplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. To create a barplot with categories across the x-axis, open the Categorical tab (this is the default tab).

  4. Use the Factor 1 dropdown to select the categorical variable for the barplot. If your data is in the form of a frequency or relative frequency table, use the Frequency dropdown to choose your frequency variable.

  5. Click the Preview icon to see the barplot.


Rguroo Dialog

Screenshot of Barplot dialog.


Rguroo Output

Screenshot of a barplot displaying the number of cars by class.


Click here to see additional options.
  • You can obtain relative frequency barplots by selecting Proportions.

  • You can add value labels to the bars.

  • You can use Bar Order to reorder the bars, or use the Level Editor button to arrange the bars in any desired order.

  • You can create barplots for numerical variables by selecting the Numerical tab.


8.2 Pareto Chart


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Barplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. To create a Pareto Chart with categories across the x-axis, open the Categorical tab (this is the default tab).

  4. Use the Factor 1 dropdown to select the categorical variable for the barplot. Use the Frequency dropdown to choose your frequency variable, if your data is in the form of a frequency or relative frequency table.

  5. From the Bar Order dropdown, select Decreasing Value.

  6. Click the Preview icon to see the Pareto Chart.


Rguroo Dialog

Screenshot of Barplot dialog.


Rguroo Output

Screenshot of a Pareto chart displaying the number of cars by class.

8.3 Stacked Barplot


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Barplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. To create a barplot with categories across the x-axis, click the Categorical tab (this is the default tab).

  4. Select two categorical variables from the Factor 1 and Factor 2 dropdowns. If your data is in the form of a frequency table, select the frequency variable from the Frequency dropdown.

  5. Select the Stacked option. Adjust the other options appropriately.

  6. Click the Preview icon to see the stacked barplot.


Rguroo Dialog

Screenshot of Barplot dialog.


Screenshot of a stacked barplot displaying the number of cars by class.


8.4 Side-by-Side Barplot


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Barplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. To create a barplot with categories across the x-axis, click the Categorical tab (this is the default tab).

  4. Select two categorical variables from the Factor 1 and Factor 2 dropdowns. If your data is in the form of a frequency table, select the frequency variable from the Frequency dropdown.

  5. Select the Side by side option. Adjust the other options appropriately.

  6. Click the Preview icon to see the side-by-side barplot.


Rguroo Dialog

Screenshot of Barplot dialog.


Rguroo Output

Screenshot of a side-by-side barplot displaying the number of cars by class.


8.5 Pie Charts


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Pie Chart function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Select a categorical variable from the Factor dropdown. If data are in frequency table format, select the frequency variable from the Frequency dropdown.

  4. (Optional) To order the pie slices from largest to smallest, select Increasing Value in the dropdown Slice Order.

  5. Click the Preview icon to see the pie chart.


Rguroo Dialog

Screenshot of Pie Chart dialog.


Rguroo Output

Screenshot of a pie chart displaying the proportion of cars in each class.


8.6 Mosaic Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Contingency Table function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Select two categorical variables from the Factor 1 and Factor 2 dropdowns. If data are in a frequency table format, select the frequency variable from the Frequency dropdown.

  4. Select the Mosaic Plot option.

  5. Select the checkbox Mosaic Plot.

  6. Click the Preview icon to see the mosaic plot.


Rguroo Dialog

Screenshot of Barplot dialog.


Rguroo Output

Screenshot of a mosaic plot of the cars data.


9 Plots for Numerical Data

This section gives Rguroo instructions for plots that display numerical data, including plots of numerical variables by categorical variables. All plots in Rguroo are highly customizable. In addition to the options in the Basics button dialog, you can customize your plots using the options in the Details button dialog.

9.1 Barplot for Numerical Variables


Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Barplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. To create a barplot for numerical data, click on the Numerical tab.

  4. Move one or more numerical variables to the Selected column. We selected the variable HP.

  5. (Optional) Use the Factor 1 dropdown to select a categorical variable to compare the numerical variable between the levels of the selected variable.

  6. (Optional) Select Conf. Bar to add confidence bars or Value Labels to add value labels to the bars.

  7. From the Function dropdown, select a function to be applied to the selected numerical variable(s). The default function is Mean, which computes the mean of the selected variable(s).

  8. Click the Preview icon to see the barplot.


Rguroo Dialog

Screenshot of Barplot dialog for numerical variables.


Rguroo Output

Screenshot of a barplot displaying the horsepower of cars by type (domestic and import).


9.2 Boxplot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Boxplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Move one or more numerical variables to the Selected column.

  4. (Optional) Move one or more factor variables to the Selected column.

  5. Click the Preview icon to see the graph.


Rguroo Dialog

Screenshot of Boxplot dialog.


Rguroo Output

Screenshot of side-by-side boxplots displaying the MPG of cars by class.


Click here to see a few additional options.
  • You can display the mean on boxplots by selecting the Show Mean checkbox.

  • You can draw horizontal boxplots by selecting the Horizontal checkbox.

  • You can use the Level Editor button to reorder the boxplots by reordering the levels of the selected categorical variables.

  • You can identify outliers by clicking the Details button and selecting the Outlier tab in the first section of the Graphs Settings dialog.

9.3 Bubble Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Bubbleplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Select a Predictor (x) variable, a Response (y) variable, and a Bubble Size variable from the dropdowns.

  4. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Bubbleplot dialog.


Rguroo Output

Screenshot of a bubbleplot displaying MPG by HorsePower, with Weight as the bubble size.


9.4 Dotplot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the Titanic dataset from the Rguroo dataset repository called R Datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Titanic dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Dotplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the Titanic dataset.

  3. Select a variable from the Numerical Variables section and move it to the Selected column.

  4. (Optional) Select a variable from the Factor Variables section and move it to the Selected column.

  5. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Dotplot dialog.


Rguroo Output

Screenshot of a dotplot displaying the age distribution of Titanic passengers.

9.5 Histogram

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Histogram function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Select a Variable from the dropdown menu. Adjust the other options appropriately.

  4. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Histogram dialog.


Rguroo Output

Screenshot of a histogram displaying MPG of cars.


Click here for a few additional options.
  • You can customize the bin breakpoints (classes). Click the Details button, open the Bins, Bars, Smoothing section, and under the Bins & Bars section select a method for Bin Breakpoints. Click the help icon to see descriptions of various methods.

  • You can draw histograms for levels of a categorical variable by selecting a categorical variable from the Factor dropdown.

  • You can choose one of the options of Frequency, Relative Frequency, or Density histogram.

  • You can superimpose value labels, a density curve, and a normal curve on your histogram.

9.6 Line Graph

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the air passenger dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Scatterplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. Select a Predictor (x) variable and a Response (y) variable from the dropdowns. Adjust the other options appropriately.

  4. In the Superimpose section, select the option Line Graph.

  5. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Scatterplot dialog.


Rguroo Output

Screenshot of a line graph showing the number of airline passengers over time.

9.7 Normal Probability Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Mean Inference –> One Population function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata.

  3. Select a Variable from the dropdown.

  4. Select the Normal Probability Plot checkbox.

  5. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Mean Inference dialog.


Rguroo Output

Screenshot of a normal quantile-quantile plot.
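
A normal probability (quantile-quantile) plot pairs each sorted data value with the standard normal quantile expected at its rank; points that fall close to a straight line suggest approximate normality. The Python sketch below computes those pairs for invented data. The helper name is hypothetical, and the plotting positions (i − 0.5)/n are one common convention, assumed here; Rguroo may use a different one.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        # Assumed plotting position for the i-th smallest value: (i - 0.5) / n.
        (standard_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]

# Invented MPG-like values for illustration.
mpg = [24, 20, 33, 25, 27, 22, 30]
points = normal_plot_points(mpg)
for quantile, value in points:
    print(round(quantile, 3), value)
```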

9.8 Scatterplot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Scatterplot function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata dataset.

  3. Select a Predictor (x) variable and a Response (y) variable from the dropdowns.

  4. (Optional) Select a categorical variable from the Factor dropdown.

  5. (Optional) Select the Outliers checkbox to identify outliers using an ID Variable.

  6. Click the Preview icon to view the graph.


Rguroo Dialog

Screenshot of Scatterplot dialog.


Rguroo Output

Screenshot of a scatterplot of the number of airline passengers over time.

9.9 Stem and Leaf Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Plots toolbox on the left-hand side of the Rguroo window. Click on the Create Plot dropdown menu and choose the Stem and Leaf function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the cardata dataset.

  3. Select a Variable from the dropdown.

  4. (Optional) In the Scale section, type a value in the Scale textbox. The default scale is 1.

  5. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Stem-and-Leaf dialog.


Rguroo Output

Screenshot of a stem and leaf plot showing the miles per gallon of vehicles.

9.10 Time Series Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the AirPassengers dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. From the Numerical Variables section, select and drag the variable AirPassengers to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute, or by second). The AirPassengers data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The AirPassengers data start in January 1949, so set Year to \(\tt{1949}\) and Month to \(\tt{1}\).

  5. In the section Time Series Plot, select the Lines checkbox and optionally the Points checkbox.

  6. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog.


Rguroo Output

Screenshot of a time plot showing the number of airline passengers over time.
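The time specification in step 4 tells Rguroo how to label each observation on the time axis. As a rough sketch of the idea, the Python code below builds monthly labels from a start year and month. The function name `monthly_index` is hypothetical, and the count of 144 observations is the length usually documented for the R AirPassengers series (January 1949 through December 1960).

```python
def monthly_index(start_year, start_month, n_obs):
    """Build 'YYYY-MM' labels for n_obs consecutive monthly observations."""
    labels = []
    for k in range(n_obs):
        months_since_january = (start_month - 1) + k
        year = start_year + months_since_january // 12
        month = months_since_january % 12 + 1
        labels.append(f"{year}-{month:02d}")
    return labels


# Year = 1949 and Month = 1, as entered in the Time Specification section.
labels = monthly_index(1949, 1, 144)
print(labels[0], labels[-1])
```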

10 Proportions Inference

In this section, we describe how you can use Rguroo to construct confidence intervals and conduct tests of hypotheses for a single population proportion and the difference of two population proportions. You can perform these analyses using either summary statistics or raw data. Rguroo offers many methods, including theory-based and simulation-based methods. In this guide, we show examples of basic methods. You can see all available methods by clicking on the Details button.

10.1 Confidence Interval for a Single Population Proportion

10.1.1 With summary statistics

Instructional video icon

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Proportion Inference —> One Population.

  2. In the box labeled Factor, type a label for the categorical variable. For this example, we typed \(\color{darkred}{\texttt{Health Status}}\).

  3. In the box labeled Success, type a label for success. For this example, we typed \(\color{darkred}{\texttt{Healthy}}\).

  4. (Optional) Fill in the box labeled Failure Label. Note that this is a box next to the Frequency dropdown. For this example, we typed \(\color{darkred}{\texttt{Sick}}\).

  5. Fill in the Sample Size and # of Successes.

  6. In the section Confidence Interval, type in the confidence level and choose a method.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of Proportion Inference dialog.


Rguroo Output
One Population Proportion Inference

Data Summary

Counts and Percentages: Health Status

- Healthy Sick Total
Count 30 20 50
Percentage 60 40 100

Confidence Interval: Health Status

Success = Healthy
Sample Size = 50
Number of Successes = 30
Proportion of Successes = 0.6
Confidence level = 95%

Method                  Midpoint  Std Error  Lower CL  Upper CL  Width
Binomial (Exact)        0.595891  0.069282   0.459751  0.732031  0.27228
Bootstrap (Percentile)  0.6       0.069138   0.46      0.74      0.28
Large Sample z          0.6       0.069282   0.46421   0.73579   0.271581

  • Number of Simulations = 10000
  • Random Number Generator Seed = 100

Bootstrap Confidence Interval Graph: Health Status

The title of the graph is Bootstrap Distribution of the Sample Proportion. The graph shows the distribution of Proportion of Healthy. Observed sample proportion \(\hat{p} = 0.6\).
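To see where the Large Sample z row of the output comes from, here is a minimal Python sketch of the standard large-sample interval \(\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}\). The function name `proportion_z_interval` is hypothetical, and the sketch only assumes that Rguroo uses this textbook formula; the numbers in the example output are consistent with it.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_interval(successes, n, confidence=0.95):
    """Large-sample z interval for one proportion (textbook formula)."""
    p_hat = successes / n
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * std_error
    return p_hat, std_error, p_hat - margin, p_hat + margin


# Summary statistics from the example: 30 successes out of 50.
p_hat, std_error, lower, upper = proportion_z_interval(30, 50)
print(round(std_error, 6), round(lower, 5), round(upper, 5))
```

With 30 successes in a sample of 50, the standard error and limits agree with the Large Sample z row above up to rounding.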

10.1.2 With raw data

Instructional video icon

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the Cowles dataset from the Rguroo dataset repository called car into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cowles dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose the Proportion Inference —> One Population.

  2. Select a Dataset, Factor variable and Success level. The summary statistics will be automatically populated.

  3. Under the Confidence Interval section, set the Confidence Level and select one or more of the methods Binomial (Exact), Bootstrap, or Large Sample z.

  4. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of Proportion Inference dialog.


Rguroo Output
One Population Proportion Inference

Data Summary

Counts and Percentages: volunteer

- yes no Total
Count 597 824 1421
Percentage 42.01267 57.98733 100

Confidence Interval: volunteer

Success = yes
Sample Size = 1421
Number of Successes = 597
Proportion of Successes = 0.4201
Confidence level = 95%

Method                  Midpoint  Std Error  Lower CL  Upper CL  Width
Binomial (Exact)        0.420292  0.0130936  0.3943    0.446285  0.0519853
Bootstrap (Percentile)  0.420127  0.0129199  0.394792  0.445461  0.0506685
Large Sample z          0.420127  0.0130936  0.394464  0.44579   0.051326

  • Number of Simulations = 10000
  • Random Number Generator Seed = 100

Bootstrap Confidence Interval Graph: volunteer

The title of the graph is Bootstrap Distribution of the Sample Proportion. The graph shows the distribution of Proportion of yes. Observed sample proportion \(\hat{p} = 0.420127\); percentile confidence limits: (0.395 to 0.445); number of replications = 10000; seed = 100.
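The Bootstrap (Percentile) method can be sketched in a few lines of Python. The code below is a generic percentile bootstrap, not Rguroo's implementation: the function name is hypothetical, it uses 2000 replications instead of 10000 to keep the run short, and Python's random number stream differs from R's, so the limits will be close to, but not identical with, the output above even with the same seed.

```python
import random


def bootstrap_percentile_interval(successes, n, confidence=0.95,
                                  replications=2000, seed=100):
    """Percentile bootstrap interval for one proportion (generic sketch)."""
    rng = random.Random(seed)
    p_hat = successes / n
    replicates = []
    for _ in range(replications):
        # Resampling n yes/no observations with replacement is equivalent
        # to drawing n Bernoulli(p_hat) values.
        resampled = sum(rng.random() < p_hat for _ in range(n))
        replicates.append(resampled / n)
    replicates.sort()
    alpha = 1 - confidence
    lower_index = round(alpha / 2 * replications)
    upper_index = round((1 - alpha / 2) * replications) - 1
    return replicates[lower_index], replicates[upper_index]


# Counts from the Cowles example: 597 'yes' out of 1421.
boot_lower, boot_upper = bootstrap_percentile_interval(597, 1421)
print(boot_lower, boot_upper)
```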

10.2 Confidence Interval for Difference of Two Population Proportions

10.2.1 With summary statistics

Instructional video icon

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Proportion Inference —> Two Populations.

  2. In the Response Label textbox, type a label for the response variable. For our example, we typed \(\color{darkred}{\texttt{Feeling Better}}\).

  3. In the Success Label textbox, type a label for the success. For our example, we typed \(\color{darkred}{\texttt{yes}}\).

  4. (Optional) In the Failure textbox, type a label for the failure. For our example, we typed \(\color{darkred}{\texttt{no}}\).

  5. In the Population section, type a Population Label. For our example, we typed \(\color{darkred}{\texttt{Patients}}\).

  6. In the Population 1 section in the Summary tab, type in a Label, Sample Size, and # of Successes for population 1. For our example, we typed \(\color{darkred}{\texttt{Treatment, 30, 21}}\), respectively.

  7. In the Population 2 section in the Summary tab, type in a Label, Sample Size, and # of Successes for population 2. For our example, we typed \(\color{darkred}{\texttt{Control, 50, 30}}\), respectively.

  8. Click on the Confidence Interval tab, type in a value for the Confidence Level, and select one or more of the methods Large Sample z, Bootstrap (Percentile) or Wilson Score.

  9. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of Two Population Proportion Inference dialog.


Rguroo Output
Two Population Proportion Inference

Data Summary

Counts: Feeling Better by Patients

- yes no Total
Treatment 21 9 30
Control 30 20 50

Confidence Interval for Difference of Two Population Proportions

Success = yes
Population 1 = Treatment, Population 2 = Control
Sample Size: Treatment = 30, Control = 50
Number of Successes: Treatment = 21, Control = 30
Proportion of Successes: Treatment = 0.7, Control = 0.6
Confidence level = 95%

Method                  Midpoint   Std Error  Lower CL    Upper CL  Width
Large Sample z          0.1        0.108628   -0.112907   0.312907  0.425813
Bootstrap (Percentile)  0.0933333  0.10765    -0.12       0.306667  0.426667
Wilson-Score            0.112735   0.108628   -0.0920369  0.317508  0.409545

Bootstrap Confidence Interval Graph: Feeling Better

The title of the graph is Bootstrap Distribution of Difference of Sample Proportions. The graph shows the distribution of Proportion Difference: (Treatment - Control). \(\hat{p}_1 - \hat{p}_2 = 0.1\); percentile confidence limits: (-0.12, 0.307); number of replications = 10000; seed = 100.
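The Large Sample z row for the difference of two proportions can be checked by hand. The Python sketch below applies the usual unpooled formula \((\hat{p}_1 - \hat{p}_2) \pm z^{*}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}\). The function name is hypothetical, and the use of this formula is an assumption that happens to match the example output.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample z interval for p1 - p2 with unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    difference = p1 - p2
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * std_error
    return difference, std_error, difference - margin, difference + margin


# Treatment: 21 of 30; Control: 30 of 50 (values from the example).
difference, std_error, lower, upper = two_proportion_z_interval(21, 30, 30, 50)
print(round(difference, 4), round(std_error, 6),
      round(lower, 6), round(upper, 6))
```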

10.2.2 With raw data

Instructional video icon

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Proportion Inference —> Two Populations.

  2. Select a Dataset. Under the Response/Success section, select a Response variable and a level corresponding to Success. In our example, we selected CLASS as the response variable and Sport as the level indicating success.

  3. From the Population dropdown in the Population section, select a variable that indicates population. In our example, we selected TYPE.

  4. (Optional) If data are in the form of a frequency table, select the Frequency variable.

  5. In the Population 1 and Population 2 sections in the Data Summary tab, select the levels representing the populations from their respective Level dropdowns. In our example, we selected Domestic for population 1 and Import for population 2. The summary statistics will be automatically populated.

  6. Open the Confidence Interval tab. Set a Confidence Level and select one or more of the methods Large Sample z, Bootstrap (Percentile) or Wilson Score.

  7. Click the Preview icon preview icon to view the result.

Rguroo Dialog

Screenshot of Two Population Proportion Inference dialog.


Rguroo Output
Two Population Proportion Inference

Data Summary

Counts: CLASS by TYPE

- Sport Others Total
Domestic 2 33 35
Import 7 40 47

Percentages: CLASS by TYPE

- Sport Others
Domestic 5.714286 94.28571
Import 14.89362 85.10638

Confidence Interval for Difference of Two Population Proportions

Success = Sport
Population 1 = Domestic, Population 2 = Import
Sample Size: Domestic = 35, Import = 47
Number of Successes: Domestic = 2, Import = 7
Proportion of Successes: Domestic = 0.05714, Import = 0.1489
Confidence level = 95%

Method                  Midpoint    Std Error  Lower CL   Upper CL   Width
Large Sample z          -0.0917933  0.0650865  -0.219361  0.0357739  0.255135
Bootstrap (Percentile)  -0.0881459  0.064403   -0.212766  0.0364742  0.24924
Wilson-Score            -0.0991226  0.0650865  -0.240883  0.0426377  0.283521

Bootstrap Confidence Interval Graph: CLASS

The title of the graph is Bootstrap Distribution of Difference of Sample Proportions. The graph shows the distribution of Proportion Difference: (Domestic - Import). \(\hat{p}_1 - \hat{p}_2 = -0.091793\); percentile confidence limits: (-0.213, 0.036); number of replications = 10000; seed = 100.

10.3 Test of Hypothesis for a Single Population Proportion

Instructional video icon

10.3.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose the Proportion Inference —> One Population.

  2. Type in a Factor Label and Success Label. Enter the Sample Size and # of Successes.

  3. Under the Test of Hypothesis section, specify your Alternative hypothesis, select a method by checking one or more of the checkboxes Binomial, Simulation Method, or Large Sample z (p = p0). Also, set a Significance Level.

  4. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of One Population Proportion Inference dialog.


Rguroo Output
One Population Proportion Inference

Data Summary

Counts and Percentages: Student Ethnicity

- Asian Other Total
Count 14 36 50
Percentage 28 72 100

Test of Hypothesis: Student Ethnicity
Method: Large Sample z Test (Using p0)

Alternative Hypothesis Ha: Proportion of 'Asian' is greater than 0.25

Sample Proportion  Std Error  Standardized Obs Stat  5% z-Upper Critical  P-Value   BFB
0.28               0.0612372  0.489898               1.64485              0.312103  1.01227

  • Test is not significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 1.01 to 1, relative to the null hypothesis.

P-value Graph: Large Sample z (Using p0)

Null density (in units of data): Normal; mean = 0.25 , sd = 0.061237
Alternative Hypothesis Ha: Proportion of 'Asian' is greater than 0.25

The title of the graph is P-value Graph: Large Sample z (Using p0). The graph shows the distribution of Proportion of Asian. Observed sample proportion = 0.28; P-value = 0.3121.

Test of Hypothesis: Student Ethnicity
Method: Binomial Exact Test

Alternative Hypothesis Ha: Proportion of 'Asian' is greater than 0.25

Sample Size  No. of Successes  Sample Proportion  P-Value   BFB
50           14                0.28               0.362963  1.00009

  • Test is not significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 1 to 1, relative to the null hypothesis.

P-Value Graph: Student Ethnicity
Method: Exact Binomial Test

Null Distribution: Binomial; n = 50, p = 0.25
Alternative Hypothesis Ha: Proportion of 'Asian' is greater than 0.25

The graph shows the distribution of Number of Successes. Observed value = 14; P-value = \(P(X \ge 14) = 0.36296\).
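The two tests in this output can be reproduced with a short Python sketch. It computes the large-sample z statistic using \(p_0\) in the standard error, the exact binomial upper-tail probability, and a Bayes factor bound of the form \(1/(-e\,p\ln p)\) for \(p < 1/e\). All function names are hypothetical. The bound formula is an assumption about how Rguroo computes BFB; it does reproduce the BFB values shown above.

```python
from math import comb, e, log, sqrt
from statistics import NormalDist


def z_test_upper(successes, n, p0):
    """Upper-tailed large-sample z test using p0 in the standard error."""
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / std_error
    return z, 1 - NormalDist().cdf(z)


def binomial_test_upper(successes, n, p0):
    """Exact upper-tail probability P(X >= successes) under Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(successes, n + 1))


def bayes_factor_bound(p_value):
    """Assumed bound 1 / (-e * p * ln p); set to 1 when p >= 1/e."""
    if p_value >= 1 / e:
        return 1.0
    return 1 / (-e * p_value * log(p_value))


# Example: 14 'Asian' out of 50, Ha: p > 0.25.
z, z_p_value = z_test_upper(14, 50, 0.25)
exact_p_value = binomial_test_upper(14, 50, 0.25)
print(round(z, 6), round(z_p_value, 6), round(exact_p_value, 6))
print(round(bayes_factor_bound(z_p_value), 5),
      round(bayes_factor_bound(exact_p_value), 5))
```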

10.3.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the Cowles dataset from the Rguroo dataset repository called car into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cowles dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose the Proportion Inference —> One Population.

  2. Select a Dataset, a Factor variable, and the Success level. The summary statistics will be automatically populated.

  3. Under the Test of Hypothesis section, specify your Alternative hypothesis, select a method by checking one or more of the checkboxes Binomial, Simulation Method, or Large Sample z (p = p0). Also, set a Significance Level.

  4. Click the Preview icon preview icon to view the result.


Rguroo Dialog

One Population Proportion GUI.


Rguroo Output
One Population Proportion Inference

Data Summary

Counts and Percentages: volunteer

- yes no Total
Count 597 824 1421
Percentage 42.01267 57.98733 100

Test of Hypothesis: volunteer
Method: Binomial Exact Test

Alternative Hypothesis Ha: Proportion of 'yes' is less than 0.45

Sample Size  No. of Successes  Sample Proportion  P-Value    BFB
1421         597               0.420127           0.0125125  6.71096

  • Test is significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 6.71 to 1, relative to the null hypothesis.

P-Value Graph: volunteer
Method: Exact Binomial Test

Null Distribution: Binomial; n = 1421, p = 0.45
Alternative Hypothesis Ha: Proportion of 'yes' is less than 0.45

The graph shows the distribution of Number of Successes. Observed value = 597; P-value = \(P(X \le 597) = 0.01251\).

Test of Hypothesis: volunteer
Method: Large Sample z Test (Using p0)

Alternative Hypothesis Ha: Proportion of 'yes' is less than 0.45

Sample Proportion  Std Error  Standardized Obs Stat  5% z-Lower Critical  P-Value    BFB
0.420127           0.0131975  -2.26357               -1.64485             0.0118004  7.02202

  • Test is significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 7.02 to 1, relative to the null hypothesis.

P-value Graph: Large Sample z (Using p0)

Null density (in units of data): Normal; mean = 0.45 , sd = 0.013197
Alternative Hypothesis Ha: Proportion of 'yes' is less than 0.45

The title of the graph is P-value Graph: Large Sample z (Using p0). The graph shows the distribution of Proportion of yes. Observed sample proportion = 0.42013; P-value = 0.0118.

10.4 Test of Hypothesis about Difference of Two Population Proportions

Instructional video icon

10.4.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Proportion Inference —> Two Populations.

  2. In the Response/Success section, type in a Response Label, Success Label, and optionally Failure Label.

  3. In the Population section, type in a Population Label.

  4. In the Data Summary tab, enter the following for each of Populations 1 and 2: Label, Sample Size, and # of Successes. The proportion of successes auto-fills.

  5. Click the Test of Hypothesis tab. Specify your Alternative Hypothesis for the difference of proportions \(p_1 - p_2\), type in a Significance Level, and in the Methods section, select one or more of the methods Large Sample z, Permutation Test, or Fisher Exact Test.

  6. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Two Population Proportion GUI.


Rguroo Output
Two Population Proportion Inference

Data Summary

Counts: Raise Cigarette Tax by Smoking

- Yes No Total
Non-Smoker 351 254 605
Smoker 95 100 195

Two Population Proportion Test of Hypothesis
Method: Large Sample z Test (Pooled Standard Error)

Success = Yes
Population 1 = Non-Smoker, Population 2 = Smoker
Sample Size: Non-Smoker = 605, Smoker = 195
Number of Successes: Non-Smoker = 351, Smoker = 95
Proportion of Success: Non-Smoker = 0.5802, Smoker = 0.4872
Significance level = 5%
Alternative Hypothesis Ha: Proportion of 'Non-Smoker - Smoker' is greater than 0

Proportion Non-Smoker  Proportion Smoker  Difference  Std Error  Obs z Stat  5% z-Upper Critical  P-Value    BFB
0.580165               0.487179           0.0929858   0.0409005  2.27346     1.64485              0.0114992  7.16424

  • Test is significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 7.16 to 1, relative to the null hypothesis.

P-value Graph: Large Sample z, Pooled SE

Null density (in units of data): Normal; mean = 0 , sd = 0.040901
Alternative Hypothesis Ha: Proportion of 'Non-Smoker - Smoker' is greater than 0

The title of the graph is P-value Graph: Large Sample z, Pooled SE. The graph shows the distribution of Proportion Difference (Non-Smoker - Smoker). Observed proportion difference = 0.092986; P-value = 0.011499.
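The pooled standard error used by the Large Sample z test combines both samples into one estimate of the common proportion under the null hypothesis. The following Python sketch illustrates the textbook calculation for an upper-tailed alternative. The function name is hypothetical, and the sketch assumes the standard pooled formula, which agrees with the output above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_pooled_z_test(x1, n1, x2, n2):
    """Pooled large-sample z test of p1 - p2 = 0 (upper-tailed P-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    upper_p_value = 1 - NormalDist().cdf(z)
    return p1 - p2, std_error, z, upper_p_value


# Non-Smoker: 351 of 605; Smoker: 95 of 195 (values from the example).
difference, std_error, z, p_value = two_proportion_pooled_z_test(351, 605, 95, 195)
print(round(difference, 7), round(std_error, 7), round(z, 5), round(p_value, 7))
```

For a two-sided alternative, the P-value would instead be twice the tail area beyond the absolute value of z.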

10.4.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Proportion Inference —> Two Populations.

  2. Select a Dataset.

  3. In the Response/Success section, select a Response variable, and the Success level.

  4. In the Population section, select your Population variable. If data are in tabular form so that the number of observations is given in a frequency variable, then select the variable containing the frequencies from the Frequency dropdown.

  5. In the Data Summary tab, use the Level dropdowns to select the levels that indicate Population 1 and Population 2. In our example, ‘Domestic’ and ‘Import’ are selected as Population 1 and Population 2, respectively. Click the refresh buttons to display the summary statistics for each population.

  6. Click the Test of Hypothesis tab. Specify your Alternative Hypothesis for the difference of proportions \(p_1 - p_2\), type in a Significance Level, and in the Methods section, select one or more of the methods Large Sample z, Permutation Test, or Fisher Exact Test.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Two population proportion inference dialog


Rguroo Output
Two Population Proportion Inference

Data Summary

Counts: CLASS by TYPE

- Sport Others Total
Domestic 2 33 35
Import 7 40 47

Percentages: CLASS by TYPE

- Sport Others
Domestic 5.714286 94.28571
Import 14.89362 85.10638

Two Population Proportion Test of Hypothesis
Method: Large Sample z Test (Pooled Standard Error)

Success = Sport
Population 1 = Domestic, Population 2 = Import
Sample Size: Domestic = 35, Import = 47
Number of Successes: Domestic = 2, Import = 7
Proportion of Success: Domestic = 0.05714, Import = 0.1489
Significance level = 5%
Alternative Hypothesis Ha: Proportion of 'Domestic - Import' is not equal to 0

Proportion Domestic  Proportion Import  Difference  Std Error  Obs z Stat  2.5% z-Lower Critical  2.5% z-Upper Critical  P-Value   BFB
0.0571429            0.148936           -0.0917933  0.0697899  -1.31528    -1.95996               1.95996                0.188416  1.16978

  • Test is not significant at 5% level.
  • Bayes Factor Bound (BFB): The data imply the odds in favor of
    the alternative hypothesis is at most 1.17 to 1, relative to the null hypothesis.

P-value Graph: Large Sample z, Pooled SE

Null density (in units of data): Normal; mean = 0 , sd = 0.06979
Alternative Hypothesis Ha: Proportion of 'Domestic - Import' is not equal to 0

The title of the graph is P-value Graph: Large Sample z, Pooled SE. The graph shows the distribution of Proportion Difference (Domestic - Import). Observed proportion difference = -0.091793; P-value = 0.18842.

11 Mean Inference

In this section, we describe how you can use Rguroo to construct confidence intervals and conduct tests of hypotheses for a single population mean and the difference of two population means. You can perform these analyses using either summary statistics or raw data. Rguroo offers many methods, including theory-based and simulation-based methods. In this guide, we show examples of basic methods. By default, tests of hypotheses are performed using p-values. However, you can see output based on the critical region method by clicking on the Details button.

11.1 Confidence Interval for a Single Population Mean

Instructional video icon

11.1.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One Population .

  2. In the Data section, enter the Sample Mean, one or both of the Sample Standard Deviation and the Population Standard Deviation, and the Sample Size.

  3. In the textbox labeled \(\bf\mu\) = Mean of, enter a variable label.

  4. In the Confidence Interval tab, set the confidence level, and select one or both of t-statistic or z-statistic.

  5. Click the Preview icon preview icon to view the result.


Rguroo Dialog

One population mean inference Rguroo dialog


Rguroo Output
One Population Mean Inference

Data Summary

Variable     Sample Size  Mean  Sample Std Dev
Heart Beats  15           75    5

t-Based Confidence Interval

95% Confidence interval

Variable     Mean  Std Error  DF  Lower CL  Upper CL  Margin of Error
Heart Beats  75    1.29099    14  72.2311   77.7689   2.76891

Normal-Based Confidence Interval

95% Confidence interval

Variable     Mean  Std Error  Lower CL  Upper CL  Margin of Error
Heart Beats  75    1.29099    72.4697   77.5303   2.5303
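Both intervals above have the form mean ± critical value × standard error, where the standard error is \(s/\sqrt{n}\). The Python sketch below shows the arithmetic. The function name `mean_interval` is hypothetical. Because Python's standard library has no Student t quantile function, the t critical value for 14 degrees of freedom is hard-coded from a t table, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(sample_mean, std_dev, n, critical_value):
    """Return (std error, lower limit, upper limit, margin of error)."""
    std_error = std_dev / sqrt(n)
    margin = critical_value * std_error
    return std_error, sample_mean - margin, sample_mean + margin, margin


z_critical = NormalDist().inv_cdf(0.975)  # 95% normal-based interval
T_CRITICAL_DF14 = 2.144787                # 0.975 t quantile, 14 df (t table)

# Summary statistics from the example: mean 75, std dev 5, n = 15.
z_result = mean_interval(75, 5, 15, z_critical)
t_result = mean_interval(75, 5, 15, T_CRITICAL_DF14)
print([round(v, 4) for v in z_result])
print([round(v, 4) for v in t_result])
```

The t-based interval is wider because the t critical value exceeds the normal one for small samples.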

11.1.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Select a Dataset and a Variable. The summary statistics will automatically populate.

  2. In the Confidence Interval tab, set the confidence level, and select one or more of the methods.

  3. Click the Preview icon preview icon to view the result.


Rguroo Dialog


Rguroo Output
One Population Mean Inference

Data Summary

Variable  Sample Size  Mean     Sample Std Dev
MPG       82           33.7817  10.0046

Normal-Based Confidence Interval

95% Confidence interval

Variable  Mean     Std Error  Lower CL  Upper CL  Margin of Error
MPG       33.7817  1.10482    31.6163   35.9471   2.16541

Bootstrap-Based Confidence Interval
MPG

Confidence level = 95%
Sample mean = 33.78171
Bootstrap Std Error = 1.091078
Number of replications = 10000 ; Random generator seed = 100

Variable  Method      Lower CL  Upper CL  Width
MPG       Percentile  31.6865   35.9866   4.30012

Distribution of Bootstrap Replications of Sample Mean
MPG

Number of replications = 10000, Random generator seed = 100

The graph shows the distribution of bootstrap replicates of MPG. Marked on the graph: 5% tail area, percentile confidence limits, and observed sample mean.

11.2 Confidence Interval for Difference of Two Population Means (Independent Samples)

Instructional video icon

11.2.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  2. Under the Summary tab, enter the following information for Populations 1 and 2: Label, Sample Mean, Sample Size, and one or both of Sample S.d. and Pop S.d.

  3. Click the Population 1-2 tab, and in the Confidence Interval tab enter the Confidence level, and select one or both of the methods t-statistic or z-statistic, as appropriate.

  4. Click the Preview icon preview icon to view the result.


Rguroo Dialog


Rguroo Output
Population Mean Inference

Data Summary

Variable      Sample Size  Mean  Sample Std Dev
Domestic MPG  35           33    10
Import MPG    47           34    9

t-Based Confidence Interval

95% Confidence interval
Assumed unequal variances for variables

Variable                   Midpoint  Std Error  DF       Lower CL  Upper CL  Margin of Error
Domestic MPG - Import MPG  -1        2.14022    68.8674  -5.26977  3.26977   4.26977

Normal-Based Confidence Interval

95% Confidence interval
Assumed unequal variances for variables

Variable                   Midpoint  Std Error  Lower CL  Upper CL  Margin of Error
Domestic MPG - Import MPG  -1        2.14022    -5.19476  3.19476   4.19476
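Under the unequal-variances assumption, the standard error is \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) and the fractional degrees of freedom come from the Welch-Satterthwaite approximation. The Python sketch below reproduces the Std Error and DF columns and the normal-based limits from the summary statistics. The function name is hypothetical, and the use of the Welch-Satterthwaite formula is an assumption that is consistent with the DF value shown.

```python
from math import sqrt
from statistics import NormalDist


def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Difference, standard error, and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    std_error = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return mean1 - mean2, std_error, df


# Domestic MPG: mean 33, sd 10, n 35; Import MPG: mean 34, sd 9, n 47.
difference, std_error, df = welch_summary(33, 10, 35, 34, 9, 47)
margin = NormalDist().inv_cdf(0.975) * std_error  # normal-based margin
print(difference, round(std_error, 5), round(df, 4))
print(round(difference - margin, 5), round(difference + margin, 5))
```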

11.2.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  2. Select a Dataset.

  3. There are two methods for data input in the Data section:

    1. If the data for the two populations are in two numerical columns, select the columns from the Variable 1 and Variable 2 dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

    2. If one variable includes numerical values and there is a corresponding factor (categorical) variable that includes indicators for Population 1 and 2, select the numerical variable from the Variable dropdown and the factor (categorical) variable from the By Factor dropdown. In this example, we use this method where we compare Miles Per Gallon (MPG) for Domestic and Imported cars.

  4. (This step is required only if you used the second input method in step 3.) In the Summary tab, from the Level dropdowns, select the indicators for Population 1 and Population 2. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

  5. Click the Population 1-2 tab, and in the Confidence Interval tab enter the Confidence level and select one or more of the methods as appropriate.

  6. (Optional) In the Assumptions section select appropriate assumptions.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Dialog for two population mean inference


Rguroo Output
Population Mean Inference

Data Summary

Variable        Sample Size  Mean     Sample Std Dev
MPG (Import)    47           34.2298  9.64369
MPG (Domestic)  35           33.18    10.5821

t-Based Confidence Interval

95% Confidence interval
Assumed unequal variances for variables

Variable                       Midpoint  Std Error  DF       Lower CL  Upper CL  Margin of Error
MPG (Import) - MPG (Domestic)  1.04979   2.27556    69.4313  -3.48932  5.5889    4.53911

Bootstrap-Based Confidence Interval
MPG (Import) - MPG (Domestic)

Confidence level = 95%
Difference of sample means = 1.049787
Bootstrap Std Error = 2.256204
Number of replications = 10000 ; Random generator seed = 100

Variable                       Method      Lower CL  Upper CL  Width
MPG (Import) - MPG (Domestic)  Percentile  -3.4847   5.37068   8.85538

Distribution of Bootstrap Replications of Difference of Sample Means
MPG (Import) - MPG (Domestic)

Number of replications = 10000, Random generator seed = 100

The graph shows the distribution of bootstrap replicates of MPG (Import) - MPG (Domestic). Marked on the graph: 5% tail area, percentile confidence limits, and observed sample mean.

11.3 Confidence Interval for Difference of Two Population Means (Paired Data)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the freshman_weight dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Freshmen weight dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  2. Select a Dataset.

  3. There are two methods for data input in the Data section:

    1. If the data for the two populations are in two numerical columns, select the columns from the Variable 1 and Variable 2 dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab. In this example, we use this method with the variables Initial.Weight and Terminal.Weight.

    2. If one variable includes numerical values and there is a corresponding factor (categorical) variable that includes indicators for Population 1 and 2, select the numerical variable from the Variable dropdown and the factor (categorical) variable from the By Factor dropdown.

  4. (This step is required only if you used the second input method in step 3.) In the Summary tab, from the Level dropdowns, select the indicators for Population 1 and Population 2. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

  5. After you are done with steps 3 and 4, select the checkbox Paired Data. The summary statistics for the Paired Differences auto-fill.

  6. Click the Population 1-2 tab, and in the Confidence Interval tab enter the Confidence level and select one or more of the methods as appropriate.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Dialog for two population mean inference paired data


Rguroo Output
Population Mean Inference

Data Summary

Variable                            Sample Size  Mean      Sample Std Dev
Initial.Weight                      68           136.074   24.3711
Terminal.Weight                     68           137.985   24.6101
[Initial.Weight - Terminal.Weight]  68           -1.91176  2.12824

Statistics for the difference [Initial.Weight - Terminal.Weight] is based on differences of observed pairs of data.

t-Based Confidence Interval

95% Confidence interval
Paired differences are used

Variable                            Midpoint  Std Error  DF  Lower CL  Upper CL  Margin of Error
[Initial.Weight - Terminal.Weight]  -1.91176  0.258087   67  -2.42691  -1.39662  0.515144

Bootstrap-Based Confidence Interval - Paired Difference
[Initial.Weight - Terminal.Weight]

Confidence level = 95%
Sample mean of paired differences = -1.911765
Bootstrap Std Error = 0.2577998
Number of replications = 10000 ; Random generator seed = 100

Variable | Method | Lower CL | Upper CL | Width
[Initial.Weight - Terminal.Weight] | Percentile | -2.41176 | -1.41176 | 1

Distribution of Bootstrap Replications of Mean of Paired Differences
[Initial.Weight - Terminal.Weight]

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of the bootstrap replicates of Initial.Weight - Terminal.Weight, marking the 5% tail area, the percentile confidence limits, and the observed sample mean.]
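
The t-based interval in the output above can be checked by hand from the Data Summary row for the paired differences. The worked computation below is only a sketch: the multiplier \(t^{*} \approx 1.99601\) is taken as the 97.5th percentile of the t distribution with 67 degrees of freedom, which agrees with the ratio of the reported Margin of Error to the reported Std Error.

\[
SE = \frac{s_d}{\sqrt{n}} = \frac{2.12824}{\sqrt{68}} \approx 0.258087
\]

\[
\bar{d} \pm t^{*}\,SE = -1.91176 \pm 1.99601 \times 0.258087 = -1.91176 \pm 0.515144
\]

This gives the limits \(-2.42691\) and \(-1.39662\), matching the Lower CL and Upper CL columns.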

11.4 Test of Hypothesis for a Single Population Mean


11.4.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One Population.

  2. In the Data section, enter the Sample Mean, one or both of Sample Standard Deviation and Population Standard Deviation, and the Sample Size.

  3. In the textbox labeled \(\bf\mu\) = Mean of, enter a variable label.

  4. In the Test of Hypothesis tab, set the significance level, and select one or both of t-statistic and z-statistic.

  5. Click the Preview icon to view the result.


Rguroo Dialog

One population mean inference Rguroo dialog


Rguroo Output
One Population Mean Inference

Data Summary

Variable | Sample Size | Mean | Sample Std Dev
Heart Beats | 15 | 75 | 5

Test of Hypothesis: t-Test
Heart Beats

Alternative Hypothesis Ha: Mean of 'Heart Beats' is greater than 72
5% upper critical value in units of data = 74.27384

Sample Mean | Std Error | Obs t Stat | DF | 5% t-Upper Critical | P-value
75 | 1.29099 | 2.32379 | 14 | 1.76131 | 0.0178504

Test is significant at 5% level.

P-value Graph: t-Test

Null density (in units of data): Student t; mean = 72, scale = 1.291, df = 14
Alternative Hypothesis Ha: Mean of 'Heart Beats' is greater than 72

[Figure: P-value Graph: t-Test. Null distribution of the Sample Mean, marking Obs Sample Mean = 75 and P-Value = 0.01785.]

Test of Hypothesis: z-Test
Heart Beats

Alternative Hypothesis Ha: Mean of 'Heart Beats' is greater than 72
5% upper critical value in units of data = 74.1235

Sample Mean | Std Error | Obs z Stat | 5% z-Upper Critical | P-Value
75 | 1.29099 | 2.32379 | 1.64485 | 0.0100684

Test is significant at 5% level.

P-value Graph: z-Test

Null density (in units of data): Normal; mean = 72 , sd = 1.291
Alternative Hypothesis Ha: Mean of 'Heart Beats' is greater than 72

[Figure: P-value Graph: z-Test. Null distribution of the Sample Mean, marking Obs Sample Mean = 75 and P-Value = 0.010068.]
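
As a cross-check on the output above, the following Python sketch recomputes the standard error, the test statistic, and the upper-tail p-values from the four summary numbers. It is not how Rguroo computes them; it uses only the standard library, and the helper `t_cdf` is a hypothetical function written for this illustration that integrates the Student t density numerically.

```python
import math
from statistics import NormalDist

def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson's rule on the density (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3                 # integral of the density from 0 to |t|
    return 0.5 + half if t >= 0 else 0.5 - half

# Summary statistics from the Heart Beats example; Ha: mean > 72
mean, sd, n, mu0 = 75.0, 5.0, 15, 72.0

se = sd / math.sqrt(n)                   # standard error of the sample mean
stat = (mean - mu0) / se                 # same value serves as t and z statistic
p_t = 1.0 - t_cdf(stat, n - 1)           # upper-tail p-value, t with n - 1 df
p_z = 1.0 - NormalDist().cdf(stat)       # upper-tail p-value, standard normal

print(f"SE = {se:.5f}, statistic = {stat:.5f}")
print(f"t-test p-value = {p_t:.7f}, z-test p-value = {p_z:.7f}")
```

The two p-values differ only because the t-test refers the same statistic to a heavier-tailed distribution.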

11.4.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the first 5 rows of the cardata dataset.]

  2. Select a Dataset and a Variable. The summary statistics will automatically populate.

  3. In the Test of Hypothesis tab, set the significance level, and select one or more of the methods t-statistic, z-statistic, Bootstrap t-statistic, and Bootstrap Unscaled.

  4. Click the Preview icon to view the result.


Rguroo Dialog


Rguroo Output
One Population Mean Inference

Data Summary

Variable | Sample Size | Mean | Sample Std Dev
Miles Per Gallon | 82 | 33.7817 | 10.0046

Test of Hypothesis: t-Test
Miles Per Gallon

Alternative Hypothesis Ha: Mean of 'Miles Per Gallon' is greater than 32
5% upper critical value in units of data = 33.8383

Sample Mean | Std Error | Obs t Stat | DF | 5% t-Upper Critical | P-value
33.7817 | 1.10482 | 1.61266 | 81 | 1.66388 | 0.0553538

Test is not significant at 5% level.

P-value Graph: t-Test

Null density (in units of data): Student t; mean = 32, scale = 1.1048, df = 81
Alternative Hypothesis Ha: Mean of 'Miles Per Gallon' is greater than 32

[Figure: P-value Graph: t-Test. Null distribution of the Sample Mean, marking Obs Sample Mean = 33.782 and P-Value = 0.055354.]

Test of Hypothesis: z-Test
Miles Per Gallon

Alternative Hypothesis Ha: Mean of 'Miles Per Gallon' is greater than 32
5% upper critical value in units of data = 33.81727

Sample Mean | Std Error | Obs z Stat | 5% z-Upper Critical | P-Value
33.7817 | 1.10482 | 1.61266 | 1.64485 | 0.053409

Test is not significant at 5% level.

P-value Graph: z-Test

Null density (in units of data): Normal; mean = 32 , sd = 1.1048
Alternative Hypothesis Ha: Mean of 'Miles Per Gallon' is greater than 32

[Figure: P-value Graph: z-Test. Null distribution of the Sample Mean, marking Obs Sample Mean = 33.782 and P-Value = 0.053409.]

Test of Hypothesis: Bootstrap (Unscaled Sample Mean)
Miles Per Gallon

Alternative Hypothesis Ha: Mean of 'Miles Per Gallon' is greater than 32
Number of replications = 10000; Random generator seed = 100

Observed Sample Mean | Bootstrap Mean | Bootstrap SD | 5% Upper Critical Value | P-Value
33.7817 | 32.0074 | 1.09117 | 33.8402 | 0.0564944

Test is not significant at 5% level.

Distribution of Bootstrap Replicates: Sample Mean
Miles Per Gallon

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of bootstrap replications of the sample mean (unscaled) for Miles Per Gallon, under Ha: Mean of 'Miles Per Gallon' is greater than 32. Marks Observed Sample Mean = 33.78171, P-Value = 0.05649435, and the critical boundary for alpha = 5% (Critical value = 33.84024).]
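
The Python sketch below illustrates one common way to run an unscaled bootstrap test for a mean. It is an assumption that Rguroo works this way, although two features of the output above are consistent with it: the Bootstrap Mean (32.0074) sits at the null value, as it would if the data were shifted to satisfy the null hypothesis before resampling, and the reported P-Value equals 565/10001, that is, (count + 1)/(replications + 1). The function name and the data values are hypothetical, so the printed result will not match the cardata output.

```python
import random

def bootstrap_mean_test_upper(data, mu0, reps=10000, seed=100):
    """Upper-tail bootstrap test of H0: mean = mu0 using the unscaled sample mean."""
    rng = random.Random(seed)
    n = len(data)
    observed = sum(data) / n
    # Shift the data so its mean equals mu0, i.e. impose the null hypothesis.
    shifted = [x - observed + mu0 for x in data]
    count = 0
    for _ in range(reps):
        resample = rng.choices(shifted, k=n)      # sample with replacement
        if sum(resample) / n >= observed:
            count += 1
    # The +1 terms count the observed sample as one of the replications.
    return (count + 1) / (reps + 1)

# Hypothetical MPG-like values, for illustration only
mpg = [28.1, 35.4, 30.2, 41.0, 33.3, 29.8, 36.7, 38.5, 27.9, 34.6, 31.2, 39.4]
p_value = bootstrap_mean_test_upper(mpg, mu0=32.0)
print(f"bootstrap p-value = {p_value:.4f}")
```

Fixing the seed, as the Rguroo output does with seed 100, makes the result reproducible from run to run.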

11.5 Test of Hypothesis for Difference of Two Population Means (Independent Samples)


11.5.1 Using summary statistics

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  2. Under the Summary tab, enter the following information for Populations 1 and 2: Label, Sample Mean, Sample Size, and one or both of Sample S.d. and Pop S.d.

  3. Click the Population 1-2 tab, and in the Test of Hypothesis tab, enter the Significance level and the Alternative Hypothesis for the difference \(\mu_1-\mu_2\).

  4. Select one or both of the methods t-statistic and z-statistic.

  5. (Optional) In the Assumptions section, select Unequal Variances or Equal Variances.

  6. Click the Preview icon to view the result.


Rguroo Dialog

Entering data for two population mean


Rguroo Dialog

Entering data for two population mean


Rguroo Output
Population Mean Inference

Data Summary

Variable | Sample Size | Mean | Sample Std Dev
Domestic MPG | 35 | 33 | 10
Import MPG | 47 | 34 | 9

Test of Hypothesis: t-Test
Domestic MPG - Import MPG

Alternative Hypothesis Ha: Difference of means 'Domestic MPG - Import MPG' is less than 0
5% lower critical value in units of data = -3.568353
Unequal population variances was assumed.

Difference of Means | Std Error | Obs t Stat | DF | 5% t-Lower Critical | P-value
-1 | 2.14022 | -0.467241 | 68.8674 | -1.66728 | 0.320901

Test is not significant at 5% level.

P-value Graph: t-Test

Null density (in units of data): Student t; mean = 0, scale = 2.1402, df = 68.867
Alternative Hypothesis Ha: Difference of means 'Domestic MPG - Import MPG' is less than 0
Assumed unequal population variances

[Figure: P-value Graph: t-Test. Null distribution of the Diff. of Sample Means, marking Obs Diff. of Sample Means = -1 and P-Value = 0.3209.]

Test of Hypothesis: z-Test
Domestic MPG - Import MPG

Alternative Hypothesis Ha: Difference of means 'Domestic MPG - Import MPG' is less than 0
5% lower critical value in units of data = -3.520351

Difference of Means | Std Error | Obs z Stat | 5% z-Lower Critical | P-Value
-1 | 2.14022 | -0.467241 | -1.64485 | 0.320164

Test is not significant at 5% level.

P-value Graph: z-Test

Null density (in units of data): Normal; mean = 0 , sd = 2.1402
Alternative Hypothesis Ha: Difference of means 'Domestic MPG - Import MPG' is less than 0

[Figure: P-value Graph: z-Test. Null distribution of the Diff. of Sample Means, marking Obs Diff. of Sample Means = -1 and P-Value = 0.32016.]
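
The fractional degrees of freedom (68.8674) in the t-test output above come from the unequal-variances option. The Python sketch below shows the usual Welch-Satterthwaite computation from the summary statistics. The function name `welch_summary` is hypothetical, and treating Rguroo's DF as the Welch-Satterthwaite value is an assumption, though the numbers agree.

```python
import math
from statistics import NormalDist

def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Standard error, test statistic, and Welch-Satterthwaite df from summary data."""
    v1 = sd1 ** 2 / n1                    # variance of the first sample mean
    v2 = sd2 ** 2 / n2                    # variance of the second sample mean
    se = math.sqrt(v1 + v2)               # unpooled standard error
    stat = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, stat, df

# Domestic MPG vs Import MPG, as entered in the Summary tab
se, stat, df = welch_summary(33, 10, 35, 34, 9, 47)
p_z_lower = NormalDist().cdf(stat)        # lower-tail p-value for Ha: difference < 0

print(f"SE = {se:.5f}, statistic = {stat:.6f}, df = {df:.4f}")
print(f"z-test p-value = {p_z_lower:.6f}")
```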

11.5.2 Using raw data

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the first 5 rows of the cardata dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  3. Select a Dataset.

  4. There are two methods for data input in the Data section:

    1. If the data for the two populations are in two numerical columns, select the columns from the Variable 1 and Variable 2 dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

    2. If one variable includes numerical values and there is a corresponding factor (categorical) variable that includes indicators for Population 1 and 2, select the numerical variable from the Variable dropdown and the factor (categorical) variable from the By Factor dropdown. In this example, we use this method to compare Miles Per Gallon (MPG) for Subcompact and Compact cars.

  5. (This step is required only if you used the second method in step 4.) In the Summary tab, select the indicators for Population 1 and Population 2 from the Level dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

  6. Click the Population 1-2 tab, and in the Test of Hypothesis tab, enter the Significance level and the Alternative Hypothesis for the difference \(\mu_1-\mu_2\).

  7. Select one or more of the methods t-statistic, z-statistic, Bootstrap t-statistic, Bootstrap Unscaled, Permutation t-statistic, and Permutation Unscaled.

  8. (Optional) In the Assumptions section, select Unequal Variances or Equal Variances.

  9. Click the Preview icon to view the result.


Rguroo Dialog

Dialog for two population mean inference inputting data


Rguroo Dialog

Dialog for two population mean inference test of hypothesis input


Rguroo Output
Population Mean Inference

Data Summary

Variable | Sample Size | Mean | Sample Std Dev
MPG (Subcompact) | 22 | 41.3318 | 8.46097
MPG (Compact) | 16 | 36.1313 | 8.62428

Test of Hypothesis: t-Test
MPG (Subcompact) - MPG (Compact)

Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0
5% upper critical value in units of data = 4.73138
Equal population variances was assumed.

Difference of Means | Std Error | Obs t Stat | DF | 5% t-Upper Critical | P-value
5.20057 | 2.80246 | 1.85572 | 36 | 1.6883 | 0.0358476

Test is significant at 5% level.

P-value Graph: t-Test

Null density (in units of data): Student t; mean = 0, scale = 2.8025, df = 36
Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0
Assumed equal population variances; pooled variance was used.

[Figure: P-value Graph: t-Test. Null distribution of the Diff. of Sample Means, marking Obs Diff. of Sample Means = 5.2006 and P-Value = 0.035848.]

Test of Hypothesis: z-Test
MPG (Subcompact) - MPG (Compact)

Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0
5% upper critical value in units of data = 4.623953

Difference of Means | Std Error | Obs z Stat | 5% z-Upper Critical | P-Value
5.20057 | 2.81116 | 1.84997 | 1.64485 | 0.032159

Test is significant at 5% level.

P-value Graph: z-Test

Null density (in units of data): Normal; mean = 0 , sd = 2.8112
Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0

[Figure: P-value Graph: z-Test. Null distribution of the Diff. of Sample Means, marking Obs Diff. of Sample Means = 5.2006 and P-Value = 0.032159.]
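
The t-test and z-test outputs above report slightly different standard errors (2.80246 and 2.81116). The worked computation below, based only on the Data Summary values, suggests why: the t-test was run with Equal Variances selected, so it pools the two sample variances, while the z-test value matches the unpooled formula. That the z-test always uses the unpooled form is an inference from these numbers, not a documented rule.

\[
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
      = \frac{21\,(8.46097)^2 + 15\,(8.62428)^2}{36} \approx 72.7506,
\qquad
SE_{\text{pooled}} = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \approx 8.52940 \times 0.328564 \approx 2.80246
\]

\[
SE_{\text{unpooled}} = \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}
                     = \sqrt{\frac{71.5880}{22}+\frac{74.3782}{16}} \approx 2.81116
\]

Dividing the observed difference 5.20057 by each standard error gives the reported statistics 1.85572 and 1.84997.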

Test of Hypothesis: Bootstrap Unscaled Difference of Means
MPG (Subcompact) - MPG (Compact)

Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0
Number of replications = 10000; Random generator seed = 100

Diff Obs Sample Means | Mean Bootstrap Diff | SD Bootstrap Diff | 5% Upper Critical Value | P-Value
5.20057 | -0.00948922 | 2.70779 | 4.40057 | 0.0235976

Test is significant at 5% level.

Distribution of Bootstrap Replicates: Sample Mean
MPG (Subcompact) - MPG (Compact)

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of bootstrap replications of the difference of sample means (unscaled) for MPG (Subcompact) - MPG (Compact), under Ha: Mean of 'MPG (Subcompact)' is greater than Mean of 'MPG (Compact)'. Marks Obs. Diff. of Sample Means = 5.200568, P-Value = 0.02359764, and the critical boundary for alpha = 5% (Critical value = 4.400568).]

Test of Hypothesis: Permutation Unscaled Difference of Means
MPG (Subcompact) - MPG (Compact)

Alternative Hypothesis Ha: Difference of means 'MPG (Subcompact) - MPG (Compact)' is greater than 0
Number of replications = 10000; Random generator seed = 100

Diff Obs Sample Means | Mean Permutation Diff | SD Permutation Diff | 5% Upper Critical Value | P-Value
5.20057 | -0.00930998 | 2.89499 | 4.69318 | 0.0349965

Test is significant at 5% level.

Distribution of Permutation Replicates: Sample Mean
MPG (Subcompact) - MPG (Compact)

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of permutation replications of the difference of sample means (unscaled) for MPG (Subcompact) - MPG (Compact), under Ha: Mean of 'MPG (Subcompact)' is greater than Mean of 'MPG (Compact)'. Marks Obs. Diff. of Sample Means = 5.200568, P-Value = 0.0349965, and the critical boundary for alpha = 5% (Critical value = 4.693182).]
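
For readers who want to see the idea behind the Permutation Unscaled method, here is a minimal Python sketch of a permutation test on the raw difference of means. It is a generic textbook version under stated assumptions, not Rguroo's implementation: group labels are reshuffled at random, and the p-value is (count + 1)/(replications + 1), a convention that matches the reported value 0.0349965 = 350/10001. The function name and both data lists are hypothetical.

```python
import random

def permutation_mean_diff_upper(x, y, reps=10000, seed=100):
    """Upper-tail permutation test for mean(x) - mean(y) on the unscaled difference."""
    rng = random.Random(seed)
    n1, n2 = len(x), len(y)
    observed = sum(x) / n1 - sum(y) / n2
    pooled = list(x) + list(y)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)                       # random reassignment of group labels
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if diff >= observed:
            count += 1
    return observed, (count + 1) / (reps + 1)

# Hypothetical MPG-like values, for illustration only
subcompact = [41.2, 38.5, 45.0, 36.9, 44.1, 39.8, 47.3, 40.6]
compact = [35.1, 38.0, 33.4, 37.2, 31.9, 36.5]
observed, p_value = permutation_mean_diff_upper(subcompact, compact)
print(f"observed difference = {observed:.4f}, permutation p-value = {p_value:.4f}")
```

Unlike the bootstrap, which resamples with replacement within each group, the permutation approach keeps every observation exactly once and only changes which group it is assigned to.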

11.6 Test of Hypothesis for Difference of Two Population Means (Paired Data)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the freshman_weight dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the first 5 rows of the freshman_weight dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Mean Inference —> One & Two Population.

  3. Select a Dataset.

  4. There are two methods for data input in the Data section:

    1. If the data for the two populations are in two numerical columns, select the columns from the Variable 1 and Variable 2 dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab. In this example, Initial.Weight and Terminal.Weight are in two columns, so we use this method.

    2. If one variable includes numerical values and there is a corresponding factor (categorical) variable that includes indicators for Population 1 and 2, select the numerical variable from the Variable dropdown and the factor (categorical) variable from the By Factor dropdown.

  5. (This step is required only if you used the second method in step 4.) In the Summary tab, select the indicators for Population 1 and Population 2 from the Level dropdowns. The summary statistics for Populations 1 and 2 automatically populate under the Summary tab.

  6. After you are done with steps 4 and 5, select the checkbox Paired Data. The summary statistics for the Paired Differences auto-fill.

  7. Click the Population 1-2 tab, then click the Test of Hypothesis tab, and select Paired Data in the Assumptions section on the left.

  8. Enter the Significance level, and state the Alternative Hypothesis for \(\mu_d\), the mean of differences.

  9. Select one or more of the methods t-statistic, z-statistic, Bootstrap t-statistic, Bootstrap Unscaled, Permutation t-statistic, and Permutation Unscaled.

  10. Click the Preview icon to view the result.


Rguroo Dialog

Dialog for two population mean inference paired data


Rguroo Dialog

Dialog for two population mean inference paired data


Rguroo Output
Population Mean Inference

Data Summary

Variable | Sample Size | Mean | Sample Std Dev
Initial.Weight | 68 | 136.074 | 24.3711
Terminal.Weight | 68 | 137.985 | 24.6101
[Initial.Weight - Terminal.Weight] | 68 | -1.91176 | 2.12824

Statistics for the difference [Initial.Weight - Terminal.Weight] is based on differences of observed pairs of data.

Test of Hypothesis: Paired t-Test
[Initial.Weight - Terminal.Weight]

Alternative Hypothesis Ha: Mean of paired differences '[Initial.Weight - Terminal.Weight]' is less than 0
5% lower critical value in units of data = -0.4304678
Differences of pairs were used.

Mean of Paired Diff | Std Error | Obs t Stat | DF | 5% t-Lower Critical | P-value
-1.91176 | 0.258087 | -7.40744 | 67 | -1.66792 | 1.40637e-10

Test is significant at 5% level.

P-value Graph: Paired t-Test

Null density (in units of data): Student t; mean = 0, scale = 0.25809, df = 67
Alternative Hypothesis Ha: Mean of differences '[Initial.Weight - Terminal.Weight]' is less than 0

[Figure: P-value Graph: Paired t-Test. Null distribution of the Diff. of Sample Means, marking Obs Diff. of Sample Means = -1.9118 and P-Value = 1.4064e-10.]

Test of Hypothesis: Paired z-Test
[Initial.Weight - Terminal.Weight]

Alternative Hypothesis Ha: Mean of paired differences '[Initial.Weight - Terminal.Weight]' is less than 0
5% lower critical value in units of data = -0.4245156
Differences of pairs were used.

Mean of Paired Diff | Std Error | Obs z Stat | 5% z-Lower Critical | P-Value
-1.91176 | 0.258087 | -7.40744 | -1.64485 | 6.43816e-14

Test is significant at 5% level.

P-value Graph: Paired z-Test

Null density (in units of data): Normal; mean = 0 , sd = 0.25809
Alternative Hypothesis Ha: Mean of differences '[Initial.Weight - Terminal.Weight]' is less than 0

[Figure: P-value Graph: Paired z-Test. Null distribution of the Mean of Paired Diffs, marking Obs Mean of Paired Diffs = -1.9118 and P-Value = 6.4382e-14.]

Test of Hypothesis: Bootstrap (Unscaled Mean of Paired Differences)
Initial.Weight - Terminal.Weight

Alternative Hypothesis Ha: Mean of differences 'Initial.Weight - Terminal.Weight' is less than 0
Number of replications = 10000; Random generator seed = 100

Mean Obs Paired Diffs | Mean Bootstrap Paired Diff | SD Bootstrap Paired Diff | 5% Lower Critical Value | P-Value
-1.91176 | 0.00226595 | 0.258497 | -0.426471 | 9.999e-05

Test is significant at 5% level.

Distribution of Bootstrap Replicates: Mean of Paired Differences
Initial.Weight - Terminal.Weight

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of simulated paired differences (Bootstrap Unscaled) for Initial.Weight - Terminal.Weight, under Ha: Mean of 'Initial.Weight' is less than Mean of 'Terminal.Weight'. Marks Obs. Diff. of Sample Means = -1.911765, P-Value = 9.999e-05, and the critical boundary for alpha = 5% (Critical value = -0.4264706).]

Test of Hypothesis: Permutation (Unscaled Mean of Paired Differences)
Initial.Weight - Terminal.Weight

Alternative Hypothesis Ha: Mean of differences 'Initial.Weight - Terminal.Weight' is less than 0
Number of replications = 10000; Random generator seed = 100

Mean Obs Paired Diffs | Mean Permutation Paired Diff | SD Permutation Paired Diff | 5% Lower Critical Value | P-Value
-1.91176 | -0.00254975 | 0.344153 | -0.558824 | 9.999e-05

Test is significant at 5% level.

Distribution of Permutation Replicates: Mean of Paired Differences
Initial.Weight - Terminal.Weight

Number of replications = 10000, Random generator seed = 100

[Figure: distribution of replications of the difference of permuted pairs (unscaled) for Initial.Weight - Terminal.Weight, under Ha: Mean of 'Initial.Weight' is less than Mean of 'Terminal.Weight'. Marks Obs. Diff. of Sample Means = -1.911765, P-Value = 9.999e-05, and the critical boundary for alpha = 5% (Critical value = -0.5588235).]
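
For paired data, permuting within a pair amounts to swapping the two measurements, which flips the sign of that pair's difference. The Python sketch below shows this sign-flip version of the permutation test. It is an illustration under that assumption rather than Rguroo's code, and both the function name and the difference values are hypothetical. Note that the reported P-Value of 9.999e-05 equals 1/10001, the smallest value the (count + 1)/(replications + 1) convention can produce with 10000 replications.

```python
import random

def paired_sign_flip_lower(diffs, reps=10000, seed=100):
    """Lower-tail permutation test for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    count = 0
    for _ in range(reps):
        # Swapping the members of a pair changes the sign of its difference.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if flipped <= observed:
            count += 1
    return observed, (count + 1) / (reps + 1)

# Hypothetical initial-minus-terminal weight differences, for illustration only
diffs = [-2.0, -1.5, 0.5, -3.0, -1.0, -2.5, 1.0, -4.0, -0.5, -2.0]
observed, p_value = paired_sign_flip_lower(diffs)
print(f"mean difference = {observed:.3f}, permutation p-value = {p_value:.4f}")
```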

12 Median Inference (Nonparametric Tests and Confidence Intervals)

12.1 Confidence Interval for a Single Population Median

Rguroo Instruction: Binomial, Wilcoxon, Bootstrap Percentile, Bootstrap BCa
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the first 5 rows of the cardata dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> One Population.

  3. Select a Dataset. In this example, we choose cardata.

  4. Select a Variable from the dropdown menu.

  5. (Optional) Type a label for the population in the text box M = Median of.

  6. Click on the Confidence Interval tab:

    • Type in your Confidence Level.
    • Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Binomial, Wilcoxon, Bootstrap Percentile, and Bootstrap BCa.

  7. Click the Preview icon to view the result.


Rguroo Dialog

Screenshot of Two Population median Inference dialog.


Rguroo Output
One Population Median Inference

Data Summary

Cases read | Cases missing | Cases used | Min | Median | Max
82 | 0 | 82 | 13.2 | 32.45 | 65.4

Confidence Interval for Population Median: Miles per Gallon

Sample Size = 82
Median = 32.45

Method | Confidence Level | Lower CL | Upper CL
Binomial Exact | 96.48% | 31.3 | 36.1
Binomial Exact | 94.02% | 31.4 | 35.4
Interpolated* | 95% | 31.3601 | 35.6795

* Based on linear interpolation of the two exact confidence intervals

Wilcoxon Confidence Interval for Population Location: Miles per Gallon

Sample Size = 82
Median = 32.45
Confidence Level = 95%

Method | Lower CL | Upper CL
Normal Approximation with CC | 31.35 | 35.5001

Warning: The Exact method was not used due to ties in the data.

Bootstrap Confidence Interval for Population Median: Miles per Gallon

Confidence Level = 95%
Number of replications = 10000; Random generator seed = 100

Sample size = 82
Sample Median = 32.45; Bootstrap SE = 1.246038

Method | Lower CL | Upper CL
Percentile | 31.4 | 35.6
BCa | 31.35 | 35.4

Distribution of Bootstrap Replications of Sample Median
Miles per Gallon

[Figure: distribution of the bootstrap replicates of the sample median of Miles per Gallon, marking the 5% tail area, the percentile confidence limits, the BCa confidence limits, and the observed sample median. Number of replications = 10000, Random generator seed = 100.]
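
The two Binomial Exact rows and the Interpolated row in the output above can be reproduced with a short Python sketch. It assumes the standard order-statistic construction, in which the interval from the k-th smallest to the k-th largest observation has coverage 1 - 2 P(X <= k - 1) for X ~ Binomial(n, 0.5). The values k = 32 and k = 33 are inferred by matching the reported confidence levels, and the function names are hypothetical.

```python
from math import comb

def coverage(n, k):
    """Coverage of the interval from the k-th smallest to the k-th largest value."""
    tail = sum(comb(n, i) for i in range(k)) / 2 ** n    # P(X <= k - 1)
    return 1 - 2 * tail

def interpolate(limit_wide, limit_narrow, cov_wide, cov_narrow, target=0.95):
    """Linear interpolation between two exact limits to reach the target coverage."""
    weight = (cov_wide - target) / (cov_wide - cov_narrow)
    return limit_wide + weight * (limit_narrow - limit_wide)

n = 82
cov_wide, cov_narrow = coverage(n, 32), coverage(n, 33)

# Limits of the two exact intervals, copied from the Rguroo output
lower = interpolate(31.3, 31.4, cov_wide, cov_narrow)
upper = interpolate(36.1, 35.4, cov_wide, cov_narrow)

print(f"exact coverages: {cov_wide:.4%} and {cov_narrow:.4%}")
print(f"interpolated 95% interval: ({lower:.4f}, {upper:.4f})")
```

Because the binomial distribution is discrete, no order-statistic interval has exactly 95% coverage for this sample size, which is why the output brackets 95% with two exact intervals.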

12.2 Confidence Interval for Difference of Two Population Medians (Independent Data)

Rguroo Instruction: Mann-Whitney U (Wilcoxon Rank-Sum), Bootstrap Percentile, Bootstrap BCa
  1. Use a dataset in your Rguroo account or recreate the example below by importing the glucose dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the glucose dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> Two Population.

  3. Select a Dataset. In this example, we choose glucose.

  4. If values for the two variables to be compared are in two columns, choose Variable 1 and Variable 2 from the dropdown menus. If observed numerical values for both groups are in one column and the two groups to be compared are indicated by a factor variable, use the option Variable and By Factor to select the numerical variable and the variable that represents the groups, respectively.

  5. (Optional) Type a label for the two populations in the text boxes Pop 1 Label and Pop 2 Label.

  6. Click on the Confidence Interval tab:

    • Type in your Confidence Level.
    • Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Mann-Whitney and Bootstrap Percentile. Note that the Mann-Whitney U method is the same as the Wilcoxon Rank-Sum method.

  7. Click the Preview icon to view the result.


Rguroo Dialog

Screenshot of Two Population median Inference dialog.


Rguroo Output
Two Population Median Inference

Data Summary

Variable | Cases read | Cases missing | Cases used | Min | Median | Max
Healthy | 14 | 0 | 14 | 2.83216 | 4.93049 | 5.74247
Diabetic | 14 | 0 | 14 | 4.62004 | 5.17533 | 8.2125

Mann-Whitney Confidence Interval for Population Location: Healthy - Diabetic

Confidence Level = 95%

Method | Lower CL | Upper CL
Exact | -1.19128 | 0.303986

Bootstrap Confidence Interval for Difference of Population Medians:
Healthy - Diabetic

Confidence Level = 95%
Number of replications = 10000; Random generator seed = 100

Sample sizes: Healthy = 14; Diabetic = 14
Difference of Sample Medians = -0.244838; Bootstrap SE = 0.4387024

Method | Lower CL | Upper CL
Percentile | -1.176764 | 0.5119937

Distribution of Bootstrap Replications of Difference of Sample Medians
Healthy - Diabetic

[Figure: distribution of the bootstrap replicates of the difference of sample medians, Healthy - Diabetic. The legend lists the 5% tail area, percentile confidence limits, BCa confidence limits, and observed sample median. Number of replications = 10000, Random generator seed = 100.]
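
The Bootstrap Percentile interval above can be illustrated with a short Python sketch: resample each group with replacement, take the difference of the resampled medians, and read the interval off the sorted replicates. This is a generic version and not Rguroo's implementation. The percentile indexing convention used here is one simple choice among several, and the function name and data values are hypothetical, so the result will not match the glucose output.

```python
import random
from statistics import median

def bootstrap_median_diff_ci(x, y, level=0.95, reps=10000, seed=100):
    """Percentile bootstrap interval for median(x) - median(y), independent samples."""
    rng = random.Random(seed)
    diffs = sorted(
        median(rng.choices(x, k=len(x))) - median(rng.choices(y, k=len(y)))
        for _ in range(reps)
    )
    alpha = 1 - level
    lo_index = round(reps * alpha / 2)              # cuts off the lower alpha/2 share
    hi_index = round(reps * (1 - alpha / 2)) - 1    # cuts off the upper alpha/2 share
    return diffs[lo_index], diffs[hi_index]

# Hypothetical glucose-like values, for illustration only
healthy = [4.9, 5.1, 4.4, 5.6, 3.9, 5.0, 4.7, 5.3]
diabetic = [5.2, 6.8, 5.0, 7.4, 5.5, 4.8, 6.1, 8.0]
lo, hi = bootstrap_median_diff_ci(healthy, diabetic)
print(f"95% percentile interval for the median difference: ({lo:.3f}, {hi:.3f})")
```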

12.3 Confidence Interval for Difference of Population Medians (Paired Data)

Rguroo Instruction: Binomial, Wilcoxon, and Bootstrap
  1. Use a dataset in your Rguroo account or recreate the example below by importing the anorexia dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of a portion of the anorexia dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> Two Population.

  3. Select a Dataset. In this example, we choose anorexia.

  4. Select the checkbox Paired Data.

  5. If values for the two variables to be compared are in two columns, choose Variable 1 and Variable 2 from the dropdown menus. If observed numerical values for both groups are in one column and the two groups to be compared are indicated by a factor variable, use the option Variable and By Factor to select the numerical variable and the variable that represents the groups, respectively.

  6. (Optional) Type a label for the two populations in the text boxes Pop 1 Label and Pop 2 Label.

  7. Click on the Confidence Interval tab:

    • Type in your Confidence Level.
    • Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Binomial Exact and Bootstrap Percentile. Optionally, you can select Graph to see p-value graphs.

  8. Click the Preview icon to view the result.


Rguroo Dialog

Screenshot of Two Population median Inference dialog.


Rguroo Output
Two Population Median Inference

Data Summary

Variable | Cases read | Cases missing | Cases used | Min | Median | Max
Post Treatment | 72 | 0 | 72 | 71.3 | 84.05 | 103.6
Pre Treatment | 72 | 0 | 72 | 79.6 | 82.3 | 94.9
[Post Treatment - Pre Treatment] | 72 | 0 | 72 | -12.2 | 1.65 | 21.5

Confidence Interval for Median of Differences: [Post Treatment - Pre Treatment]

Sample Size = 72
Median of differences = 1.65

Method | Confidence Level | Lower CL | Upper CL
Binomial Exact | 95.56% | -0.1 | 3.7
Binomial Exact | 92.36% | -0.1 | 3.5
Interpolated* | 95% | -0.1 | 3.66482

* Based on linear interpolation of the two exact confidence intervals

Bootstrap Confidence Interval for Median of paired differences:
[Post Treatment - Pre Treatment]

Confidence Level = 95%
Number of replications = 10000; Random generator seed = 100

Method | Lower CL | Upper CL
Percentile | -0.1 | 3.6

Distribution of Bootstrap Replications of Median of Paired Differences
[Post Treatment - Pre Treatment]

[Figure: distribution of the bootstrap replicates of the median of paired differences, Post Treatment - Pre Treatment. The legend lists the 5% tail area, percentile confidence limits, BCa confidence limits, and observed sample median. Number of replications = 10000, Random generator seed = 100.]

12.4 Test of Hypothesis for a Single Population Median

Rguroo Instruction: Sign Test and Wilcoxon Signed Rank Test
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     [Screenshot of the first 5 rows of the cardata dataset.]

  2. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> One Population.

  3. Select a Dataset. In this example, we choose cardata.

  4. Select a variable from the Variable dropdown.

  5. (Optional) Type a label for the population in the text box M = Median of.

  6. In the Test of Hypothesis tab, select your Significance Level and state the Alternative Hypothesis for M.

  7. Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Sign Test and Wilcoxon Signed-Rank Test.

  8. Click the Preview icon to view the result.


Rguroo Dialog

Screenshot of Two Population median Inference dialog.


Rguroo Output
One Population Median Inference

Data Summary

Cases read | Cases missing | Cases used | Min | Median | Max
82 | 0 | 82 | 13.2 | 32.45 | 65.4

Test of Hypothesis about Median of Differences: Miles per Gallon
Method: Sign Test

Null Hypothesis H0: Median of difference 'Miles per Gallon' is equal to 32
Alternative Hypothesis Ha: Median of difference 'Miles per Gallon' is greater than 32

Sample Size | Sample Median | Number Below | Number Equal | Number Above | P-Value
82 | 32.45 | 36 | 0 | 46 | 0.160147

Test is not significant at 5% level.

P-Value Graph: Miles per Gallon
Sign Test Using the Binomial Distribution

Null Density: Binomial; n = 82, p = 0.5
Alternative Hypothesis Ha: Median of 'Miles per Gallon' is greater than 32

[Figure: null distribution of the number of observed deviations from the null value m0 = 32, marking Observed value = 46 and P-Value = P(X >= 46) = 0.16015.]

Critical Region Graph: Miles per Gallon
Sign Test Using the Binomial Distribution

Null Density: Binomial; n = 82, p = 0.5
Alternative Hypothesis Ha: Median of 'Miles per Gallon' is greater than 32

[Figure: null distribution of the number of observed deviations from the null value m0 = 32, marking Observed value = 46 and the Critical Region X >= 49, with Exact Significance level = 4.852%.]

Wilcoxon Signed-Rank Test of Location: Miles per Gallon

Null Hypothesis H0: Location of 'Miles per Gallon' is equal to 32
Alternative Hypothesis Ha: Location of 'Miles per Gallon' is greater than 32

Method | Wilcoxon Stat. | P-Value
Normal Approximation with CC | 1953.5 | 0.12245

Warning: The Exact method was not used due to ties in the data.
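
The Sign Test numbers in the output above follow directly from the Binomial(82, 0.5) null distribution shown in the graphs. The Python sketch below recomputes the p-value, the critical region, and the exact significance level from the counts of observations above and below 32. The function name is hypothetical, and choosing the critical value as the smallest count whose upper-tail probability does not exceed alpha is an assumption that happens to reproduce the reported region X >= 49.

```python
from math import comb

def binom_upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Counts from the Sign Test table; observations equal to 32 would be dropped
n_below, n_above = 36, 46
n = n_below + n_above

p_value = binom_upper_tail(n, n_above)        # Ha: median greater than 32

alpha = 0.05
# Smallest count c whose upper-tail probability is at most alpha
critical = next(c for c in range(n + 1) if binom_upper_tail(n, c) <= alpha)
exact_level = binom_upper_tail(n, critical)

print(f"p-value = {p_value:.6f}")
print(f"critical region: X >= {critical}, exact level = {exact_level:.3%}")
```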

12.5 Test of Hypothesis for Difference of Two Population Medians (Independent Data)

Rguroo Instruction: Mann-Whitney U ((Wilcoxon Rank-Sum) Test and Permutation Test
  1. Use a dataset in your Rguroo account or recreate the example below by importing the glucose dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see the data. Screenshot of the glucose dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> Two Population.

  2. Select a Dataset. In this example, we choose glucose.

  3. If values for the two variables to be compared are in two columns, choose Variable 1 and Variable 2 from the dropdown menus. If observed numerical values for both groups are in one column and the two groups to be compared are indicated by a factor variable, use the option Variable and By Factor to select the numerical variable and the variable that represents the groups, respectively.

  4. In the Test of Hypothesis tab, select your Significance Level and state the Alternative Hypothesis M1 - M2.

  5. Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Mann-Whitney and Permutation. Note that the Mann-Whitney U test is also called the Wilcoxon Rank-Sum test.

  6. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of the Two Population Median Inference dialog.


Rguroo Output
Two Population Median Inference

Data Summary

Variable Cases read Cases missing Cases used Min Median Max
Healthy 14 0 14 2.83216 4.93049 5.74247
Diabetic 14 0 14 4.62004 5.17533 8.2125

Mann-Whitney Test of Shift in Location: Healthy - Diabetic

Sample sizes: Healthy = 14; Diabetic = 14
Difference of Sample Medians Healthy - Diabetic = -0.244838

Null Hypothesis H0: Shift in Location 'Healthy - Diabetic' is equal to 0
Alternative Hypothesis Ha: Shift in Location 'Healthy - Diabetic' is less than 0

Method Test Stat. P-Value
Exact 72 0.122815

Permutation Test of Difference of Medians: Healthy - Diabetic

Alternative Hypothesis Ha: Difference of Medians of Healthy and Diabetic is less than 0
Number of replications = 10000; Random generator seed = 100

Diff Obs Sample Medians | Median Permutation Diff | 5% Lower Critical Value | P-value
-0.244838 | -0.00189991 | -0.59858 | 0.287671

Test is not significant at 5% level.

Distribution of Permutation Replicates: Sample Median
Healthy - Diabetic

The graph shows the distribution of the permutation replicates of the difference of sample medians, Healthy - Diabetic. Obs. Diff. of Sample Medians = -0.244838; P-Value = 0.2876712; Critical value for alpha = 5%: -0.5985804; Number of replications = 10000; Random generator seed = 100.
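
The idea behind the permutation test of the difference of medians can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Rguroo's algorithm: the function name and the data are hypothetical, the alternative is one-sided (less than 0) as in the example, and Python's random number generator differs from R's, so a seed of 100 will not reproduce the Rguroo replicates.

```python
import random
from statistics import median

def permutation_median_test(x, y, replications=10000, seed=100):
    """One-sided (less) permutation test for median(x) - median(y)."""
    rng = random.Random(seed)
    observed = median(x) - median(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(replications):
        rng.shuffle(pooled)                      # random relabeling of the groups
        diff = median(pooled[:len(x)]) - median(pooled[len(x):])
        if diff <= observed:                     # at least as extreme as observed
            count += 1
    return observed, count / replications

# Hypothetical measurements, NOT the glucose data.
healthy = [4.1, 4.8, 5.0, 4.6, 5.2, 4.9, 4.4]
diabetic = [5.1, 5.6, 4.9, 6.3, 5.8, 5.4, 7.0]

obs_diff, p_value = permutation_median_test(healthy, diabetic, replications=2000)
print(f"Observed difference of medians = {obs_diff:.3f}, P-value = {p_value:.4f}")
```

The P-value is the proportion of relabelings whose difference of medians is at least as small as the observed one; more replications give a more stable estimate.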

12.6 Test of Hypothesis for Difference of Population Medians (Paired Data)

Rguroo Instruction: Wilcoxon Signed-Rank Test, Sign Test, and Permutation Test
  1. Use a dataset in your Rguroo account or recreate the example below by importing the anorexia dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the data. Screenshot of a portion of the anorexia dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and select Median Inference —> Two Population.

  2. Select a Dataset. In this example, we choose anorexia.

  3. Select the checkbox Paired Data.

  4. If values for the two variables to be compared are in two columns, choose Variable 1 and Variable 2 from the dropdown menus. If observed numerical values for both groups are in one column and the two groups to be compared are indicated by a factor variable, use the option Variable and By Factor to select the numerical variable and the variable that represents the groups, respectively.

  5. (Optional) Type a label for the two populations in the textboxes Pop 1 Label and Pop 2 Label.

  6. In the Test of Hypothesis tab, type in your Significance Level and state the Alternative Hypothesis M1 - M2.

  7. Select one or more methods in the Methods section or in the Details button menu. In this example, we selected Wilcoxon Signed-Rank Test, Sign Test, and Permutation Test. Optionally, you can select Graph to see P-value graphs.

  8. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Screenshot of the Two Population Median Inference dialog.


Rguroo Output
Two Population Median Inference

Data Summary

Variable Cases read Cases missing Cases used Min Median Max
Post-Weight 72 0 72 71.3 84.05 103.6
Pre-Weight 72 0 72 79.6 82.3 94.9
[Post-Weight - Pre-Weight] 72 0 72 -12.2 1.65 21.5

Wilcoxon Signed-Rank Test of Difference in Location: [Post-Weight - Pre-Weight]

Sample size = 72
Median of differenced = 1.65

Null Hypothesis H0: Shift in Location '[Post-Weight - Pre-Weight]' is equal to 0
Alternative Hypothesis Ha: Shift in Location '[Post-Weight - Pre-Weight]' is not equal to 0

Method Test Stat. P-Value
Normal Approximation with CC 1724.5 0.0106022

Warning:

The Exact method was not used due to ties in the data.


Test of Hypothesis about Median of Differences: [Post-Weight - Pre-Weight]
Method: Sign Test

Null Hypothesis H0: Median of difference '[Post-Weight - Pre-Weight]' is equal to 0
Alternative Hypothesis Ha: Median of difference '[Post-Weight - Pre-Weight]' is not equal to 0

Sample Size | Median of Diff. | Number Below | Number Equal | Number Above | P-Value
72 | 1.65 | 29 | 1 | 42 | 0.153913

Test is not significant at 5% level.

P-Value Graph: [Post-Weight - Pre-Weight]
Sign Test Using the Binomial Distribution

Null Density: Binomial; n = 71, p = 0.5
Alternative Hypothesis Ha: Median of '[Post-Weight - Pre-Weight]' is not equal to 0

The graph shows the distribution of the number of observed deviations from the null value m0 = 0. Observed value = 42; P-Value = P(X <= 29) + P(X >= 42) = 0.15391.

Permutation Test of Median of Paired Differences: [Post-Weight - Pre-Weight]

Alternative Hypothesis Ha: Median of paired differences '[Post-Weight - Pre-Weight]' is not equal to 0
Number of replications = 10000; Random generator seed = 100

Median Obs Paired Diffs | Median Permutation Paired Diff | 2.5% Lower Critical Value | 2.5% Upper Critical Value | P-value
1.65 | 0.05 | -1.4 | 1.3 | 0.0249975

Test is significant at 5% level.

Distribution of Permutation Replicates: Median of Paired Differences
[Post-Weight - Pre-Weight]

The graph shows the distribution of the permutation replicates of the median of paired differences, [Post-Weight - Pre-Weight]. Obs. Median of Paired Diff = 1.65; P-Value = 0.0249975; Critical values for alpha = 5%: -1.4, 1.3; Number of replications = 10000; Random generator seed = 100.
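
When the paired differences contain ties, the Wilcoxon signed-rank output falls back on a normal approximation with continuity correction (CC). The Python sketch below shows one common textbook formulation of that approximation: average ranks for tied absolute differences, a tie-corrected variance, and zero differences dropped. Whether Rguroo uses exactly these conventions is an assumption; the function name and the data are hypothetical.

```python
from math import erfc, sqrt

def signed_rank_normal(diffs):
    """Return (W+, two-sided P-value) using the normal approximation with CC."""
    d = [v for v in diffs if v != 0]             # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                    # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    shift = w_plus - mean
    correction = 0.5 if shift > 0 else -0.5 if shift < 0 else 0.0
    z = (shift - correction) / sqrt(var)
    return w_plus, erfc(abs(z) / sqrt(2))        # erfc(|z|/sqrt(2)) = 2 * P(Z > |z|)

# Hypothetical paired differences, NOT the anorexia data.
w_plus, p_value = signed_rank_normal([1, -2, 3, 4, -5, 6, 7, 8])
print(f"W+ = {w_plus}, two-sided P-value = {p_value:.4f}")
```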

13 Linear Regression

In this section you will find basic instructions for fitting simple and multiple regression models.

13.1 Simple Linear Regression

Instructional video icon

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Linear Regression —> Simple Regression.

  2. In the Data section, select a Dataset. Then select your predictor and response variables from the Predictor (x) and Response (y) dropdowns.

  3. (Optional) To obtain predicted values and residuals, select the Predictions & Residuals (Observed data) checkbox. You can also get predictions for new values, test hypotheses about the correlation and slope, and obtain confidence intervals for parameters and mean predictions, as well as prediction intervals, using both theory-based and bootstrap methods.

  4. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Simple regression dialog


Rguroo Output
Simple Regression

Data and Model Summary

Number of cases used in the analysis 82
Number of incomplete (omitted) cases 0
Pearson Correlation Coefficient (r) 0.96655
Coefficient of Determination (R-Squared) 0.93421
Equation of Least Squares Line SP = 84.454 + 0.2387 * HP

Response Versus Numerical Predictor

Least Squares Line: SP = 84.454 + 0.2387 * HP

This graph shows a scatter plot of the response versus the numerical predictor, together with the LOESS curve corresponding to the data points.

Residual Versus Fit

Least Squares Line: SP = 84.454 + 0.2387 * HP

This graph shows  Residual Versus Fit

Normal Probability Plot: Residuals

Least Squares Line: SP = 84.454 + 0.2387 * HP

This graph shows  Normal Probability Plot: Residuals
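
The quantities in the Data and Model Summary table (least squares line, r, and R-Squared) follow from a few sums of squares. The Python sketch below illustrates the calculation on a small hypothetical dataset; the function name and the numbers are invented for the example and are not the cardata values.

```python
from math import sqrt

def simple_regression(x, y):
    """Return intercept, slope, Pearson r, R-squared, and residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    r_squared = 1 - sum(e * e for e in residuals) / syy
    return intercept, slope, r, r_squared, residuals

# Hypothetical horsepower (x) and top speed (y) values.
hp = [50, 100, 150, 200, 250]
sp = [96, 109, 120, 133, 143]

intercept, slope, r, r_squared, residuals = simple_regression(hp, sp)
print(f"SP = {intercept:.3f} + {slope:.4f} * HP")
print(f"r = {r:.5f}, R-squared = {r_squared:.5f}")
```

In simple regression, R-Squared is the square of the correlation coefficient, which agrees with the output above: 0.96655 squared is approximately 0.93421.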

13.2 Multiple Regression

Instructional video icon

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Cardata dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Linear Regression —> Simple & Multiple Regression.

  2. Select a Dataset.

  3. In the Model Specification section, select your response variable from the Response drop down.

  4. In the formula textbox, add your predictors. Predictors must be separated by a + sign. To get a model without an intercept, add -1 to your formula. See R documentation for details on how to specify models with interactions using “*” and “:”.

  5. (Optional) Click the Details button to add additional output, including model estimate and diagnostics graphs, diagnostic indices, fitted values, and prediction intervals.

  6. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Multiple regression dialog


Rguroo Output
Report of Regression Analysis

Cases Used in the Analysis

Model formula: SP ~ HP + TYPE

Data Used No.
Number of cases read 82
Number of cases used in the analysis 82
Number of incomplete (omitted) cases 0

Regression Coefficients Estimates and t-Test

Model formula: SP ~ HP + TYPE
Reference Level(s): TYPEDomestic

Term Coefficient Estimate Standard Error t Value Pr > |t| (P-Value)
(Intercept) 83.2944 0.944344 88.2035 1.1053e-80
HP 0.236552 0.00673929 35.1005 5.96077e-50
TYPEImport 2.46318 0.769735 3.20004 0.00197921
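
To read the coefficient table, note that the factor TYPE enters the model through an indicator variable: it is 0 for the reference level (Domestic) and 1 for Import. The short Python sketch below plugs the reported estimates into the fitted equation; the function name is hypothetical and the HP value of 100 is chosen only for illustration.

```python
# Estimates copied from the coefficient table above.
COEF = {"(Intercept)": 83.2944, "HP": 0.236552, "TYPEImport": 2.46318}

def predict_sp(hp, car_type):
    """Fitted SP for a given HP and TYPE ('Domestic' is the reference level)."""
    import_indicator = 1 if car_type == "Import" else 0
    return (COEF["(Intercept)"]
            + COEF["HP"] * hp
            + COEF["TYPEImport"] * import_indicator)

domestic = predict_sp(100, "Domestic")
imported = predict_sp(100, "Import")
print(f"Domestic: {domestic:.4f}, Import: {imported:.4f}")
print(f"Difference at equal HP: {imported - domestic:.5f}")
```

At any fixed HP, the fitted SP of an imported car exceeds that of a domestic car by the TYPEImport coefficient.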

Model Summary: Coefficient of Determination (R-Squared)

Model formula: SP ~ HP + TYPE

Residual Standard Error DF R-Squared Adjusted R-Squared
3.43039 79 0.941759 0.940284

Analysis of Variance

Model formula: SP ~ HP + TYPE

Source DF Sum of Squares Mean Square F Value Pr > F
Regression 2 15032.3 7516.13 638.716 1.68482e-49
Residual 79 929.638 11.7676
Total 81 15961.9
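
The entries of the Model Summary and Analysis of Variance tables are linked by simple arithmetic. The Python sketch below recomputes them from the reported sums of squares and degrees of freedom. Small differences in the last digit are expected because the inputs are rounded; the variable names are illustrative.

```python
from math import sqrt

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_regression, df_regression = 15032.3, 2
ss_residual, df_residual = 929.638, 79
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - ms_residual / (ss_total / df_total)
residual_se = sqrt(ms_residual)

print(f"F = {f_value:.3f}")
print(f"R-Squared = {r_squared:.6f}, Adjusted R-Squared = {adj_r_squared:.6f}")
print(f"Residual Standard Error = {residual_se:.5f}")
```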

Residual Versus Fit

Model formula: SP ~ HP + TYPE

This graph shows  Residual Versus Fit

Normal Probability Plot: Residuals

Model formula: SP ~ HP + TYPE

This graph shows  Normal Probability Plot: Residuals

13.3 Regression Prediction Intervals

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Air Passenger dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Linear Regression —> Simple & Multiple Regression.

  2. Select a Dataset.

  3. In the Model Specification section, select your response variable from the Response drop down.

  4. In the formula textbox, add your predictors. Predictors must be separated by a + sign. To get a model without an intercept, add -1 to your formula. See R documentation for details on how to specify models with interactions using “*” and “:”.

  5. Click the Details button and select the Fitted Values, Predictions, and Interval Estimates tab. Here, move Prediction Interval to the Selected column using drag-and-drop or the menu arrows.

  6. Check one or both of the options Internal Data or External Data. Internal data refers to cases that are used to fit the model. External data refers to cases that are not used to fit the model. You specify the external data by adding them to your dataset and setting the response variable column for these cases to NA.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog


14 Goodness of Fit Test

14.0.1 Using a dataset

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the Starburst dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Starburst dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Goodness of Fit.

  2. Select a Dataset, the categorical variable from the Factor dropdown, and if your data is in tabular form where frequencies are given in a separate column, select your Frequency variable. The observed counts appear in the column labeled Obs. Count.

  3. In the column labeled Null Probability, type in the null hypothesis probability corresponding to each level of the factor variable. A few notes:

    • If you leave the Null Probability column blank, by default, equal probabilities are assumed for all levels.
    • You can use fractions or decimal values for the null probabilities. Moreover, if the null probabilities don’t add up to one, they will be normalized internally to add to one.
  4. Under the Test of Hypothesis section, check one or both of the Chi-Square or Simulation checkboxes (with graph, if desired). Also, set the Significance Level.

  5. Click the Preview icon preview icon to view the result.
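
The chi-squared statistic reported for this example can be reproduced by hand. The Python sketch below is an illustration with a hypothetical function name: it rescales the null probabilities to sum to one, defaults to equal probabilities when none are given (as described in the notes above), and computes the expected counts and the test statistic for the Starburst counts shown in the output.

```python
def goodness_of_fit(observed, null_probs=None):
    """Return expected counts, chi-squared statistic, and degrees of freedom."""
    k = len(observed)
    n = sum(observed)
    if null_probs is None:                 # blank column: equal probabilities
        null_probs = [1.0] * k
    total = sum(null_probs)
    probs = [p / total for p in null_probs]    # normalize to sum to one
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic, k - 1

# Observed counts for Orange, Pink, Red, Yellow (from the output in this section).
counts = [50, 41, 41, 44]
expected, statistic, df = goodness_of_fit(counts)
print(expected, round(statistic, 5), df)
```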


Rguroo Dialog

Goodness of fit test dialog


Rguroo Output
Goodness of Fit Test(s)

Data Summary and Diagnostics: Color

Color Observed Count Expected Count Observed Proportion Expected Proportion
Orange 50 44 0.28409 0.25
Pink 41 44 0.23295 0.25
Red 41 44 0.23295 0.25
Yellow 44 44 0.25 0.25

Null probabilities not specified by user set to 0

Chi-Squared Goodness of Fit Test

Research Hypothesis Ha: Population proportions of Color are different from the expected distribution

Observed Test Statistic Degrees of Freedom P-Value
1.22727 3 0.74647

Test is not significant at the 5% significance level

P-Value Graph:
Method: Chi-Squared Goodness of Fit Test

Null Density: Chi-Squared; df = 3
Null Hypothesis H0: The data agree with the proposed distribution of Color

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 1.2273; P-Value = 0.74647.

Critical Region Graph:
Method: Chi-Squared Goodness of Fit Test

Null Density: Chi-Squared; df = 3
Null Hypothesis H0: The data agree with the proposed distribution of Color

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 1.2273; Critical Region for alpha = 5%; Critical value = 7.8147.

Chi-Squared Goodness of Fit Test by Simulation

Random generator seed = 100

Observed Test Statistic P-Value Number of Simulations
1.22727 0.7640 10000

Test is not significant at the 5% significance level

Chi-Squared Goodness of Fit Test by Simulation: Graph

The graph shows the distribution of simulated Chi-Squared values. Observed Chi-Squared Value = 1.2273; P-Value = 0.764; Critical value for alpha = 5%: 7.8182; Number of simulations = 10000; Random generator seed = 100.
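
The simulation method can be sketched as follows: draw many samples of the same size from the null distribution, compute the chi-squared statistic for each, and report the proportion that is at least as large as the observed statistic. The Python code below is an illustration only. The function names are hypothetical, and because Python's generator differs from R's, a seed of 100 gives a P-value close to, but not identical to, the one in the output.

```python
import random

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed, probs, simulations=10000, seed=100):
    """Monte Carlo P-value for the chi-squared goodness of fit statistic."""
    rng = random.Random(seed)
    n, k = sum(observed), len(observed)
    expected = [n * p for p in probs]
    observed_stat = chi_square(observed, expected)
    categories = list(range(k))
    hits = 0
    for _ in range(simulations):
        counts = [0] * k
        for c in rng.choices(categories, weights=probs, k=n):
            counts[c] += 1
        # Small tolerance guards against floating-point ties.
        if chi_square(counts, expected) >= observed_stat - 1e-9:
            hits += 1
    return observed_stat, hits / simulations

stat, p_value = simulated_p_value([50, 41, 41, 44], [0.25] * 4)
print(f"Observed statistic = {stat:.5f}, simulated P-value = {p_value:.4f}")
```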

14.0.2 Entering data manually

Rguroo Instructions
  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Goodness of Fit.

  2. In the Factor textbox, enter a label for your categorical variable.

  3. Click the green plus button Green plus button to add a category; a row appears. Type in a label and its corresponding count in the columns labeled Label and Obs. Count. Repeat this for each level of the categorical variable.

  4. In the column labeled Null Probability, type in the null hypothesis probability corresponding to each level of the factor variable. A few notes:

    • If you leave the Null Probability column blank, by default, equal probabilities are assumed for all levels.
    • You can use fractions or decimal values for the null probabilities. Moreover, if the null probabilities don’t add up to one, they will be normalized internally to add to one.
  5. Under the Test of Hypothesis section, check one or both of the Chi-Square or Simulation checkboxes (with graph, if desired). Also, set the Significance Level.

  6. Click the Preview icon preview icon to view the result.
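
The notes on null probabilities (fractions or decimals are allowed, blanks mean equal probabilities, and values are rescaled to sum to one) can be illustrated with Python's Fraction type. The function name is hypothetical, and the entries 2, 4, 1, 4, 1 are an assumed input: they are not taken from the dialog, but after normalization they reproduce the expected proportions and expected counts shown in the output for this example (n = 155).

```python
from fractions import Fraction

def normalize_null_probabilities(entries):
    """Parse entries such as '1/6' or '0.25' and rescale them to sum to one."""
    if all(e.strip() == "" for e in entries):     # blank column: equal probabilities
        values = [Fraction(1)] * len(entries)
    else:
        values = [Fraction(e) for e in entries]
    total = sum(values)
    return [v / total for v in values]

# Assumed entries; they do not sum to one, so they are rescaled.
probs = normalize_null_probabilities(["2", "4", "1", "4", "1"])
expected_counts = [round(float(155 * p), 2) for p in probs]
print([str(p) for p in probs])
print(expected_counts)
```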


Rguroo Dialog

Goodness of fit test dialog


Rguroo Output
Goodness of Fit Test(s)

Data Summary and Diagnostics: Ethnicity

Ethnicity Observed Count Expected Count Observed Proportion Expected Proportion
Asian 30 25.83 0.19355 0.16667
Hispanic 45 51.67 0.29032 0.33333
African-American 15 12.92 0.09677 0.08333
White 60 51.67 0.3871 0.33333
Other 5 12.92 0.03226 0.08333

Null probabilities not specified by user set to 0

Chi-Squared Goodness of Fit Test

Research Hypothesis Ha: Population proportions of Ethnicity are different from the expected distribution

Observed Test Statistic Degrees of Freedom P-Value
8.06452 4 0.08924

Test is not significant at the 5% significance level

P-Value Graph:
Method: Chi-Squared Goodness of Fit Test

Null Density: Chi-Squared; df = 4
Null Hypothesis H0: The data agree with the proposed distribution of Ethnicity

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 8.0645; P-Value = 0.089243.

Critical Region Graph:
Method: Chi-Squared Goodness of Fit Test

Null Density: Chi-Squared; df = 4
Null Hypothesis H0: The data agree with the proposed distribution of Ethnicity

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 8.0645; Critical Region for alpha = 5%; Critical value = 9.4877.

15 Analysis of Two-Way Contingency Tables

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the HairEyeColor dataset from the Rguroo dataset repository called R Datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the Hair eye color dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Contingency Table.

  2. Select a Dataset.

  3. From the Factor 1 and Factor 2 dropdowns, select the categorical variables. If your data is in tabular form where frequencies are given in a separate column, select your Frequency variable.

  4. Under the Test of Independence section, select a method (with graph, if desired). Also, set the Significance Level.

  5. (Optional) Select Mosaic Plot.

  6. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Contingency table dialog


Rguroo Output
Contingency Test

Observed Counts

Row Variable is Sex
Column Variable is Hair

Sex | Black | Blond | Brown | Red | Total
Female | 52 | 81 | 143 | 37 | 313
Male | 56 | 46 | 143 | 34 | 279
Total | 108 | 127 | 286 | 71 | 592

Mosaic Plot of Hair vs Sex

Mosaic plot of Hair vs Sex. The x-axis label is Sex; the y-axis label is Hair.

Chi-Squared Test of Independence

Research Hypothesis Ha: Sex and Hair are associated

Observed Test Statistic Degrees of Freedom P-Value
7.99424 3 0.04613

Test is significant at the 5% significance level

P-Value Graph:
Method: Chi-Squared Test of Independence

Null Density: Chi-Squared; df = 3
Null Hypothesis H0: Sex and Hair are independent

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 7.9942; P-Value = 0.046131.

Critical Region Graph:
Method: Chi-Squared Test of Independence

Null Density: Chi-Squared; df = 3
Null Hypothesis H0: Sex and Hair are independent

The graph shows the distribution of Chi-Squared values. Observed Chi-Squared Value = 7.9942; Critical Region for alpha = 5%; Critical value = 7.8147.

Fisher Exact Test: Exact p-value

Research Hypothesis Ha: Sex and Hair are not independent.

p-value
0.0448888

Test is significant at the 5% significance level

Expected Counts

Row Variable is Sex
Column Variable is Hair

Sex | Black | Blond | Brown | Red | Total
Female | 57.1 | 67.15 | 151.21 | 37.54 | 313
Male | 50.9 | 59.85 | 134.79 | 33.46 | 279
Total | 108 | 127 | 286 | 71 | 592
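
The expected counts and the chi-squared statistic in this section can be reproduced from the observed table: each expected count is (row total × column total) / grand total. The Python sketch below is an illustration with hypothetical function names. The P-value helper uses a closed form that is valid only for exactly 3 degrees of freedom, which is the case here.

```python
from math import erfc, exp, pi, sqrt

def chi_square_independence(table):
    """Return expected counts, chi-squared statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Observed counts: rows Female, Male; columns Black, Blond, Brown, Red.
observed = [[52, 81, 143, 37],
            [56, 46, 143, 34]]

expected, statistic, df = chi_square_independence(observed)
p_value = chi_square_sf_df3(statistic)
print([[round(e, 2) for e in row] for row in expected])
print(f"Chi-squared = {statistic:.5f}, df = {df}, P-value = {p_value:.5f}")
```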

16 Analysis of Variance (ANOVA)

16.1 One-Way ANOVA

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the PlantGrowthX dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the PlantGrowthData dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose ANOVA.

  2. Select a dataset from the Dataset dropdown.

  3. Select your response variable from the Response dropdown.

  4. In the One-Way tab, select your categorical variable from the Factor dropdown, and type in a significance level.

  5. (Optional) Check the Diagnostics checkbox, and click on the Diagnostics button. The ANOVA Diagnostics dialog opens. Select your desired diagnostics.

  6. (Optional) Check the Post-hoc Test checkbox, and click on the Post-hoc Test button. The ANOVA Post-Hoc dialog opens. Select one or both of Tukey’s HSD or Pairwise Student t.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Analysis of Variance dialog


Rguroo Output
ANOVA Report

Data Summary

Model: weight ~ group
Model: fixed effect
Data: Balanced

Data.Used No.
Number of cases read 30
Number of cases used in analysis 30
Number of incomplete cases (omitted) 0

Count Summary

Model: weight ~ group
Response Variable: weight

Count NAs (Missings)
30 0

Effect: group

group Count NAs (Missings)
ctrl 10 0
trt1 10 0
trt2 10 0

ANOVA Table

Model: weight ~ group
H0: The means for all levels are equal

Source DF Sum of Squares Mean Square F Value Pr>F BFB
group 2 3.76634 1.88317 4.84609 0.01591 5.58407
Residual 27 10.4921 0.388596

group is significant at 5% significance level.
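
The F value in the ANOVA table is the ratio of the between-group mean square to the residual mean square. The Python sketch below computes the sums of squares for a small hypothetical dataset and then checks the ratio against the table above; the function name and the example data are invented for illustration.

```python
def one_way_anova(groups):
    """Return SS between, SS within, their degrees of freedom, and F."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

# Hypothetical data for three groups, NOT the PlantGrowthX data.
ss_b, ss_w, df_b, df_w, f_value = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_b, ss_w, df_b, df_w, f_value)

# Check of the table above: Mean Square = Sum of Squares / DF, F = ratio.
table_f = (3.76634 / 2) / (10.4921 / 27)
print(f"F from table entries = {table_f:.5f}")
```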

Boxplot:

Model: weight ~ group

Boxplot of Response values.

Residuals vs. Fitted

Model: weight ~ group

Residual scatter plot: residuals on y-axis and fitted values on x-axis.

QQ-Plot for Residuals

Model: weight ~ group

A quantile-quantile plot of residuals.

Levene's Test on Equality of Variances
(Median is used as center)

The null hypothesis: variances are equal.

F Value DF P(>|F|)
1.11919 (2, 27) 0.341227

Test is not significant at 5% significance level.

Tukey's HSD: Multiple Comparison of Means

Table of 95% family-wise confidence level

Difference Levels Mean Lower Limit Upper Limit Adjusted p-value
trt1-ctrl -0.371 -1.06222 0.320216 0.390871
trt2-ctrl 0.494 -0.197216 1.18522 0.197996
trt2-trt1 0.865 0.173784 1.55622 0.0120064

Tukey's HSD: Multiple Comparison of Means

Graph of 95% family-wise confidence level

Tukey's HSD graph showing the interval for mean difference between different levels of term: group

16.2 Two-Way ANOVA

Instructional video icon

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the PlantGrowthX dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the PlantGrowthData dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose ANOVA.

  2. Select a dataset from the Dataset dropdown, and select your response variable from the Response dropdown.

  3. Click the Two-Way tab, select your categorical variables from the Factor A and Factor B dropdowns, and type in a significance level.

  4. (Optional) Check the Diagnostics checkbox, and click on the Diagnostics button. The ANOVA Diagnostics dialog opens. Select your desired diagnostics.

  5. (Optional) Check the Post-hoc Test checkbox, and click on the Post-hoc Test button. The ANOVA Post-Hoc dialog opens. Select one or both of Tukey’s HSD or Pairwise Student t.

  6. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Two-Way ANOVA dialog


Rguroo Output
ANOVA Report

Data Summary

Model: weight ~ group + plot
Model: fixed effect
Data: Balanced

Data.Used No.
Number of cases read 30
Number of cases used in analysis 30
Number of incomplete cases (omitted) 0

Count Summary

Model: weight ~ group + plot
Response Variable: weight

Count NAs (Missings)
30 0

Effect: group

group Count NAs (Missings)
ctrl 10 0
trt1 10 0
trt2 10 0

Effect: plot

plot Count NAs (Missings)
A 15 0
B 15 0

ANOVA Table

Model: weight ~ group + plot
H0: The means for all levels are equal

Source DF Sum of Squares Mean Square F Value Pr>F BFB
group 2 3.76634 1.88317 9.40051 0.000847054 61.3967
plot 1 5.2836 5.2836 26.375 2.35008e-05 1468.68
Residual 26 5.20849 0.200326

group is significant at 5% significance level.
plot is significant at 5% significance level.
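
The BFB column is not defined in this section. Its values are numerically consistent with the Bayes factor bound, 1 / (-e · p · ln p), an upper bound on the evidence against the null hypothesis implied by a P-value p. Treat that reading as an assumption; the Python sketch below, with a hypothetical function name, only checks that the formula matches the reported numbers.

```python
from math import e, log

def bayes_factor_bound(p):
    """Bayes factor bound 1 / (-e * p * ln p), defined for 0 < p < 1/e."""
    if not 0 < p < 1 / e:
        raise ValueError("the bound is defined only for 0 < p < 1/e")
    return 1 / (-e * p * log(p))

# P-values copied from the two-way ANOVA table above.
bfb_group = bayes_factor_bound(0.000847054)
bfb_plot = bayes_factor_bound(2.35008e-05)
print(f"group: {bfb_group:.4f}, plot: {bfb_plot:.2f}")
```

Larger values indicate stronger evidence against the null hypothesis; for example, a P-value of 0.05 corresponds to a bound of only about 2.5.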

Boxplot:

Model: weight ~ group + plot

Boxplot of Response values.

Residuals vs. Fitted

Model: weight ~ group + plot

Residual scatter plot: residuals on y-axis and fitted values on x-axis.

QQ-Plot for Residuals

Model: weight ~ group + plot

A quantile-quantile plot of residuals.

Levene's Test on Equality of Variances
(Median is used as center)

The null hypothesis: variances are equal.

F Value DF P(>|F|)
0.715407 (5, 24) 0.61802

Test is not significant at 5% significance level.

Tukey's HSD: Multiple Comparison of Means

Table of 95% family-wise confidence level

Difference Levels Mean Lower Limit Upper Limit Adjusted p-value
trt1-ctrl -0.371 -0.868384 0.126384 0.172501
trt2-ctrl 0.494 -0.00338414 0.991384 0.0518359
trt2-trt1 0.865 0.367616 1.36238 0.000571992

Tukey's HSD: Multiple Comparison of Means

Graph of 95% family-wise confidence level

Tukey's HSD graph showing the interval for mean difference between different levels of term: group

Tukey's HSD: Multiple Comparison of Means

Table of 95% family-wise confidence level

Difference Levels Mean Lower Limit Upper Limit Adjusted p-value
B-A 0.839333 0.503393 1.17527 2.35008e-05

Tukey's HSD: Multiple Comparison of Means

Graph of 95% family-wise confidence level

Tukey's HSD graph showing the interval for mean difference between different levels of term: plot

17 Nonparametric Analysis of Variance (Nonparametric ANOVA)

17.1 One-Way ANOVA (Kruskal-Wallis and Permutation Test)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the PlantGrowthX dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the PlantGrowthData dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Non-Parametric ANOVA.

  2. Select a dataset from the Dataset dropdown.

  3. Select your response variable from the Response dropdown.

  4. In the One-Way tab, select your categorical variable from the Factor dropdown, and type in a significance level.

  5. (Optional) Select the Boxplot checkbox to see the boxplots for the selected variable by the levels of the selected factor.

  6. Select one or both methods of Kruskal-Wallis or Permutation Test.

  7. (Optional) If you select the Permutation test, click the Details button to specify the number of Replications and a seed for the random number generator.

  8. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Analysis of Variance dialog


Rguroo Output
Non-Parametric ANOVA Report

Case Summary

Model: weight ~ group

Data.Used No.
Number of cases read 30
Number of cases used in analysis 30
Number of incomplete cases (omitted) 0

Response Variable Summary

Model: weight ~ group

Count NAs (Missings)
30 0

Effect: group

group Count NAs (Missings)
ctrl 10 0
trt1 10 0
trt2 10 0

Kruskal-Wallis Test

Model: weight ~ group

Method Observed Chi-Squared df p-value BFB
Kruskal-Wallis rank sum test 7.98823 2 0.0184238 4.99927

Response is significant at 5% significance level.

Boxplot:
weight ~ group

Boxplot for One-Way

Kruskal-Wallis P-value Graph

Model: weight ~ group

Density Graph

Response is significant at 5% significance level.

Permutation Test

Model: weight ~ group

Method Observed Chi-Squared p-value BFB
Permutation K-W Chi-square 7.98823 0.009 8.67747

Response is significant at 5% significance level.

Permutation Test P-value Graph

Model: weight ~ group

The graph shows the distribution of simulated Chi-squared values. Observed Chi-squared Value = 7.9882; P-value = 0.009; Critical boundary for alpha = 5%; Critical values = (-Inf, 5.3963); Number of simulations = 1000.

Response is significant at 5% significance level.
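
The Kruskal-Wallis statistic is computed from the ranks of the pooled observations. The Python sketch below is a minimal illustration with hypothetical function names and data: it assigns average ranks to ties, applies the usual tie correction, and uses the fact that with three groups (2 degrees of freedom) the chi-squared upper-tail probability is exp(-H/2).

```python
from math import exp

def average_ranks(values):
    """Return ranks (ties get the average rank) and the sizes of tied blocks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes = [0.0] * len(values), []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def kruskal_wallis(groups):
    """Return the tie-corrected Kruskal-Wallis statistic and its df."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks, tie_sizes = average_ranks(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction, len(groups) - 1

# Hypothetical data for three groups, NOT the PlantGrowthX data.
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.3f}, df = {df}, P-value = {exp(-h / 2):.5f}")

# Check of the output above: df = 2, so P-value = exp(-7.98823 / 2).
print(f"P-value for the reported statistic = {exp(-7.98823 / 2):.7f}")
```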

17.2 Two-Way ANOVA (Friedman and Permutation Tests)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the PlantGrowthX dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the PlantGrowthData dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Use the Analysis dropdown menu and choose Non-Parametric ANOVA.

  2. Select a dataset from the Dataset dropdown, and select your response variable from the Response dropdown.

  3. Click on the Two-Way tab, select your categorical variables from the Factor A and Factor B dropdowns, and type in a significance level.

  4. (Optional) Select the Boxplot checkbox to see the boxplots for the selected variable by the levels of the selected factors.

  5. Select one or both methods of Friedman or Permutation Test.

  6. (Optional) If you select the Permutation test, click the Details button to specify the number of Replications and a seed for the random number generator.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Non-Parametric Two-Way ANOVA dialog


Rguroo Output
Non-Parametric ANOVA Report

Case Summary

Model: weight ~ group + plot

Data.Used No.
Number of cases read 30
Number of cases used in analysis 30
Number of incomplete cases (omitted) 0

Response Variable Summary

Model: weight ~ group + plot

Count NAs (Missings)
30 0

Effect: group

group Count NAs (Missings)
ctrl 10 0
trt1 10 0
trt2 10 0

Effect: plot

plot Count NAs (Missings)
A 15 0
B 15 0

Boxplot:
weight ~ group + plot

Boxplot for Two-Way

Friedman (Non-Parametric) Analysis of Variance Table

There are 5 observations of weight for each combination of levels of group and plot.
The mean value for each combination level is used since Friedman Test requires one observation for each combination.

Model: weight ~ group + plot

Method Observed Chi-Squared df p-value BFB
Friedman rank sum test 4 2 0.135335 1.35914

Response is not significant different at 5% significance level.

Friedman (Non-Parametric) ANOVA P-value Graph

H0: The response for all treatment levels (groups) are the same across the subjects/blocks

Density Graph

Response is not significant at 5% significance level.

Two-way Permutation Test

There are 5 observations of weight for each combination of levels of group and plot.
The mean value for each combination level is used since Permutation Test requires one observation for each combination.

Model: weight ~ group + plot

Method Observed Chi-Squared p-value BFB
Permutation Friedman Chi-square 4 0.161 1.25111

Response is not significant different at 5% significance level.

Two-way Permutation Test P-value Graph

The graph shows the distribution of simulated Chi-squared values. Observed Chi-squared Value = 4; P-value = 0.161; Critical boundary for alpha = 5%; Critical values = (-Inf, 4); Number of simulations = 1000.

Response is not significant at 5% significance level.
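
The Friedman statistic ranks the treatments within each block and compares the rank sums. The Python sketch below applies it to a 2 × 3 table of cell means, mirroring the layout of this example (two plots as blocks, three groups as treatments). The cell means are hypothetical, the function name is illustrative, and no tie correction is applied.

```python
from math import exp

def friedman(blocks):
    """blocks: one list per block, holding one value per treatment."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, v in enumerate(row):
            less = sum(1 for w in row if w < v)
            equal = sum(1 for w in row if w == v)
            rank_sums[j] += less + (equal + 1) / 2   # average rank within the block
    statistic = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return statistic, k - 1

# Hypothetical cell means: rows are plots A and B, columns are ctrl, trt1, trt2.
cell_means = [[4.6, 4.3, 5.1],
              [5.5, 5.0, 5.9]]

statistic, df = friedman(cell_means)
p_value = exp(-statistic / 2)        # chi-squared upper tail, valid for df = 2
print(f"Friedman chi-squared = {statistic}, df = {df}, P-value = {p_value:.6f}")
```

With two blocks and three treatments, 4 is the largest value the statistic can take; it occurs when both blocks rank the treatments identically. This explains why the test cannot reach significance at the 5% level in such a small design.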

18 Time Series

18.1 Time Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.
Click here to see a portion of the dataset. Screenshot of the first 5 rows of the AirPassengers dataset.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. From the Numerical Variables section, select and drag the variable AirPassengers to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The AirPassengers data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The AirPassengers data starts from January 1949, so set Year to \(\tt{1949}\) and Month to \(\tt{1}\).

  5. In the section Time Series Plot, select the Lines checkbox and optionally the Points checkbox.

  6. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog.


Rguroo Output

Screenshot of a time plot showing the number of airline passengers over time.

18.2 Moving Average Plot

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. From the Numerical Variables section, select and drag the variable AirPassengers to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The AirPassengers data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The AirPassengers data starts from January 1949, so set Year to \(\tt{1949}\) and Month to \(\tt{1}\).

  5. In the section Time Series Plot, select the Lines and Moving Average checkboxes.

  6. In the text field labeled q-Param, to the right of Moving Average, enter an integer equal to the number of periods over which the moving average is calculated.

  7. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1949-01-01 1960-12-01 1 144

Time Series Plot

Time series plot of AirPassengers with its moving average (MA-AirPassengers). The x-axis is Year, from 1949 to 1960; the y-axis is AirPassengers, from 83.28 to 642.72. The legend lists AirPassengers (filled circles, solid line) and MA-AirPassengers (solid line).
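
To illustrate what the q-Param value controls, the following Python sketch computes a moving average over q periods. It assumes a trailing window (each average uses the current and the q - 1 preceding values); Rguroo may center the window instead, so treat this as a conceptual sketch rather than Rguroo's implementation. The six values are the first six monthly observations of AirPassengers (January to June 1949).

```python
def trailing_moving_average(values, q):
    """Return the mean of each run of q consecutive values."""
    if q < 1 or q > len(values):
        raise ValueError("q must be between 1 and the series length")
    window_sum = sum(values[:q])
    averages = [window_sum / q]
    for i in range(q, len(values)):
        window_sum += values[i] - values[i - q]   # slide the window by one period
        averages.append(window_sum / q)
    return averages

first_six = [112, 118, 132, 129, 121, 135]
smoothed = trailing_moving_average(first_six, q=3)
print([round(v, 2) for v in smoothed])
```

A larger q averages over more periods and therefore gives a smoother line, at the cost of fewer plotted points at the start of the series.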

18.3 Exponential Smoothing

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the USAccDeaths dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the USAccDeaths dataset.

  3. From the Numerical Variables section, select and drag the variable USAccDeaths to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The USAccDeaths data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The USAccDeaths data starts from January 1, 1973 so set Year to \(\tt{1973}\) and Month to \(\tt{1}\).

  5. Click on the modeling button button to open the modeling dialog.

  6. In the Methods and Forecasting dialog under the Method section, check Exponential and Graph.

  7. In the text field to the right of Exponential, optionally enter an Alpha value between 0 and 1. If a value is not specified, a default value will be chosen by Rguroo. We use a value of 0.5 for Alpha in this example.

  8. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog for exponential smoothing.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1973-01-01 1978-12-01 1 72

Exponential Smoothing: Parameter and Error Estimates [USAccDeaths]

Alpha Beta Gamma SSE MSE RMSE
0.5 --- --- 4.88896e+07 688586 829.811

Exponential Smoothing: Coefficient Estimates [USAccDeaths]

Level
9092.6

Exponential Smoothing of [USAccDeaths] (alpha = 0.5)

Time series plot of USAccDeaths with the exponential smoothing fit (Exp Fit). The x-axis is Year, from 1973 to 1978; the y-axis is USAccDeaths, from 6715 to 11494. The legend lists USAccDeaths (filled circles, solid line) and Exp Fit.
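
The role of Alpha can be seen in a short Python sketch of simple exponential smoothing. Two assumptions are made that this guide does not state explicitly: the level starts at the first observation, and the fitted value for a period is the level carried over from the previous period. Both are consistent with the Fitted Values table in Section 18.7, where the first fitted value equals the first observation, 9007. The data are the first four USAccDeaths values listed in that table.

```python
def exponential_smoothing_fits(values, alpha):
    """One-step-ahead fitted values for simple exponential smoothing.

    Assumes the level starts at the first observation; the fitted value
    for each later period is the level carried over from the period before.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]
    fits = []
    for y in values[1:]:
        fits.append(level)                        # forecast made before seeing y
        level = alpha * y + (1.0 - alpha) * level
    return fits, level

deaths = [9007, 8106, 8928, 9137]                 # January to April 1973
fits, final_level = exponential_smoothing_fits(deaths, alpha=0.5)
print(fits, final_level)
```

With alpha = 0.999934 the same function returns approximately 9007, 8106.06, and 8927.95, which are the first three Exponential fitted values in the table of Section 18.7. Values of Alpha near 1 track the latest observation closely, while smaller values smooth more heavily.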

18.4 Exponential Smoothing with Trend (Double-Exponential)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the USAccDeaths dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the USAccDeaths dataset.

  3. From the Numerical Variables section, select and drag the variable USAccDeaths to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The USAccDeaths data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The USAccDeaths data starts from January 1, 1973 so set Year to \(\tt{1973}\) and Month to \(\tt{1}\).

  5. Click on the modeling button button to open the modeling dialog.

  6. In the Methods and Forecasting dialog under the Method section, check Double Exponential and Graph.

  7. In the text fields to the right of Double Exponential, optionally enter an Alpha and/or a Beta value between 0 and 1. If a value is not specified, a default value will be chosen by Rguroo. We use a value of 0.5 for Alpha and 0.3 for Beta in this example.

  8. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog for double exponential smoothing.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1973-01-01 1978-12-01 1 72

Double Exponential Smoothing: Parameter and Error Estimates [USAccDeaths]

Alpha Beta Gamma SSE MSE RMSE
0.5 0.3 --- 7.887e+07 1.12671e+06 1061.47

Double Exponential Smoothing: Coefficient Estimates [USAccDeaths]

Level Trend
9124.66 -56.0626

Double-Exponential Smoothing of [USAccDeaths] (alpha = 0.5, beta = 0.3)

Time series plot of USAccDeaths with the double exponential smoothing fit (Double Exp Fit). The x-axis is Year, from 1973 to 1978; the y-axis is USAccDeaths, from 6701.09 to 11494.53. The legend lists USAccDeaths (filled circles, solid line) and Double Exp Fit.
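
Double exponential smoothing adds a trend term that is updated with Beta. The Python sketch below is one common formulation (Holt's method). It assumes that the level starts at the second observation and the trend at the difference between the first two observations, so the first fitted value belongs to the third period. This start-up rule is an assumption, although it matches the Double-Exponential columns of the table in Section 18.7.

```python
def double_exponential_fits(values, alpha, beta):
    """One-step-ahead fitted values for Holt's double exponential smoothing.

    Assumed start-up: level = second observation, trend = second minus first,
    so the first fitted value belongs to the third period.
    """
    level = values[1]
    trend = values[1] - values[0]
    fits = []
    for y in values[2:]:
        forecast = level + trend
        fits.append(forecast)
        new_level = alpha * y + (1.0 - alpha) * forecast
        trend = beta * (new_level - level) + (1.0 - beta) * trend
        level = new_level
    return fits

deaths = [9007, 8106, 8928, 9137]                 # January to April 1973
fits_example = double_exponential_fits(deaths, alpha=0.5, beta=0.3)
fits_default = double_exponential_fits(deaths, alpha=1.0, beta=0.109594)
print(fits_example)
print([round(f, 2) for f in fits_default])
```

The second call uses the default estimates reported in Section 18.7 (Alpha = 1, Beta = 0.109594) and gives 7205 and about 8215.83, the first two Double-Exponential fitted values in that section's table.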

18.5 Additive and Multiplicative Seasonal Smoothing (The Holt-Winters Method)

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. From the Numerical Variables section, select and drag the variable AirPassengers to the Selected column.

  4. Under the Time Specification section

    • In the text field next to Freq., enter an integer value greater than or equal to 2 representing the number of observations per unit of time (4 for quarterly data, 12 for monthly data, etc.). A Freq. value of 12 is used for this example.
    • (Optional) From the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The AirPassengers data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The AirPassengers data starts from January 1, 1949 so set Year to \(\tt{1949}\) and Month to \(\tt{1}\).
  5. Click on the modeling button button to open the modeling dialog.

  6. In the Methods and Forecasting dialog under the Method section, check Holt-Winters’ and select the Graph checkbox.

  7. In the text fields to the right of Holt-Winters’, optionally enter an Alpha, Beta, and/or Gamma value between 0 and 1. If a value is not specified, a default value will be chosen by Rguroo. In this example we use Rguroo’s default values.

  8. In the dropdown menu labeled Holt-Winters’ Model Type select one of Additive or Multiplicative for the type of seasonal smoothing. We select Additive for this example.

  9. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog for Holt-Winters' smoothing.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1949-01-01 1960-12-01 1 144

Holt-Winters' Smoothing: Parameter and Error Estimates [AirPassengers]

Alpha Beta Gamma SSE MSE RMSE
0.247959 0.0345337 1 21860.2 165.607 12.8689

Holt-Winters' Smoothing: Coefficient Estimates [AirPassengers]

Level Trend Seasonal 1 Seasonal 2 Seasonal 3 Seasonal 4 Seasonal 5 Seasonal 6 Seasonal 7 Seasonal 8 Seasonal 9 Seasonal 10 Seasonal 11 Seasonal 12
477.828 3.12763 -27.4577 -54.6925 -20.1746 12.9191 18.8736 75.2944 152.888 134.613 33.7783 -18.3791 -87.7724 -45.8278

Holt-Winters' Smoothing (additive) of [AirPassengers] (alpha = 0.248, beta = 0.0345, gamma = 1)

Time series plot of AirPassengers with the Holt-Winters' fit. The x-axis is Year, from 1949 to 1960; the y-axis is AirPassengers, from 83.28 to 642.72. The legend lists AirPassengers (filled circles, solid line) and Holt-Winters' Fit.
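
One way to read the coefficient table: in an additive Holt-Winters model, the forecast h periods ahead is Level + h × Trend + the seasonal coefficient for that period. The Python sketch below applies this to the estimates above. It assumes that Seasonal 1 applies to the first period after the end of the data (January 1961 here), which is the convention of R's HoltWinters output but is not stated in this guide.

```python
LEVEL = 477.828
TREND = 3.12763
SEASONAL = [-27.4577, -54.6925, -20.1746, 12.9191, 18.8736, 75.2944,
            152.888, 134.613, 33.7783, -18.3791, -87.7724, -45.8278]

def additive_forecast(h):
    """Forecast h periods (h >= 1) past the last observation."""
    if h < 1:
        raise ValueError("h must be at least 1")
    seasonal = SEASONAL[(h - 1) % len(SEASONAL)]   # cycle through the 12 months
    return LEVEL + h * TREND + seasonal

for h in (1, 2, 13):
    print(h, round(additive_forecast(h), 2))
```

Forecasts one year apart (for example h = 1 and h = 13) share the same seasonal coefficient and differ only by 12 times the trend.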

18.6 Classical Decomposition Graph

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the AirPassengers dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the AirPassengers dataset.

  3. From the Numerical Variables section, select and drag the variable AirPassengers to the Selected column.

  4. Under the Time Specification section

    • In the text field next to Freq., enter an integer value greater than or equal to 2 representing the number of observations per unit of time (4 for quarterly data, 12 for monthly data, etc.). A Freq. value of 12 is used for this example.
    • (Optional) From the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The AirPassengers data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The AirPassengers data starts from January 1, 1949 so set Year to \(\tt{1949}\) and Month to \(\tt{1}\).
  5. At the bottom of the Basics button dialog, select Classical Decomposition Graph.

  6. From the Model Type dropdown menu to the right of the Classical Decomposition Graph checkbox, select one of Additive or Multiplicative.

  7. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog for decomposition graph.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1949-01-01 1960-12-01 1 144

Classical Decomposition Graph (additive) [AirPassengers]

TBD

18.7 Fitted Values and Residuals

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the USAccDeaths dataset from the Rguroo dataset repository called R Datasets into your account.


  1. Open the Analytics toolbox on the left-hand side of the Rguroo window. Click on the Analysis dropdown menu and choose the Time Series function.

  2. Select your dataset from the Dataset dropdown menu. In this example, we select the USAccDeaths dataset.

  3. From the Numerical Variables section, select and drag the variable USAccDeaths to the Selected column.

  4. (Optional) Under the Time Specification section, from the Type dropdown, select the time frequency of your data (yearly, quarterly, monthly, daily, hourly, by minute or seconds). The USAccDeaths data are monthly, so we select Month from the Type dropdown menu. Then set an appropriate start month and year for the data. The USAccDeaths data starts from January 1, 1973 so set Year to \(\tt{1973}\) and Month to \(\tt{1}\).

  5. Click on the modeling button button to open the modeling dialog.

  6. In the Methods and Forecasting dialog under the Method section

    • Check the modeling method(s) you would like to use (Exponential, Double Exponential, and/or Holt-Winters’). Alpha, Beta, and Gamma parameters may optionally be specified in the text fields. For this example, we choose the Exponential and Double Exponential methods with default values of Alpha and Beta.
    • Select the Fitted Values Table checkbox.
  7. Click the Preview icon preview icon to view the graph.


Rguroo Dialog

Screenshot of Time Series dialog for forecasting error.


Rguroo Output
Time Series

Time Specification

Time Type Start Date End Date Increment No. observed
MONTH 1973-01-01 1978-12-01 1 72

Exponential Smoothing: Parameter and Error Estimates [USAccDeaths]

Alpha Beta Gamma SSE MSE RMSE
0.999934 --- --- 3.78523e+07 533132 730.159

Exponential Smoothing: Coefficient Estimates [USAccDeaths]

Level
9239.96

Double Exponential Smoothing: Parameter and Error Estimates [USAccDeaths]

Alpha Beta Gamma SSE MSE RMSE
1 0.109594 --- 4.58233e+07 654619 809.085

Double Exponential Smoothing: Coefficient Estimates [USAccDeaths]

Level Trend
9240 40.5777

Fitted Values and Residuals for Model Fit [USAccDeaths]

Date/Time USAccDeaths Exponential: Fitted Values Exponential: Residuals Double-Exponential: Fitted Values Double-Exponential: Residuals
1973-01-01 9007
1973-02-01 8106 9007 -901
1973-03-01 8928 8106.06 821.94 7205 1723
1973-04-01 9137 8927.95 209.054 8215.83 921.17
1973-05-01 10017 9136.99 880.014 8525.78 1491.22
1973-06-01 10826 10016.9 809.058 9569.21 1256.79
1973-07-01 11317 10825.9 491.053 10515.9 801.051
1973-08-01 10744 11317 -572.968 11094.7 -350.739
1973-09-01 9713 10744 -1031.04 10483.3 -770.3
1973-10-01 9938 9713.07 224.932 9367.88 570.12
1973-11-01 9161 9937.99 -776.985 9655.36 -494.362
1973-12-01 8927 9161.05 -234.051 8824.18 102.817
1974-01-01 7750 8927.02 -1177.02 8601.45 -851.451
1974-02-01 6981 7750.08 -769.078 7331.14 -350.137
1974-03-01 8038 6981.05 1056.95 6523.76 1514.24
1974-04-01 8422 8037.93 384.07 7746.72 675.285
1974-05-01 8714 8421.97 292.025 8204.72 509.278
1974-06-01 9512 8713.98 798.019 8552.54 959.464
1974-07-01 10120 9511.95 608.053 9455.69 664.313
1974-08-01 9823 10120 -296.96 10136.5 -313.492
1974-09-01 8743 9823.02 -1080.02 9805.14 -1062.14
1974-10-01 9129 8743.07 385.929 8608.73 520.268
1974-11-01 8710 9128.97 -418.974 9051.75 -341.75
1974-12-01 8680 8710.03 -30.0277 8595.3 84.7038
1975-01-01 8162 8680 -518.002 8574.58 -412.579
1975-02-01 7306 8162.03 -856.034 8011.36 -705.363
1975-03-01 8124 7306.06 817.943 7078.06 1045.94
1975-04-01 7870 8123.95 -253.946 8010.69 -140.688
1975-05-01 9387 7870.02 1516.98 7741.27 1645.73
1975-06-01 9556 9386.9 169.1 9438.63 117.368
1975-07-01 10093 9555.99 537.011 9620.49 472.506
1975-08-01 9620 10093 -472.964 10209.3 -589.278
1975-09-01 8285 9620.03 -1335.03 9671.7 -1386.7
1975-10-01 8466 8285.09 180.912 8184.72 281.277
1975-11-01 8160 8465.99 -305.988 8396.55 -236.55
1975-12-01 8034 8160.02 -126.02 8064.63 -30.6252
1976-01-01 7717 8034.01 -317.008 7935.27 -218.269
1976-02-01 7461 7717.02 -256.021 7594.35 -133.348
1976-03-01 7767 7461.02 305.983 7323.73 443.266
1976-04-01 7925 7766.98 158.02 7678.31 246.687
1976-05-01 8623 7924.99 698.01 7863.35 759.652
1976-06-01 8945 8622.95 322.046 8644.6 300.398
1976-07-01 10078 8944.98 1133.02 8999.52 1078.48
1976-08-01 9179 10077.9 -898.925 10250.7 -1071.72
1976-09-01 8037 9179.06 -1142.06 9234.26 -1197.26
1976-10-01 8488 8037.08 450.925 7961.05 526.949
1976-11-01 7874 8487.97 -613.97 8469.8 -595.802
1976-12-01 8647 7874.04 772.959 7790.51 856.495
1977-01-01 7792 8646.95 -854.949 8657.37 -865.372
1977-02-01 6957 7792.06 -835.057 7707.53 -750.533
1977-03-01 7726 6957.06 768.945 6790.28 935.721
1977-04-01 8106 7725.95 380.051 7661.83 444.172
1977-05-01 8890 8105.97 784.025 8090.51 799.493
1977-06-01 9299 8889.95 409.052 8962.13 336.874
1977-07-01 10625 9298.97 1326.03 9408.05 1216.95
1977-08-01 9302 10624.9 -1322.91 10867.4 -1565.42
1977-09-01 8314 9302.09 -988.087 9372.86 -1058.86
1977-10-01 8850 8314.07 535.935 8268.81 581.188
1977-11-01 8265 8849.96 -584.965 8868.51 -603.507
1977-12-01 8796 8265.04 530.961 8217.37 578.634
1978-01-01 7836 8795.96 -959.965 8811.78 -975.781
1978-02-01 6892 7836.06 -944.063 7744.84 -852.841
1978-03-01 7791 6892.06 898.938 6707.38 1083.62
1978-04-01 8192 7790.94 401.059 7725.13 466.866
1978-05-01 9115 8191.97 923.027 8177.3 937.701
1978-06-01 9434 9114.94 319.061 9203.07 230.934
1978-07-01 10484 9433.98 1050.02 9547.37 936.625
1978-08-01 9827 10483.9 -656.931 10700 -873.023
1978-09-01 9110 9827.04 -717.043 9947.35 -837.345
1978-10-01 9070 9110.05 -40.0474 9138.58 -68.5772
1978-11-01 8633 9070 -437.003 9091.06 -458.062
1978-12-01 9240 8633.03 606.971 8603.86 636.139
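
The residual columns and the error estimates are related by simple arithmetic: a residual is the observed value minus the fitted value, MSE is SSE divided by the number of residuals, and RMSE is the square root of MSE. The Python sketch below checks this on the first three Exponential rows of the table. The divisor of 71 (the number of Exponential fitted values) is inferred from the reported figures rather than stated in this guide.

```python
import math

def error_summary(observed, fitted):
    """Return residuals, SSE, MSE, and RMSE for paired observed/fitted values."""
    residuals = [y - f for y, f in zip(observed, fitted)]
    sse = sum(r * r for r in residuals)
    mse = sse / len(residuals)
    return residuals, sse, mse, math.sqrt(mse)

# February to April 1973 rows of the Exponential columns
observed = [8106, 8928, 9137]
fitted = [9007, 8106.06, 8927.95]
residuals, sse, mse, rmse = error_summary(observed, fitted)
print([round(r, 2) for r in residuals])

# Consistency check on the reported Exponential estimates (71 fitted values)
reported_sse, reported_mse, reported_rmse = 3.78523e07, 533132, 730.159
print(round(reported_sse / 71), round(math.sqrt(reported_mse), 3))
```

The recomputed MSE (533131) differs from the reported 533132 only because SSE is displayed to six significant digits. The same relation holds for the Double Exponential estimates with 70 fitted values.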

19 Probability Calculator - Continuous Random Variables

Rguroo can compute probabilities and inverse probabilities for various discrete and continuous random variables. You can use the option Values –> Probability on the probability calculator to compute probabilities at a set of values that you specify for your selected random variable. To compute inverse probabilities, you can use the option Probability –> Value to obtain values of a random variable for the probability value(s) that you specify.

You can compute probabilities and inverse probabilities for the following continuous random variables: Beta, Cauchy, Chi-Square, Exponential, F, Gamma, Logistic, Log-Normal, Normal, Student t, Triangular, Uniform, and Weibull.

You can compute probabilities and inverse probabilities for the following discrete random variables: Bernoulli, Binomial, Discrete Uniform, Geometric, Hypergeometric, Negative Binomial, and Poisson.

19.1 Probability Distributions (Continuous): Normal, Student t, Chi-Square, F, and more

Instructional video icon

Rguroo Instructions
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Probability Calculator —> Continuous.

  2. By default, the option Values –> Probability is selected. From the Distribution dropdown, select your distribution. In this example, we selected Normal.

  3. Fill in the distribution parameters. For example, for the normal, we fill in the Mean and SD textboxes.

  4. From the next dropdown, select the region for which you want to calculate the probability. Specifically, for a value of \(a\) that you type in the textbox, Below corresponds to \(P(X\leq a)\), and Above corresponds to \(P(X\geq a)\). Also, for given values of \(a\) and \(b\) that you type in the Lower Tail and Upper Tail textboxes, respectively, Between corresponds to \(P(a \leq X\leq b)\), and Outside corresponds to \(P(X\leq a \text{ or } X\geq b)\).

  5. (Optional) You can select the Graph checkbox to see a graph of the distribution in the output.

  6. (Optional) By clicking the plus icon plus button icon, you can add another probability calculation.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Continuous Probability Calculator Dialog


Rguroo Output

Probability Calculation: Calculation_1

X ∼ Normal(μ = 2, σ = 0.5)
The Expected Value is E(X) = 2 and the Variance is V(X) = 0.25.
The probability that X is between 1.5 and 3 equals 0.8186.
P(1.5 ≤ X ≤ 3) = 0.8186

Calculation graph of Probability Calculation: Calculation_1
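
The four regions can be cross-checked outside Rguroo. The following sketch uses Python's standard-library NormalDist class with the parameters of this example (mean 2, SD 0.5); it is an independent check and not a description of how Rguroo computes the result.

```python
from statistics import NormalDist

X = NormalDist(mu=2, sigma=0.5)
a, b = 1.5, 3

below = X.cdf(a)                 # P(X <= a)
above = 1 - X.cdf(b)             # P(X >= b)
between = X.cdf(b) - X.cdf(a)    # P(a <= X <= b)
outside = below + above          # P(X <= a or X >= b)

print(round(between, 4), round(outside, 4))
```

The Between value rounds to 0.8186, matching the output above, and Between and Outside always add up to 1.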



19.2 Inverse Probabilities (Continuous): Normal, Student t, Chi-Square, F, and more

Instructional video icon

Rguroo Instructions
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Probability Calculator —> Continuous.

  2. Select the option Probability –> Value.

  3. From the Distribution dropdown, select your distribution. In this example, we selected Normal.

  4. Fill in the distribution parameters. For example, for the normal, we fill in the Mean and SD textboxes.

  5. In the next dropdown, type in probability value(s) (fractions between 0 and 1), and select the region for which the probability applies.

    • For a value of \(p\) that you type in the textbox, selecting Below computes an \(a\) value such that \(P(X\leq a) = p\), and selecting Above computes an \(a\) such that \(P(X\geq a) = p\).
    • If you select one of the options Between or Outside and type in values of \(p\) and \(q\) in the Lower Tail and Upper Tail textboxes, respectively, then \(a\) and \(b\) are computed such that \(P(X\leq a) = p\) and \(P(X\geq b) = q\).
    • For the Between selection, the graph highlights the region between \(a\) and \(b\), and for the Outside selection, the graph highlights the tails (outside \(a\) and \(b\)).
  6. (Optional) You can select the Graph checkbox to see a graph of the distribution in the output.

  7. (Optional) By clicking the plus icon plus button icon, you can add another probability calculation.

  8. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Continuous Probability Calculator Dialog


Rguroo Output

Inverse Probability Calculation: Inverse Normal

X ∼ Normal(μ = 2, σ = 0.5)
The Expected Value is E(X) = 2 and the Variance is V(X) = 0.25.
The values at which the probability of X is below 0.05 and 0.1 are 1.178 and 2.641.
P(X ≤ 1.178) + P(X ≥ 2.641) = 0.15
Note: P(X ≤ 1.178) = 0.05 and P(X ≥ 2.641) = 0.1

Calculation graph of Inverse Probability Calculation: Inverse Normal
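
As an independent check of the inverse calculation, the sketch below uses Python's standard-library NormalDist class with the same parameters and tail probabilities (0.05 in the lower tail, 0.1 in the upper tail). It is not a description of Rguroo's internals.

```python
from statistics import NormalDist

X = NormalDist(mu=2, sigma=0.5)
p, q = 0.05, 0.10

a = X.inv_cdf(p)        # value with P(X <= a) = p
b = X.inv_cdf(1 - q)    # value with P(X >= b) = q

print(round(a, 3), round(b, 3))
```

The two values round to 1.178 and 2.641, as in the output above.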



20 Probability Calculator - Discrete Random Variables

20.1 Probability Distributions (Discrete): Binomial, Poisson, Hypergeometric, and more

Instructional video icon

Rguroo Instructions
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Probability Calculator —> Discrete.

  2. By default, the option Values –> Probability is selected. From the Distribution dropdown, select your distribution. In this example, we selected Binomial.

  3. Fill in the distribution parameters. For example, for the binomial, we fill in the No. of Trials and Probability of Success textboxes.

  4. From the next dropdown, select the value(s) for which you want to calculate the probability.

    • If you select the option Equal from the dropdown, then for a value \(a\) that you type in, the probability \(P(X=a)\) is computed.

    • For all options except Equal, there is an “Include the Value” checkbox; if it is checked, the probability at the typed-in value is included.

    • If the checkbox “Include the Value” on the right side of the textbox is checked, then for a value of \(a\) that you type in the textbox, Below corresponds to \(P(X\leq a)\), and Above corresponds to \(P(X\geq a)\). Also, for given values of \(a\) and \(b\) that you type in the Lower Tail and Upper Tail textboxes, respectively, Between corresponds to \(P(a \leq X\leq b)\), and Outside corresponds to \(P(X\leq a \text{ or } X\geq b)\).

    • If the checkbox “Include the Value” on the right side of the textbox is not checked, then probabilities with strict inequality are calculated.

  5. (Optional) You can select the Graph checkbox to see a graph of the distribution in the output.

  6. (Optional) By clicking the plus icon plus button icon, you can add another probability calculation.

  7. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Discrete Probability Calculator Dialog


Rguroo Output

Probability Calculation: Binomial

X ∼ Binomial(n = 15, p = 0.25)
The Expected Value is E(X) = 3.75 and the Variance is V(X) = 2.812.
The probability that X is less than or equal to 2 or greater than 6 equals 0.2927.
P(X ≤ 2) + P(X > 6) = 0.2927

Calculation graph of Probability Calculation: Binomial
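
The effect of the “Include the Value” checkbox is easiest to see in a direct calculation. The Python sketch below reproduces the example above for a Binomial(15, 0.25) variable, with the lower value included (X ≤ 2) and the upper value excluded (X > 6). The helper functions are written for this illustration and are not part of Rguroo.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for a Binomial(n, p) variable."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, p = 15, 0.25
lower = binom_cdf(2, n, p)          # P(X <= 2), value included
upper = 1 - binom_cdf(6, n, p)      # P(X > 6), value excluded
outside = lower + upper
print(round(outside, 4))
```

The result rounds to 0.2927. Including the upper value instead would add P(X = 6) to the total.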



20.2 Inverse Probabilities (Discrete): Binomial, Poisson, Hypergeometric, and more

Instructional video icon

Rguroo Instructions
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Probability Calculator —> Discrete.

  2. Select the option Probability –> Value.

  3. From the Distribution dropdown, select your distribution. In this example, we selected Binomial.

  4. Fill in the distribution parameters. For example, for the binomial, we fill in the No. of Trials and Probability of Success textboxes.

  5. In the next dropdown, type in probability value(s) (fractions between 0 and 1), and select the region for which the probability applies.

    • For a value of \(p\) that you type in the textbox, selecting Below computes an \(a\) value such that \(P(X\leq a) \approx p\), and selecting Above computes an \(a\) such that \(P(X\geq a)\approx p\).
    • If you select one of the options Between or Outside and type in values of \(p\) and \(q\) in the Lower Tail and Upper Tail textboxes, respectively, then \(a\) and \(b\) are computed such that \(P(X\leq a)\approx p\) and \(P(X\geq b)\approx q\).
    • For the Between selection, the graph highlights the region between \(a\) and \(b\), and for the Outside selection, the graph highlights the tails (outside \(a\) and \(b\)).
    • In most cases approximate values are computed, since exact values are not available due to discreteness. See the output for details.
  6. (Optional) You can select the Graph checkbox to see a graph of the distribution in the output.

  7. (Optional) By clicking the plus icon plus button icon, you can add another probability calculation.

  8. Click the Preview icon preview icon to view the result.


Rguroo Dialog

Discrete Probability Calculator Dialog


Rguroo Output

Inverse Probability Calculation: Calculation_1

X ∼ Binomial(n = 15, p = 0.25)
The Expected Value is E(X) = 3.75 and the Variance is V(X) = 2.812.
The x values such that the tail probability densities are at least 0.1 and 0.2 are 2 and 5.
P(X ≤ 2) + P(X ≥ 5) = 0.5496 ≥ 0.3

Calculation graph of Inverse Probability Calculation: Calculation_1
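
Because a discrete distribution rarely hits a requested probability exactly, some rule is needed to pick the values. The Python sketch below assumes the rule suggested by the output above: the lower value is the smallest \(a\) whose lower-tail probability is at least \(p\), and the upper value is the largest \(b\) whose upper-tail probability is at least \(q\). This rule is inferred from the example and may not cover every case Rguroo handles.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_value(n, p, prob):
    """Smallest a with P(X <= a) >= prob, together with that probability."""
    total = 0.0
    for a in range(n + 1):
        total += binom_pmf(a, n, p)
        if total >= prob:
            return a, total
    return n, total

def upper_value(n, p, prob):
    """Largest b with P(X >= b) >= prob, together with that probability."""
    total = 0.0
    for b in range(n, -1, -1):
        total += binom_pmf(b, n, p)
        if total >= prob:
            return b, total
    return 0, total

a, p_low = lower_value(15, 0.25, 0.1)
b, p_high = upper_value(15, 0.25, 0.2)
print(a, b, round(p_low + p_high, 4))
```

This gives the values 2 and 5 with a combined tail probability of 0.5496, as in the output above.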



21 Random Number Generation

You can generate random numbers from various continuous and discrete random variables. You can replicate samples and apply either a set of predefined statistics or a customized statistic (written using R code) to each sample.

21.1 Generating Random Samples

Rguroo Instructions
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Multiple Distribution Generator. The Multiple Distribution Random Generator dialog opens.

  2. Click the plus icon plus button icon to generate a set of random numbers. You can type in a name in the textbox that appears. You can add more than one random variable by clicking on the plus icon plus button icon.

  3. From the Distribution dropdown, select the distribution from which you want to generate random values. In this example, we selected Binomial.

  4. Fill in the distribution parameters. For example, for the binomial, we fill in the No. of Trials and Probability of Success textboxes.

  5. Specify the Sample Size, number of times you want to replicate your sample (Replication), and a seed for the random generation.

  6. Click the Preview icon preview icon to view the result.

  7. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.


Rguroo Dialog

Multiple random generator dialog


Rguroo Output

Output of random generation

21.2 Applying Statistics to Selected Random Samples

You can apply statistics to the random samples you generate by selecting from a set of predefined standard statistics (e.g., mean, median, standard deviation, etc.) or by writing custom statistics using R code. In the example below, we compute the standard statistics mean and range for the three randomly generated samples of Section 21.1. Moreover, we write R code to compute the midrange for the three samples.

Rguroo Instructions

Continue the Rguroo instructions of Section 21.1.

  1. From the Statistic dropdown, select Mean and Range.

  2. To write custom statistics using R code, click on the Custom Statistic button. The Custom Statistic dialog opens.

    • Click the plus icon plus button icon to add a statistic. Name your statistic in the textbox that appears.
    • Type your R code on the right-hand side panel. Use lower case x to refer to the generated values (for example, min(x)).
    • You can write multiple lines of code. However, the result of your code, when applied to each replicate, must be a single number or character.
  3. Click the Preview icon preview icon to view the result.

  4. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.


Rguroo Dialog

Multiple random generator dialog for computing statistics


Rguroo Output

Output of summary stats random generation
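
The idea of replicating a sample and reducing each replicate to a single number can be sketched outside Rguroo as follows. The sketch is in Python rather than the R code that the Custom Statistic dialog expects, and the sample size, number of replications, seed, and binomial parameters are illustrative values, not the ones used in the screenshots.

```python
import random
import statistics

def binomial_sample(rng, size, trials, prob):
    """Draw `size` Binomial(trials, prob) values as sums of Bernoulli trials."""
    return [sum(rng.random() < prob for _ in range(trials)) for _ in range(size)]

def midrange(x):
    """Custom statistic: the average of the smallest and largest values."""
    return (min(x) + max(x)) / 2

rng = random.Random(2024)                 # fixed seed for reproducibility
replicates = [binomial_sample(rng, size=10, trials=15, prob=0.25) for _ in range(3)]

for x in replicates:
    # mean, range, and midrange: one number per statistic for each replicate
    print(statistics.mean(x), max(x) - min(x), midrange(x))
```

As in the Custom Statistic dialog, each statistic returns a single value per replicate, and reusing the same seed reproduces the same samples.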

22 Random Selection from a Dataset

You can select a random sample from an Rguroo dataset, with or without replacement, and replicate each sample. Moreover, you can apply statistics to each selected sample using R code.

22.1 Selecting Samples

Rguroo Instructions
  1. Use a dataset in your Rguroo account or recreate the example below by importing the cardata dataset from the Rguroo dataset repository called Rguroo Users Guide into your account.
     Screenshot of the first 5 rows of the cardata dataset.
  1. Open the Probability-Simulation toolbox on the left-hand side of the Rguroo window. Use the Probability dropdown menu and choose Random Selection. The Dataset Random Selection dialog opens.

  2. Select your dataset from the Dataset dropdown.

  3. Select your desired Sample Size, number of samples (Replications), and Seed.

  4. Select one of With or Without replacement.

  5. (Optional) If there is a numerical variable that consists of weights (probability of selection) for each case, select the variable from the Probability dropdown.

    • If no variable is selected, all cases to be sampled will have the same probability of getting selected.
    • The values of the probability variable must all be non-negative.
    • If the values of the probability variable do not add up to one, they are normalized internally so that they add up to one.
  6. (Optional) In the Sample a Subset section, you can specify which rows and columns to sample from.

    • You can select rows using the From –> To –> By textboxes, or by writing R code in Add Rows that evaluates to specific row numbers. You can use both From –> To –> By and Add Rows at the same time.
    • You can select columns by writing R code in Columns that evaluates to specific column numbers.
    • If left blank, all rows and columns will appear in the sample.

In our example, we select from the fourth (MPG) and fifth (HP) columns of the “cardata” dataset, and we only sample “Domestic” cars.
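
The row and column expressions described above can be sketched in plain R as follows. The data frame is a hypothetical stand-in for cardata: only MPG and HP (and their column positions) come from this guide, while the other column names, including `Origin`, and all values are invented for illustration. The two ways of choosing rows are shown separately.

```r
# Hypothetical stand-in for cardata: only MPG (column 4) and HP (column 5)
# are named in the guide; every other column name and all values are invented.
cars <- data.frame(
  Model  = paste0("Car", 1:8),
  Origin = rep(c("Domestic", "Foreign"), times = 4),
  Year   = 2001:2008,
  MPG    = c(18, 32, 25, 41, 22, 35, 28, 30),
  HP     = c(210, 110, 150, 95, 180, 105, 140, 120)
)

# From -> To -> By corresponds to a regular sequence of row numbers.
rows_seq <- seq(from = 1, to = 8, by = 2)

# An expression that evaluates to row numbers, here the Domestic cars.
rows_domestic <- which(cars$Origin == "Domestic")

# An expression that evaluates to column numbers: the 4th and 5th columns.
cols <- c(4, 5)

# The rows and columns that remain available for sampling.
subset_pool <- cars[rows_domestic, cols]
print(subset_pool)
```

In this toy data the Domestic cars happen to sit in rows 1, 3, 5, and 7, so `rows_seq` and `rows_domestic` pick the same rows.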

  1. Click the Preview icon preview icon to view the result.

  2. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
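
For readers who want to see the sampling logic itself, here is a minimal sketch in base R of drawing several replications with selection weights. It is not Rguroo's internal code; the data frame, the `Weight` column, and the chosen sample size, number of replications, and seed are all assumptions for illustration.

```r
# Hypothetical pool of cases with an invented weight column.
pool <- data.frame(
  MPG    = c(18, 25, 22, 28, 33, 21),
  HP     = c(210, 150, 180, 140, 100, 190),
  Weight = c(2, 1, 1, 3, 2, 1)   # non-negative, does not sum to one
)

set.seed(2024)          # plays the role of Seed
sample_size  <- 3       # plays the role of Sample Size
replications <- 4       # plays the role of Replications

# Normalize the weights so they add up to one (sample() would also accept
# unnormalized weights; this step just makes the probabilities explicit).
prob <- pool$Weight / sum(pool$Weight)

# Draw one sample of row indices per replication, without replacement.
draws <- lapply(seq_len(replications), function(r) {
  idx <- sample(nrow(pool), size = sample_size, replace = FALSE, prob = prob)
  cbind(Replication = r, pool[idx, c("MPG", "HP")])
})
result <- do.call(rbind, draws)
print(result)
```

Setting `replace = TRUE` gives sampling with replacement, in which case the same case can appear more than once within a replication.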


Rguroo Dialog

Random Selection dialog


Rguroo Output

Output of random Selection

22.2 Applying Statistics to Selected Samples

You can apply functions to your selected random samples by writing R code. In the example below, we write a function that creates a variable called Efficiency. For each selected sample, we compute the mean of MPG; depending on whether this mean is more than 30, between 20 and 30, or less than 20, the value of Efficiency is set to High, Average, or Low, respectively.

Rguroo Instructions

Continue the Rguroo instructions of Section 22.1.

  1. Click the Statistic button on the top right of the application. The Custom Statistic dialog opens.

  2. Click the plus icon plus button icon on the Custom Statistic dialog. In the textbox that appears, type in a variable name.

  3. Type your R code on the middle panel.

    • You can double-click the names of the variables to include in your code or type them in.
    • You can write multiple lines of code. However, the result of your code, when applied to each sample (replicate), must be a single number or character.
  4. Click the Preview icon preview icon to view the result.

  5. (Optional) You can save the result as a stand-alone dataset by typing a name in the Save Dataset As textbox and clicking on the Save Dataset As button.
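
One way to write the Efficiency statistic is sketched below as a standalone R function, reading the cut-offs as above 30, between 20 and 30, and below 20. The function name and the sample values are hypothetical; in the Custom Statistic dialog you would type only the body, referring to the variable MPG by name.

```r
# Custom statistic: classify a sample by the mean of its MPG values.
# Cut-offs assumed here: above 30, from 20 to 30, and below 20.
efficiency <- function(MPG) {
  m <- mean(MPG)
  if (m > 30) {
    "High"
  } else if (m >= 20) {
    "Average"
  } else {
    "Low"
  }
}

# Invented MPG values for three samples, for illustration only.
samples <- list(c(33, 35, 31), c(22, 27, 29), c(15, 18, 21))
labels <- sapply(samples, efficiency)
print(labels)
```

The function returns a single character value per sample, which satisfies the requirement that the result for each replicate be a single number or character.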


Rguroo Dialog

Random selection dialog for computing statistics


Rguroo Output

Output of summary stats

23 Applets

The statistical applets ArtofStat and Rossman and Chance are available within the Applets-Calculator toolbox of Rguroo. You can also open the Desmos scientific and graphical calculators within Rguroo. Below are a few examples that use the Desmos scientific calculator.

23.1 Computing a Factorial

Rguroo Instructions
  1. Open the Applets toolbox on the left-hand side of the Rguroo window. Expand the Calculators (Desmos), and choose the Scientific Calculator.
  2. In the main section, enter the number whose factorial you want to compute.
  3. Click on the func option to open the functions menu of the calculator.
  4. Select the factorial sign (!). The example below shows the computation of \(10!\).
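
As a check on the calculator's result, the value of \(10!\) follows directly from the definition of the factorial:

\[
10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 3{,}628{,}800
\]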


Desmos Calculator in Rguroo

Desmos calculator for computing factorial

23.2 Computing a Combination

Rguroo Instructions
  1. Open the Applets toolbox on the left-hand side of the Rguroo window. Expand the Calculators (Desmos), and choose the Scientific Calculator.
  2. Click on the func option to open the functions menu of the calculator.
  3. Select nCr and enter the two required values separated by a comma. The example below shows the computation of 15 choose 5, \(\binom{15}{5}\).
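
As a check on the calculator's result, the combination can be worked out by hand from the standard formula:

\[
\binom{15}{5} = \frac{15!}{5!\,10!} = \frac{15 \times 14 \times 13 \times 12 \times 11}{5 \times 4 \times 3 \times 2 \times 1} = \frac{360{,}360}{120} = 3{,}003
\]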


Desmos Calculator in Rguroo

Desmos calculator for computing combination

23.3 Computing a Permutation

Rguroo Instructions
  1. Open the Applets toolbox on the left-hand side of the Rguroo window. Expand the Calculators (Desmos), and choose the Scientific Calculator.
  2. Click on the func option to open the functions menu of the calculator.
  3. Select nPr and enter the two required values separated by a comma. The example below shows the number of permutations of 15 items taken 5 at a time, \({}_{15}P_{5}\).
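
As a check on the calculator's result, the permutation count can be worked out by hand from the standard formula:

\[
{}_{15}P_{5} = \frac{15!}{(15-5)!} = 15 \times 14 \times 13 \times 12 \times 11 = 360{,}360
\]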


Desmos Calculator in Rguroo

Desmos calculator for computing permutation