
Please make sure that the work is your own and not copied and pasted. Please watch out for spelling and grammar errors. Please read the study guide. Please use APA 7th edition.

Book Reference: Fox, J. (2017). Using the R Commander: A point-and-click interface for R. CRC Press. https://online.vitalsource.com/#/books/9781498741934

What did you find to be most challenging about the importing data exercise from the unit lessons?

RCH 8303, Quantitative Data Analysis 1

Course Learning Outcomes for Unit II

Upon completion of this unit, students should be able to:

1. Perform statistical tests using software tools.
1.1 Describe the procedures to report descriptive statistics of a data set.
1.2 Report normality statistics.

2. Explain results of statistical tests.

2.1 Describe the process to upload a data set into R and process the data using Rcmdr.
2.2 Discuss the procedures necessary to successfully save Rcmdr-generated data on a student’s computer.

3. Judge whether null hypotheses should be rejected or maintained.
3.1 Discuss the differences between null and alternative hypotheses.
3.2 Discuss differences between one-sided and two-sided hypotheses and when to use them.
3.3 Explain how to rule out rival hypotheses.

Course/Unit Learning Outcomes    Learning Activity
1.1, 1.2                         Unit Lesson; Chapter 3; Chapter 4; Unit II Assignment 2
2.1, 2.2                         Unit Lesson; Chapter 3; Chapter 4; Unit II Assignment 1; Unit II Assignment 2
3.1, 3.2, 3.3                    Unit Lesson; Unit II Assignment 1

Required Unit Resources

Chapter 3: A Quick Tour of the R Commander

Chapter 4: Data Input and Data Management

Unit Lesson

Introduction

The knowledge you have already gained by completing the Collaborative Institutional Training Initiative (CITI)
Essentials of Statistical Analysis (EOSA) modules in Unit I will help you in the remainder of the course and as
you work your way through your doctoral study.

In Unit II, we turn our focus to the statistical program R, how to navigate the program, and how to upload a
data set. We will also investigate the R Commander graphical user interface (GUI), which will allow
you to use the R statistical program through a familiar point-and-click interface.

UNIT II STUDY GUIDE
R and Rcmdr

R is mostly a code-driven statistical program, so many students find it difficult to learn at first while they
are also learning how to perform data analysis. The author of our textbook, John Fox, developed the Rcmdr
interface, which lets the user operate the program through familiar point-and-click methods. The interface
supports advanced analyses without requiring you to remember source code.

Unit II Plan

The Unit II assignment will be in two parts.

Part 1 of your assignment requires you to complete modules of the CITI Program EOSA that relate directly to
the readings in this unit. Each of the modules has a final quiz that must be completed and successfully
passed, demonstrating your knowledge of basic statistics and the research process.

For Part 2, you will need to have R and the R Commander installed on your computer. This lesson will briefly
explore some of the features and capabilities of R Commander as well as data input and data management,
followed by an assignment requiring you to report descriptive statistics of a data set. If you are not
comfortable utilizing R and R Commander, you may use whatever statistical software program you choose.
The answers you submit for your assignment must be correct regardless of the software you choose.

These are the topics for the Unit II CITI EOSA course.

Standard Error and Type I-II Errors (ID 17617): This module discusses, explains, and demonstrates the use of
the central limit theorem, imprecision and the standard error, null and alternative hypotheses, and Type I and
Type II errors. This module is important to understanding the central limit theorem and how the means of
repeated samples from the same population form an approximately normal distribution. Also, the concept of
statistical versus practical significance is introduced.

The Four Horsemen (ID 17618): This module identifies the four factors that interact to affect the outcome of
any given study or analysis. This module demonstrates the proper procedures for conducting a power
analysis, what to do with the information obtained, and how to plan a study to maximize the likelihood of
detecting an effect, if there is one to be found. This module introduces the student to effect size, the
magnitude of the difference or relationship that emerges in a study.

Confidence Intervals and Degrees of Freedom (ID 17619): This module will introduce the student to confidence
intervals and how to create and interpret them. Students will learn how to test hypotheses (introduced in
previous modules) using confidence intervals and explain what degrees of freedom are and what role they play in
statistical analysis.

R Commander

Chapter 3 introduces the R Commander GUI by demonstrating its use for a simple problem. The chapter
illustrates the following steps:

• how to start the R Commander,
• how the R Commander interface is structured,
• how to read data into R Commander,
• how to modify data to prepare them for analysis,
• how to draw a graph,
• how to compute numerical summaries of data,
• how to create a printed report of your work,
• how to edit and re-execute commands generated by the R Commander, and
• how to terminate your R and R Commander session.

In short, this is the typical workflow of data analysis using the R Commander. Appendix A, on pages
200–205 of your textbook, displays all the menus available to you in the R Commander. As we move
through this course, we will utilize more of the menus.

Data Sets

In this unit, we will also turn our attention to loading a data set and exploring some of the options available to
us.

Chapter 4 of your textbook demonstrates how to upload data into the R Commander from a variety of sources,
including entering data directly at the keyboard, reading data from a plain-text file, accessing data stored in an
R package, and importing data from an Excel or other spreadsheet or from other statistical software. Chapter
4 also demonstrates how to save and export R data sets from the R Commander and how to modify data.

Now that you have R loaded on your computer, we will load the data set Datasets.xlsx from the textbook
website.

Importing Data in R Using the R Commander

Here is a brief tour/example on how to import data in R using the R Commander:

First, starting R Commander takes you to the top-level home screen (Figure 1). This is the starting point for
every session. Make sure that when you access R, you also load R Commander. Type in library(Rcmdr)
or see Unit I for a refresher on how to gain access to R Commander. Once R and R Commander have been
loaded, the home screen displays.
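
The startup step can be sketched in R as follows. This is only a minimal sketch: the variable name rcmdr_installed is illustrative, and the check relies on system.file() returning an empty string for a package that is not installed, so nothing is loaded until library(Rcmdr) is called.

```r
# Check whether the Rcmdr package is installed, without loading it.
rcmdr_installed <- nzchar(system.file(package = "Rcmdr"))

if (!rcmdr_installed) {
  # Installation is left to the user (see Unit I).
  message("Rcmdr is not installed. Run install.packages(\"Rcmdr\") first.")
} else if (interactive()) {
  # Attaching the package starts the R Commander GUI.
  library(Rcmdr)
}
```

The interactive() guard simply keeps the GUI from being launched when the lines are run as a non-interactive script.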

Figure 1
R Commander Home Screen

Next, click on the Data menu item (Figure 2).

Figure 2
Accessing the Data Menu in R Commander

Once selected, drill down to the menu option Import Data…From Excel file (Figure 3).

Figure 3
Importing Data From an Excel File in R Commander

Once you select Import Data…From Excel file, a new screen opens that lets you name your data set
and set options for variable names, row names, whether to convert characters to factors, and how to handle
missing data (Figure 4).

Figure 4
Import Data From Excel File Options in R Commander

It is important to give the data set a one-word name. For illustration purposes, let’s use the
default name Dataset. Once you address the options and select OK, you can navigate to the data file on your
computer. The file can be stored in the cloud or on your computer, so its location is not a big issue.
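
For readers who want a scripted counterpart to this dialog, the following is a rough sketch only. It assumes the readxl package is installed and uses its read_excel() function; R Commander generates its own import command, which may differ. The helper name import_excel is hypothetical.

```r
# Hypothetical helper: read one sheet of an Excel workbook into a data frame.
import_excel <- function(path, sheet = 1, na = "") {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Package 'readxl' is not installed.")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # col_names = TRUE treats the first row as variable names;
  # na gives the string that marks missing data in the sheet.
  data <- as.data.frame(
    readxl::read_excel(path, sheet = sheet, col_names = TRUE, na = na)
  )
  # Convert character columns to factors, mirroring the dialog option.
  is_chr <- vapply(data, is.character, logical(1))
  if (any(is_chr)) {
    data[is_chr] <- lapply(data[is_chr], factor)
  }
  data
}

# Example call (the path depends on where you saved the file):
# Dataset <- import_excel("Datasets.xlsx")
```

The one-word name Dataset on the left of the assignment plays the same role as the data set name typed into the dialog.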

Once you select the file, R Commander indicates whether the import succeeded (Figure 5).

Figure 5
Confirmation of Data File Import in R Commander

Notice that this data set has 45 rows and 5 columns.
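
The same check can be made from the R command line. In this sketch the built-in iris data frame stands in for the imported data set, because the contents of Datasets.xlsx are not reproduced here; its dimensions therefore differ from the 45 rows and 5 columns reported above.

```r
# Stand-in for the imported data set (an assumption for illustration only).
Dataset <- iris

dim(Dataset)    # number of rows, then number of columns
nrow(Dataset)   # rows (cases)
ncol(Dataset)   # columns (variables)
str(Dataset)    # variable names and types
head(Dataset)   # first six rows
```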

The next step is to view the data set. Go to the top of R Commander and find View data set. You may need to
move the windows around, since R and R Commander open in separate windows (Figure 6).

Figure 6
Viewing a data set in R Commander (Step 1)

The data set was imported successfully, and you can view it in the window that opens (Figure 7).

Figure 7
Viewing a data set in R Commander (Step 2)

Exploring the Features and Capabilities of R Commander

Now that we have imported an Excel data set to R, we can briefly explore some of the features and
capabilities of R Commander. For a complete view of the menus, please see pages 200–205 of the textbook.

First, let’s obtain descriptive statistics of the active data set (see Figure 8).

Figure 8
Obtaining summary statistics in R Commander

Next, obtain common summary statistics by selecting the Numerical Summary option (see Figure 9).

Figure 9
Obtaining numerical summaries in R Commander
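
As a hedged illustration of what these two menu choices report, the sketch below uses base R only. The built-in iris data frame and its Sepal.Length variable are stand-ins for the active data set, and the command R Commander generates for the Numerical Summary option may differ from these base R calls.

```r
# Stand-in for the active data set (an assumption for illustration only).
Dataset <- iris

# Summary of every variable: quartiles and mean for numeric columns,
# counts per level for factors.
summary(Dataset)

# Common numerical summaries for a single numeric variable.
x <- Dataset$Sepal.Length
numeric_summary <- c(
  mean = mean(x),
  sd   = sd(x),
  IQR  = IQR(x),
  quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
)
round(numeric_summary, 2)
```

Reporting the mean together with the standard deviation and the quartiles covers the descriptive statistics most often requested in an assignment write-up.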

To examine whether the education variable of the active data set approximately follows a normal distribution,
select the Test of Normality menu option. You can then choose the education variable and one
of six common tests to review the results. In addition, you can choose whether to examine the distribution of
the variable by groups. For this illustration, select the Shapiro-Wilk Normality Test (Figure 10).

Figure 10
Selecting Test of Normality in R Commander

Once “OK” is clicked, the results are displayed in the output window (Figure 11).

Figure 11
Results of Shapiro-Wilk Normality Test in R Commander
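
A minimal sketch of the same test run directly in R, using the base function shapiro.test(). The iris data frame, its Sepal.Length and Species variables, and the 0.05 cutoff are assumptions chosen for illustration; substitute the education variable of your own data set.

```r
# Stand-in for the active data set (an assumption for illustration only).
Dataset <- iris

# Shapiro-Wilk test of normality for one numeric variable.
result <- shapiro.test(Dataset$Sepal.Length)
result$statistic   # the W statistic
result$p.value     # the p-value

# The null hypothesis is that the variable is normally distributed,
# so a p-value below the chosen alpha is evidence against normality.
alpha <- 0.05
reject_normality <- result$p.value < alpha

# The same test within each group, as the "by groups" option suggests.
p_by_group <- tapply(
  Dataset$Sepal.Length,
  Dataset$Species,
  function(v) shapiro.test(v)$p.value
)
p_by_group
```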

Learning Activities (Nongraded)

Nongraded Learning Activities are provided to aid students in their course of study. You do not have to submit
them. If you have questions, contact your instructor for further guidance and information.

For this unit, when studying APA formatting, pay particular attention to the sections that pertain to formatting
for research and statistics. Review formatting as needed.



A Quick Tour of the R Commander

This chapter introduces the R Commander graphical user interface (GUI) by demonstrating its use for a simple problem: constructing a contingency table to examine the relationship between two categorical variables. In developing the example, I explain how to start the R Commander, describe the structure of the R Commander interface, show how to read data into the R Commander, how to modify data to prepare them for analysis, how to draw a graph, how to compute numerical summaries of data, how to create a printed report of your work, how to edit and re-execute commands generated by the R Commander, and how to terminate your R and R Commander session. In short, the chapter covers the typical work flow of data analysis using the R Commander. I also explain how to customize the R Commander interface.

In the course of this chapter, you’ll get an overview of the operation of the R Commander. Later in the book, I’ll return in more detail to many of the topics addressed in the chapter.



3.1 Starting the R Commander

I assume that you have installed R and the Rcmdr package, as described in the preceding chapter. As well, if you haven’t read Chapter 1, now is a good time to do so: Chapter 1 explains some typographical conventions used in this book, discusses the general characteristics and origin of R and the R Commander, and introduces the web site for the book.

Start R in the normal manner for your computer, for example, by double-clicking on the R desktop icon in Windows, by double-clicking on R.app in the Mac OS X Applications folder, or by clicking on the R icon in the Mac OS X Launchpad. On a Linux or Unix machine, you’d normally start R by typing R at the command prompt in a terminal window.

Once R starts up, type the command library(Rcmdr) at the R > command prompt, and then press the Enter or Return key. This command should load the Rcmdr package and, after a brief delay, start the R Commander GUI, as shown in Figure 3.1 for Windows or Figure 3.2 for Mac OS X. If you encounter a problem in starting R or the R Commander, see the sections on troubleshooting in Chapter 2 (Section 2.2.1 for Windows, 2.3.4 for Mac OS X, or 2.4.1 for Linux/Unix).



3.2 The R Commander Interface

Under Windows, the R Commander (Figure 3.1) looks like a standard program. In contrast, under Mac OS X (Figure 3.2), the R Commander has its own main menu bar, unlike a standard application, which would use the menu bar at the top of the Mac OS X desktop.

As you can see, the main R Commander window looks very similar under Windows and Mac OS X. After this introductory chapter, I will show R Commander dialog boxes as they appear under Windows 10. As well, all dialogs and graphs in the text are rendered in monochrome (gray-scale) rather than in color. 

At the top of the R Commander window there is a menu bar with the following top-level menus:


File contains menu items for opening and saving various kinds of files, and for changing the R working directory, that is, the folder or directory in your file system where R will look for and write files by default.

Edit contains common menu items for editing text, such as Copy and Paste, along with specialized items for R Markdown documents (discussed in Section 3.6.2).

Data contains menu items and submenus for importing, exporting, and manipulating data (see in particular Sections 3.3 and 3.4, and Chapter 4).

Statistics contains submenus for various kinds of statistical data analysis (discussed in several subsequent chapters), including fitting statistical models to data (Chapter 7).

Graphs contains menu items and submenus for creating common statistical graphs (see in particular Chapter 5).

Models contains menu items and submenus for performing various operations on statistical models that have been fit to data (see Chapter 7).

Distributions contains a menu item for setting the R random-number-generator seed for simulations, and submenus for computing, graphing, and sampling from a variety of common (and not so common) statistical distributions (see Chapter 8).

Tools contains menu items for loading R packages and R Commander plug-in packages (see Chapter 9), for setting and saving R Commander options (see Section 3.9), for installing optional auxiliary software (see Section 2.5), and, under Mac OS X, for managing app nap for R.app (see Section 2.3.3).

Help contains menu items for obtaining information about the R Commander and R, including links to a brief introductory manual and to the R Commander and R web sites; information about the active data set; and a link to a web site with detailed instructions for using R Markdown to create reports (see Section 3.6).

The complete R Commander menu tree is shown in the appendix to this book (starting on page 199).

FIGURE 3.1: The R Console and R Commander windows at startup under Windows 10.

FIGURE 3.2: The R.app and R Commander windows at startup under Mac OS X.

Below the menus is a toolbar, with a button showing the name of the active data set (displaying <No active dataset> at startup), buttons to edit and view the active data set, and a button showing the active statistical model (displaying <No active model> before a statistical model has been fit to data in the active data set). The Data set and Model buttons may also be used to choose from among multiple data sets and associated statistical models if more than one data set or model resides in the R workspace, the region of your computer’s main memory where R stores data sets, statistical models, and other objects.

Below the toolbar there is a window pane with two tabs, labelled respectively R Script and R Markdown, that collect the R commands generated during your R Commander session. The contents of the R Script and R Markdown tabs can be edited, saved, and reused (as described in Section 3.6), and commands in the R Script tab can be modified and re-executed by selecting a command or commands with the mouse (left-click and drag the mouse cursor over the command or commands) and pressing the Submit button below the R Script tab. If you know how, you can also type your own commands into the R Script tab and execute them with the Submit button (see Section 3.7). The R Markdown tab, initially behind the R Script tab, also accumulates the R commands that are generated during a session, but in a dynamic document that you can edit and elaborate to create a printed report of your work (as described in Section 3.6.2).

The R Commander Output pane appears next: The Output pane collects R commands generated by the R Commander along with associated printed output. The text in the Output pane is also editable, and it can be copied and pasted into other programs (as described in Section 3.6.1).

Finally, at the bottom of the R Commander window, the Messages pane records messages generated by R and the R Commander—numbered and color-coded notes (dark blue), warnings (green), and error messages (red). For example, the startup note indicates the R Commander version, along with the date and time at the start of the session.

Once you have started the R Commander GUI, you can safely minimize the R Console window—this window occasionally reports messages, such as when the R Commander causes other R packages to be loaded, but these messages are incidental to the use of the R Commander and can almost always be safely ignored.



3.3 Reading Data into the R Commander

Statistical data analysis in the R Commander is based on an active data set in the form of an R data frame. A data frame is a rectangular data set in which the rows (running horizontally) represent cases (often individuals) and the columns (running vertically) represent variables descriptive of those cases. Columns in data frames can contain various forms of data: numeric variables, character-string variables (with values such as “Yes”, “No”, or “Maybe”), logical variables (with values TRUE or FALSE), and factors, which are the standard representation of categorical data in R. Typically, data frames used in the R Commander consist of numeric variables and factors, and character and logical variables, if present, are treated as factors.
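
To make the column types concrete, here is a small sketch of a data frame built by hand. The data frame name survey, its variable names, and all of its values are invented for illustration; the last step shows one way to treat character and logical columns as factors.

```r
# A tiny, made-up data frame with one column of each kind.
survey <- data.frame(
  year      = c(1972, 1985, 2012),                      # numeric
  gender    = c("female", "male", "female"),            # character
  employed  = c(TRUE, FALSE, TRUE),                     # logical
  education = factor(c("high school", "post-secondary",
                       "high school")),                 # factor
  stringsAsFactors = FALSE
)
sapply(survey, class)

# Convert the character and logical columns to factors.
to_factor <- vapply(
  survey,
  function(col) is.character(col) || is.logical(col),
  logical(1)
)
survey[to_factor] <- lapply(survey[to_factor], factor)
sapply(survey, class)
```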

R and the R Commander permit you to have as many data frames in your workspace as will fit, but only one is active at any given time. You can read data into data frames from several sources using the R Commander menus: See the Data > Import data submenu, and the Data > Data in packages > Read data set from an attached package menu item and associated dialog. If more than one data frame resides in your workspace, you can choose among them by pressing the Data set button in the toolbar or via the menus: Data > Active data set > Select active data set.

One convenient source of data is a plain-text (“ASCII”) file with one line per case, variable names in the first line, and values in each line separated by a simple delimiter such as spaces or a comma. An example of a plain-text data file with comma-separated values, GSS.csv, is shown in Figure 3.3.

The data in the file GSS.csv are drawn from the U.S. General Social Survey (GSS), and were collected between 1972 and 2012. The GSS is a periodic cross-sectional sample survey of the U.S. population conducted by the National Opinion Research Center at the University of Chicago. Many of the questions in the GSS are repeated in each survey, while other questions are repeated at intervals. To compile the GSS data set, I selected instances of the GSS that asked the question, “There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and a woman have sex relations before marriage, do you think it is always wrong, almost always wrong, wrong only sometimes, or not wrong at all?” I also included information about the year of the survey, and the respondents’ gender, education, and religion. Table 3.1 shows the definition of the variables in the GSS data set.

FIGURE 3.3: The GSS.csv file, with comma-delimited data from the U.S. General Social Survey from 1972 to 2012. Only a few of the 33,355 lines in the file are shown; the widely spaced ellipses (…) represent elided lines. The first line in the file contains variable names.

TABLE 3.1: Variables in the GSS data set.

Variable          Values
year              numeric, year of survey, between 1972 and 2012
gender            character, female or male
premarital.sex    character, always wrong, almost always wrong, sometimes wrong, or not wrong at all
education         character, less than high school, high school, or post-secondary
religion          character, Protestant, Catholic, Jewish, other, or none

This is a natural point at which to explain how objects, including data sets and variables, are named in R: Standard R names are composed of lower- and upper-case letters (a–z, A–Z), numerals (0–9), periods (.), and underscores (_), and must begin with a letter or a period. As well, R is case sensitive; so, for example, the names education, Education, and EDUCATION are all distinct.
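
A small sketch of these naming rules, using base R's make.names() function, which turns a string into a syntactically valid name; a string that comes back unchanged was already valid. The candidate names are arbitrary examples.

```r
# Candidate names: some legal, some not.
candidates <- c("GSS", "GSS data", "premarital.sex",
                ".hidden", "2012data", "_year")

# A name is valid if make.names() leaves it unchanged.
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)

# make.names() also suggests a legal replacement.
make.names("GSS data")   # "GSS.data"

# R is case sensitive: these are three distinct objects.
education <- 1
Education <- 2
EDUCATION <- 3
c(education, Education, EDUCATION)
```

Only GSS, premarital.sex, and .hidden pass the check; the name with an embedded blank, the one starting with a numeral, and the one starting with an underscore do not.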

In order to keep this introductory example as simple as possible, when I compiled the GSS data set from the original source, I eliminated cases with missing values for any of the four substantive variables (of course, there were no missing values for the year of the survey). In R, missing values are represented by NA (“not available”), and in the R Commander, NA is the default missing-data code for text-data input, although another missing-data code (such as ?, ., or 99) can be specified. This and some other complications and variations are discussed in Chapter 4 on reading and manipulating data in the R Commander.
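
A minimal sketch of a non-default missing-data code, assuming a small comma-separated file in which ? marks a missing value. The file contents and the data frame name small are invented for illustration.

```r
# Write a tiny made-up CSV file in which "?" marks a missing value.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,education",
  "1972,high school",
  "1985,?",
  "2012,post-secondary"
), path)

# Treat both NA and ? as missing-data codes when reading the file.
small <- read.csv(path, na.strings = c("NA", "?"), stringsAsFactors = TRUE)

is.na(small$education)        # which education values are missing
sum(!complete.cases(small))   # number of cases with any missing value
na.omit(small)                # the data with incomplete cases removed
```

Dropping incomplete cases with na.omit() corresponds to the simplification described above, in which cases with missing values were eliminated.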

To read simply formatted data in plain-text files into the R Commander, you can use Data > Import data > from text file, clipboard, or URL. As the name of this menu item implies, the data can be copied to the clipboard (e.g., from a suitably formatted spreadsheet) or read from a file on the Internet, but most often the data will reside in a file on your computer.

The resulting dialog box is shown in Figure 3.4. This is a comparatively simple R Commander dialog box (for example, it doesn’t have multiple tabs), but it nevertheless illustrates several common elements of R Commander dialogs:

FIGURE 3.4: The Read Text Data dialog as it appears on a Windows computer (left) and under Mac OS X (right).

•  There is a check box to indicate whether variable names are included with the data, as they are in the GSS.csv data file.

•  There are radio buttons for selecting one of several choices—here, where the data are located, how data values are separated, and what character is used for decimal points (e.g., commas are used in France and the Canadian province of Québec).

•  There are text fields into which the user can type information—here, the name of the data set, the missing-data indicator, and possibly the data-field separator.

I’ve taken all of the defaults in this dialog box, with the following two exceptions: I changed the default data set name, which is Dataset, to the more descriptive GSS. Recall the rules, explained above, for naming R objects. For example, GSS data, with an embedded blank, would not be a legal data set name. I also changed the default field separator from White space (one or more spaces or a tab) to Commas, as is appropriate for the comma-separated-values file GSS.csv.

The Read Text Data dialog also has buttons at the bottom that are standard in R Commander dialogs:

•  The Help button opens an R help page in a web browser, documenting either the use of the dialog or the use of an R command that the dialog invokes. In this case, pressing the Help button opens up the help page for the R read.table function, which is used to input simple plain-text data. R help pages are hyper-linked, so clicking on a link will open another, related help page in your browser. (Try it!)

FIGURE 3.5: The Open file dialog with the data file GSS.csv selected.

•  Pressing the OK button generates and executes an R command (or, in the case of some dialogs, a sequence of R commands). These commands are usually entered into the R Script and R Markdown tabs, and the commands and associated printed output appear in the Output pane. If graphical output is produced, it appears in a separate R graphics-device window.

Clicking OK in the Read Text Data dialog brings up a standard Open file dialog box, as shown in Figure 3.5. I navigated to the location of the data file on my computer and selected the GSS.csv file. Notice that files of type .csv, .txt, and .dat (and their upper-case analogs) are listed by default; these are common file types associated with plain-text data files.

Clicking OK causes the data to be read from GSS.csv, creating the data frame GSS, and making it the active data set in the R Commander. The read.table command invoked by the dialog converts character data in the input file to R factors (here, the variables gender, premarital.sex, education, and religion).

•  Clicking the Cancel button simply dismisses the Read Text Data dialog.

As is apparent, the order of the buttons at the bottom of the dialog box is different in Windows and in Mac OS X, reflecting differing GUI conventions on these two computing platforms.
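
As a rough sketch of what the dialog does behind the scenes, the settings chosen above map onto arguments of R's read.table function. The three data rows below are invented for illustration (they only follow the variable definitions in Table 3.1), and the temporary file name GSS_sample.csv is hypothetical; the exact command that the R Commander generates may differ in its details.

```r
# Write a tiny comma-separated file with variable names in the first line.
# The rows are made up for illustration; they are not real GSS records.
path <- file.path(tempdir(), "GSS_sample.csv")
writeLines(c(
  "year,gender,premarital.sex,education,religion",
  "1972,female,not wrong at all,post-secondary,Jewish",
  "1990,male,always wrong,high school,Protestant",
  "2012,female,sometimes wrong,less than high school,none"
), path)

GSS <- read.table(
  path,
  header = TRUE,            # variable names are in the first line
  sep = ",",                # field separator: commas
  na.strings = "NA",        # missing-data indicator
  dec = ".",                # decimal-point character
  strip.white = TRUE,       # trim blanks around fields
  stringsAsFactors = TRUE   # convert character data to factors
)

str(GSS)
```

With stringsAsFactors = TRUE, the four character variables become factors while year stays numeric, which matches the conversion described for the GSS data set.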

FIGURE 3.6: The R Commander data-set viewer displaying the GSS data set.



3.4 Examining and Recoding Variables

Having read data into the R Commander from an external source, it’s generally a good idea to take a quick look at the data, if only to confirm that they’ve been read properly. Clicking the View data set button in the R Commander toolbar brings up the data-viewer window shown in Figure 3.6. Variable names remain at the top of the display as the rows are scrolled using the scrollbar at the right of the data-viewer window. Row numbers appear to the left of the data; if the rows of the data set were named, the row names would appear here (and row numbers or names remain at the left if it’s necessary to scroll the data viewer horizontally). You may leave the data-viewer window open on your desktop as you continue to work in the R Commander, or you may close the data viewer. If you leave it open, the data viewer will be automatically updated if you make subsequent changes to the active data set.

Although the GSS data set contains a moderately large number of cases (with n = 33,354 rows), there are only five variables, and so I request a summary of all the variables in the data set, invoked by Statistics > Summaries > Active data set. The result is shown in Figure 3.7:

•  R commands generated in the R Commander session are accumulated in the R Script tab (and in the R Markdown tab, which is currently behind the R Script tab and consequently isn’t visible).

•  These commands, along with associated printed output, appear in the Output pane; the scrollbar at the right of the pane allows you to examine previous input and output that has scrolled out of view. If some printed material is wider than the pane, you can similarly use the horizontal scrollbar at the bottom to inspect it. The R Commander makes an effort to fit output to the width of the Output pane, but it isn’t always successful.

•  Notice that the Messages pane now includes a note about the dimensions of the GSS data set, generated when the data set was read, and which appears below the initial start-up message.

The output produced by the summary(GSS) command includes a “five-number summary” for the numeric variable year, reporting the minimum, first