Getting Started with R

Installing R

  1. Go to CRAN (https://cran.r-project.org).
  2. Click on the link for your operating system (Windows, macOS, or Linux).

For Windows:

  1. Click on Download R for Windows.
  2. Click on base.
  3. Click on the Download R X.X.X for Windows link (where X.X.X is the latest version number).
  4. Run the downloaded .exe file and follow the installation instructions.

For macOS:

  1. Click on Download R for macOS.
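  2. Click on the .pkg file link for the latest version compatible with your macOS version. CRAN offers separate builds for Apple silicon (arm64) and Intel (x86_64) Macs, so pick the one that matches your computer.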
  3. Run the downloaded .pkg file and follow the installation instructions.

For Linux:

  1. Click on Download R for Linux.
  2. Follow the instructions for your specific Linux distribution (e.g., Ubuntu, Fedora).

Installing RStudio

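  1. Go to the RStudio website (https://posit.co/download/rstudio-desktop/).
  2. Navigate to the section titled RStudio Desktop, then follow the steps for your operating system: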

For Windows:

  1. Click on the Download RStudio for Windows button.
  2. Run the downloaded .exe file and follow the installation instructions.

For macOS:

  1. Click on the Download RStudio for macOS button.
  2. Run the downloaded .dmg file, drag RStudio to your Applications folder, and open it.

For Linux:

  1. Click on the Download RStudio for Linux button.
  2. Choose the appropriate file for your Linux distribution and follow the installation instructions provided on the page.

After installing RStudio, open the application. By default, it uses a bright white theme. I don't want you guys to go blind, so follow these steps if you want a darker, easier-on-the-eyes theme:

  1. Click on Tools at the top of your screen.
  2. Click on Global Options...
  3. Click on Appearance.

From here, you can tinker with the settings to see what you like. My setup is the Modern RStudio theme, zoom at 90%, the Cascadia Code editor font, default text rendering, editor and help panel font sizes of 9, and a custom editor theme I made, which I posted under the Content>R and RStudio section in D2L. You can use any theme you wish (Cobalt is a good one), but to use my custom theme, follow these instructions:

  1. Navigate to Content>R and RStudio in D2L.
  2. Download My_RStudio_Theme.rstheme.
  3. Open the Appearance window in RStudio by following the steps above, then click Add.
  4. Find the My_RStudio_Theme.rstheme file and select it.
  5. In the bottom right, click Apply and then OK.

You can also rearrange your console, environment, and other panes in RStudio by selecting Pane Layout on the left side of the Global Options... window.

.R Scripts

You will write your code in a .R script. To create a .R script, follow these instructions:

  1. Open RStudio.
  2. Click on File in the top-left corner.
  3. Select New File from the dropdown menu.
  4. Choose R Script from the submenu.

You should now see a blank script editor where you can write your R code. Save your script with a .R extension by clicking File > Save As... and giving your file a name ending with .R. You can also save your .R file by pressing Ctrl + S (Windows/Linux) or Cmd + S (Mac).

In this blank .R file, you will write your code. You can execute code line by line by following these instructions:

  1. Place your cursor on the line of code you want to run.
  2. Press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to execute the current line.

Alternatively, you can highlight a block of code and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to run the selected code. You can also run a line or block of code by clicking the Run button at the top right of your .R script pane.

To run the entire .R script, follow these steps:

  1. Open your .R script in RStudio.
  2. Click on the Source button at the top of the script editor. This will run the entire script. Alternatively, you can use the shortcut Ctrl + Shift + S (Windows/Linux) or Cmd + Shift + S (Mac) to source the entire script.
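Clicking Source is essentially the same as calling base R's source() on the saved file. As a minimal sketch, the code below writes a tiny throwaway script to a temporary file (the file and its two lines are hypothetical) and then runs the whole thing with source().

```r
# Write a two-line example script to a temporary .R file (hypothetical contents)
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- a * 3"), script_path)

# Run the entire script, just as the Source button would
source(script_path)

# Objects created by the script now exist in your environment
print(b)
```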

I always start my .R script with the following lines of code:

# Clear environment, plot pane, and console
rm(list = ls())
graphics.off()
cat("\014")

The code clears all objects from your environment, removes any existing plots from your plot pane, and clears out your console. Essentially, you start working with a clean slate.
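One caveat: ls() skips objects whose names start with a dot, so rm(list = ls()) leaves those behind, and it does not detach packages you have already loaded. The sketch below shows the difference using a hypothetical hidden object named .hidden_value.

```r
# Create a regular object and a hidden one (names starting with "." are hidden)
visible_value <- 1
.hidden_value <- 2

# rm(list = ls()) only removes the objects that ls() lists
rm(list = ls())
print(exists(".hidden_value"))  # TRUE: the hidden object survives

# all.names = TRUE also lists (and therefore removes) hidden objects
rm(list = ls(all.names = TRUE))
print(exists(".hidden_value"))  # FALSE
```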

The R Console

The R console is where the code you run from your .R script is displayed, along with its results. You can also type code directly into the R console, which is handy when you want to quickly run a line of code without putting it in your .R script. To write code in the R console, follow these instructions:

  1. Navigate to the tab that says Console within RStudio.
  2. Click to the right of the > symbol.
  3. Enter the code you would like to run.
  4. Press Enter.

Where to Get Help

Upon opening RStudio, you can use the Help tab to get assistance. Click on the Help tab and search for print. This brings up the documentation for the print() function in R, and you can do this for any function. Typing ?print in the console opens the same page.

More importantly, I highly recommend using ChatGPT when you write code. I always have ChatGPT open when I am coding. It makes your coding much more efficient and almost always gives you a correct or nearly correct answer to your coding questions. You can use it at https://chatgpt.com.

Packages

R has a rich ecosystem of packages that extend its functionality. Packages in R are collections of functions, data, and compiled code that are bundled together for easy distribution and use. They provide additional capabilities beyond the base R installation, such as advanced statistical techniques, data manipulation tools, and graphical capabilities. To use functions from packages, follow these steps:

  1. Install the package
  2. Load the package

Installing and Loading Packages

To install packages, you can use the install.packages() function, which comes with base R. To install and manage packages more easily, you can install the pacman package with the following code:

# If pacman is not already installed, then install the pacman package
if(!require(pacman)) install.packages("pacman")

After running the code, if you are prompted to select a CRAN mirror, scroll down and choose any U.S. mirror.
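If you would rather not be prompted for a mirror at all, one option (a personal preference, not a course requirement) is to set the repository yourself before installing anything. The address https://cloud.r-project.org is CRAN's automatic redirect to nearby servers.

```r
# Tell R which CRAN mirror to use so install.packages() does not prompt
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the setting
getOption("repos")
```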

pacman provides a suite of functions for installing, loading, and managing R packages. Its p_load() function installs any package passed to it that is not already installed and then loads it into your current R session.

# Use the pacman function p_load to load in packages needed for this document
pacman::p_load(ggplot2, data.table)

The double colon in pacman:: lets us call a package's function without loading the package first. This works for any installed package, but you typically load packages with p_load() and then use their functions like normal. You can also load packages one at a time with library() or require(), but I encourage you to use pacman::p_load() and pass it all the packages you will be using in your .R script. Here, I passed the ggplot2 and data.table packages since I plan on using them in this tutorial. data.table is a great package for efficient data manipulation, while ggplot2 has extremely powerful plotting capabilities.
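The difference between calling a function with :: and loading (attaching) a package can be seen with the tools package, which ships with R but, in a standard setup, is not attached when a session starts. A minimal sketch:

```r
# Call a function from the tools package without attaching the package
tools::toTitleCase("introductory econometrics")

# tools has not been added to the search path yet
"package:tools" %in% search()

# library() (or pacman::p_load()) attaches the package, so the bare name works
library(tools)
toTitleCase("introductory econometrics")
```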

Working Directory

The working directory is the folder where R reads and saves files by default. You can set the working directory using the setwd() function.

# Set the working directory (replace this path with the folder on your own computer)
setwd("C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial")

# Get the working directory
getwd()
## [1] "C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial"
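setwd() also returns the previous working directory (invisibly), which makes it easy to switch folders temporarily and then switch back. The sketch below uses a temporary folder from tempdir() as a stand-in for your own course folder, and file.path() to build a path that works on any operating system.

```r
# Switch to a temporary folder and remember where you were
old_wd <- setwd(tempdir())
getwd()

# Build a path to a (hypothetical) file inside the working directory
file.path(getwd(), "file.csv")

# Return to the original working directory
setwd(old_wd)
```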

Once the working directory is set, you can read in data from a .csv or .xlsx file with the code below, as long as the file is in the current working directory.

# Read in .csv file from working directory
csv_data <- read.csv("file.csv")

# Read in .xlsx file from working directory (read_excel() comes from the readxl package)
pacman::p_load(readxl)
xlsx_data <- read_excel("file.xlsx")
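To try read.csv() without a file of your own, the sketch below first writes R's built-in mtcars data to a temporary .csv file with write.csv() and then reads it back in. The temporary path stands in for file.csv in your working directory.

```r
# Write a built-in data set to a temporary .csv file
# (row.names = FALSE keeps R's row names out of the file)
csv_path <- file.path(tempdir(), "mtcars_example.csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read it back in and check its size (rows, columns)
csv_data <- read.csv(csv_path)
dim(csv_data)
head(csv_data, 3)
```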

If you have a very large dataset as a .csv file, you can use data.table’s fread() and fwrite() functions to read data from your working directory into R and write data from R to your working directory, respectively, much faster than base R’s read.csv() and write.csv().

# Read in .csv file from working directory fast
dt <- fread("file.csv")

# Write the data.table to a .csv file in the working directory fast
fwrite(dt, "new_file.csv")
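A full round trip with fwrite() and fread() looks like the sketch below, assuming data.table is installed. The temporary file path stands in for a file in your working directory.

```r
library(data.table)

# Convert a built-in data set to a data.table
dt <- as.data.table(mtcars)

# Write it to a .csv file, then read it back in
csv_path <- file.path(tempdir(), "mtcars_fast.csv")
fwrite(dt, csv_path)
dt_read <- fread(csv_path)

# fread() returns a data.table (which is also a data.frame)
class(dt_read)
```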

Basic R Programming

Variables and Data Types

In R, you assign values to variables with the <- operator. The name to the left of <- is the variable, and it stores the value(s) on the right. R supports various data types, including numeric, character, and logical. When you name variables, you typically want to separate words with underscores (my_new_variable) or use camelCase (myNewVariable).

# x is assigned the value 10
x <- 10  

# Show the value of x
paste("x is", x)
## [1] "x is 10"
# y is assigned the value 5.5
y <- 5.5  

# Show the value of y
paste("y is", y)
## [1] "y is 5.5"
# Show the value of x + y
paste("x + y is", x + y)
## [1] "x + y is 15.5"
# Show the value of x * y
paste("x * y is", x * y)
## [1] "x * y is 55"
# Show the value of x / y rounded to two decimal places
paste("x / y is", round(x / y, 2))
## [1] "x / y is 1.82"
# Show the value of ln(x) rounded to two decimal places
paste("ln(x) is", round(log(x), 2))
## [1] "ln(x) is 2.3"
# Reassign x to 5 and show its value
x <- 5
paste("x is now", x)
## [1] "x is now 5"
# Assigning a character value to a variable
z <- "Hello, World!"  

# Show what z holds
paste("Will says", z)
## [1] "Will says Hello, World!"
# Assigning a logical value to a variable
flag <- TRUE  

# Show what is assigned to flag
paste("flag is", flag)
## [1] "flag is TRUE"
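You can check which type R assigned to a variable with class(). Note that whole numbers such as 10 are stored as numeric (double precision) unless you add an L suffix to make them integers. A minimal sketch:

```r
# Check the class of each basic data type
class(10)               # "numeric"
class(10L)              # "integer"
class("Hello, World!")  # "character"
class(TRUE)             # "logical"

# The is.*() functions test for a type directly
is.numeric(5.5)         # TRUE
```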

Control Flow

If statements are used for decision making. They allow you to execute certain sections of code based on conditions.

# Assign x and y to numeric values
x <- 10
y <- 8

# Basic if-else statement
if (x > y) {
  print("x is greater than y")  # This will be executed if x is greater than y
} else {
  print("x is not greater than y")  # This will be executed if x is less than or equal to y
}
## [1] "x is greater than y"
# Use the & (and) operator to check if both conditions hold
if (x > 9 & y > 9) {
  print("x and y are greater than 9")  # This will be executed if x and y are greater than 9
} else {
  print("x or y is less than or equal to 9")  # This will be executed if x or y is less than or equal to 9
}
## [1] "x or y is less than or equal to 9"
# Use the | (or) operator to check if at least one condition holds
if (x > 9 | y > 9) {
  print("x or y is greater than 9")  # This will be executed if x or y is greater than 9
} else {
  print("x and y are less than or equal to 9")  # This will be executed if x and y are less than or equal to 9
}
## [1] "x or y is greater than 9"
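The & and | operators are vectorized, meaning they compare vectors element by element. Inside an if statement, which needs a single TRUE or FALSE, R also offers && and ||. These work on single values and stop evaluating as soon as the answer is known; since R 4.3.0 they raise an error when given a vector longer than one. A short sketch (undefined_object is deliberately never created):

```r
x <- 10
y <- 8

# Element-by-element comparison with the vectorized &
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE

# && is the usual choice inside if() because it works on single values
if (x > 9 && y > 9) {
  print("x and y are greater than 9")
} else {
  print("x or y is less than or equal to 9")
}

# && skips the right-hand side when the left-hand side is FALSE,
# so the undefined object below is never evaluated
FALSE && undefined_object   # FALSE
```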
# Assign a score of 85 to the variable score
score <- 85  

# If-else example
if (score >= 90) {
  grade <- "A"  # Assign grade A if score is 90 or above
} else if (score >= 80) {
  grade <- "B"  # Assign grade B if score is 80 or above
} else if (score >= 70) {
  grade <- "C"  # Assign grade C if score is 70 or above
} else if (score >= 60) {
  grade <- "D"  # Assign grade D if score is 60 or above
} else {
  grade <- "F"  # Assign grade F if score is below 60
}

# Show what grade is assigned to
paste("Grade assigned is", grade)
## [1] "Grade assigned is B"
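The if-else chain above grades one score at a time. For a whole vector of scores, base R's cut() can assign the same letter grades in one call; this is an alternative technique, not the one used above. Setting right = FALSE makes each interval include its lower bound, which matches the >= comparisons in the chain. The scores are hypothetical.

```r
# Hypothetical scores to grade
scores <- c(95, 85, 72, 65, 40)

# Intervals [-Inf, 60), [60, 70), [70, 80), [80, 90), [90, Inf)
grades <- cut(scores,
              breaks = c(-Inf, 60, 70, 80, 90, Inf),
              labels = c("F", "D", "C", "B", "A"),
              right = FALSE)

# Convert the factor to plain text and show the result
as.character(grades)   # "A" "B" "C" "D" "F"
```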

Loops

Loops allow you to execute code repeatedly.

For Loops

For loops are used for iterating over a sequence.

# Print numbers 1 to 5 using a for loop
for (i in 1:5) 
{
  print(i)  # Print the value of i in each iteration
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Calculate the sum of the first 10 natural numbers
total <- 0  # Initialize total to 0 (avoid naming it sum, which is also R's built-in sum() function)
for (i in 1:10) 
{
  total <- total + i  # Add the value of i to total in each iteration
}

# Print the sum
print(total)  
## [1] 55
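Two habits make for loops safer and faster: loop over seq_along() rather than 1:length(), which misbehaves on an empty vector (1:0 counts down to give c(1, 0)), and create the result vector before the loop instead of growing it. Many simple loops can also be replaced by a vectorized function, as the last line shows. The values below are arbitrary.

```r
values <- c(3, 1, 4, 1, 5)

# Preallocate a numeric vector to hold the results
squares <- numeric(length(values))

# seq_along(values) gives 1, 2, ..., length(values)
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}
print(squares)   # 9 1 16 1 25

# The same sum of 1 to 10 as the loop above, without a loop
sum(1:10)        # 55
```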

While Loops

While loops continue to execute as long as the condition is true.

# Initialize i to 1
i <- 1            

# Print numbers 1 to 5 using a while loop; the loop stops once i reaches 6
while (i <= 5) 
{
  # Print the value of i
  print(i)     
  
  # Increment i by 1
  i <- i + 1      
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Assign the value 5 to n
n <- 5           

# Initialize the running product to 1 (avoid naming it factorial, which is a built-in R function)
fact <- 1   

# Initialize i to 1
i <- 1            

# Calculate the factorial of n using a while loop
while (i <= n) 
{
  # Multiply fact by i in each iteration
  fact <- fact * i 
  
  # Increment i by 1
  i <- i + 1      
}

# Print the factorial
print(fact)  
## [1] 120
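As a check on the while loop, base R already provides factorial() and prod(), both of which give 5! = 120. The sketch below also shows break, which exits a loop early; here it stops at the first power of 2 greater than 100.

```r
# Built-in equivalents of the factorial loop
factorial(5)   # 120
prod(1:5)      # 120

# Use break to leave a loop as soon as a condition is met
power <- 1
while (TRUE) {
  power <- power * 2
  if (power > 100) break
}
print(power)   # 128
```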

Vectors

A vector is a sequence of data elements of the same basic type. You can create vectors using the c() function.

# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5) 
numbers
## [1] 1 2 3 4 5
# Creating a character vector
characters <- c("a", "b", "c")
characters
## [1] "a" "b" "c"
# Creating a logical vector
logicals <- c(TRUE, FALSE, TRUE)
logicals
## [1]  TRUE FALSE  TRUE
# Calculate the sum of the elements in the numbers vector
sum_numbers <- sum(numbers)  

# Calculate the mean of the elements in the numbers vector
mean_numbers <- mean(numbers)  

# Find the maximum element in the numbers vector
max_numbers <- max(numbers)  
# Print the sum of the numbers vector
print(sum_numbers)  
## [1] 15
# Print the mean of the numbers vector
print(mean_numbers) 
## [1] 3
# Print the maximum element in the numbers vector
print(max_numbers)  
## [1] 5
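Because arithmetic and comparisons in R work element by element, vectors can be transformed and indexed without loops. The sketch below also shows what "same basic type" means in practice: mixing types in c() converts every element to the most flexible type.

```r
numbers <- c(1, 2, 3, 4, 5)

# Element-wise arithmetic
numbers * 2              # 2 4 6 8 10

# Indexing by position (R counts from 1) and by a logical condition
numbers[2]               # 2
numbers[numbers > 3]     # 4 5

# Mixing a number and text coerces every element to character
c(1, "a")                # "1" "a"
```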

Data Tables

data.table is a high-performance extension of R's data.frame. It provides fast and memory-efficient data manipulation capabilities.

Creating Data Tables

You can create and print a data.table with the following code.

# Creating a data table
dt <- data.table(
  
  # Create a column named Name with elements "John", "Jane", and "Doe"
  Name = c("John", "Jane", "Doe"),  
  
  # Create a column named Age with elements 23, 25, and 28
  Age = c(23, 25, 28),  
  
  # Create a column named Height with elements 180, 165, and 170
  Height = c(180, 165, 170)  
)

# Display the data table
dt 

Adding Columns

You can add new columns to a data.table.

# Adding a new column named Weight with values 70, 55, 60
dt[, Weight := c(70, 55, 60)]  

# Adding a column named BMI based on existing columns Weight and Height
dt[, BMI := Weight / (Height / 100)^2]  

# Display the updated data table
dt 
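The := operator adds or changes columns by reference, meaning the data.table is modified in place without making a copy. A side effect is that plain assignment (dt_alias <- dt) does not create an independent table, so use copy() when you need one. A minimal sketch with the same three people (dt_alias and dt_copy are illustrative names):

```r
library(data.table)

dt <- data.table(Name = c("John", "Jane", "Doe"),
                 Height = c(180, 165, 170),
                 Weight = c(70, 55, 60))

# Plain assignment: dt_alias refers to the same table as dt
dt_alias <- dt
dt_alias[, BMI := Weight / (Height / 100)^2]
"BMI" %in% names(dt)        # TRUE: dt changed too

# copy() makes an independent data.table
dt_copy <- copy(dt)
dt_copy[, BMI := NULL]      # removes the column from the copy only
"BMI" %in% names(dt)        # still TRUE

round(dt$BMI, 2)            # 21.60 20.20 20.76
```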

Subsetting Data

Subsetting allows you to select specific rows and columns from a data.table.

# Select rows where the Age column is greater than 24
dt_subset <- dt[Age > 24]  

# Print the subset data table 
dt_subset 
# Select only the Name and Height columns
dt_selected_columns <- dt[, .(Name, Height)]  

# Print the data table with selected columns 
dt_selected_columns
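The row condition and the column selection can be combined in a single call, and data.table's special symbol .N counts rows. A minimal sketch that rebuilds the example table so it runs on its own:

```r
library(data.table)

dt <- data.table(Name = c("John", "Jane", "Doe"),
                 Age = c(23, 25, 28),
                 Height = c(180, 165, 170))

# Rows with Age > 24, keeping only the Name and Height columns
dt[Age > 24, .(Name, Height)]

# Count the rows that satisfy the condition
dt[Age > 24, .N]   # 2
```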

Filtering Rows

Filtering allows you to select rows that meet certain criteria.

# Select rows where the BMI column is greater than 20
dt_filtered <- dt[BMI > 20]  

# Display the filtered data table
dt_filtered 
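Conditions can be combined with & and |, and %in% matches a column against a set of values. The sketch below rebuilds the example table with its BMI column so it runs on its own; the cutoffs 20.5 and 27 are arbitrary.

```r
library(data.table)

dt <- data.table(Name = c("John", "Jane", "Doe"),
                 Age = c(23, 25, 28),
                 Height = c(180, 165, 170),
                 Weight = c(70, 55, 60))
dt[, BMI := Weight / (Height / 100)^2]

# Both conditions must hold: BMI above 20.5 and Age under 27 (only John)
dt[BMI > 20.5 & Age < 27]

# Rows whose Name is one of the listed values
dt[Name %in% c("Jane", "Doe")]
```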

Data Analysis

Data analysis involves calculating summary statistics and exploring relationships between variables.

Summary Statistics

You can calculate various summary statistics using data.table.

# Calculate the mean of the Age column
mean_age <- dt[, mean(Age)]  

# Calculate the standard deviation of the Height column
sd_height <- dt[, sd(Height)]  

# Calculate summary statistics for Age and Height
summary_stats <- dt[, .(
  
  # Calculate the mean of the Age column
  Mean_Age = mean(Age),  
  
  # Calculate the standard deviation of the Age column
  SD_Age = sd(Age),  
  
  # Calculate the mean of the Height column
  Mean_Height = mean(Height),  
  
  # Calculate the standard deviation of the Height column
  SD_Height = sd(Height)  
)]

# Display the summary statistics
print(summary_stats)  
##    Mean_Age   SD_Age Mean_Height SD_Height
##       <num>    <num>       <num>     <num>
## 1: 25.33333 2.516611    171.6667  7.637626
# Use the summary function to get summary stats for all variables
summary(dt)
##      Name                Age            Height          Weight     
##  Length:3           Min.   :23.00   Min.   :165.0   Min.   :55.00  
##  Class :character   1st Qu.:24.00   1st Qu.:167.5   1st Qu.:57.50  
##  Mode  :character   Median :25.00   Median :170.0   Median :60.00  
##                     Mean   :25.33   Mean   :171.7   Mean   :61.67  
##                     3rd Qu.:26.50   3rd Qu.:175.0   3rd Qu.:65.00  
##                     Max.   :28.00   Max.   :180.0   Max.   :70.00  
##       BMI       
##  Min.   :20.20  
##  1st Qu.:20.48  
##  Median :20.76  
##  Mean   :20.86  
##  3rd Qu.:21.18  
##  Max.   :21.60
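When you want the same statistic for several columns, data.table's .SD (the subset of data) together with .SDcols saves you from writing each column out by hand. A sketch that rebuilds the Age and Height columns:

```r
library(data.table)

dt <- data.table(Age = c(23, 25, 28),
                 Height = c(180, 165, 170))

# Apply mean() to every column listed in .SDcols
dt[, lapply(.SD, mean), .SDcols = c("Age", "Height")]

# The same idea with sd()
dt[, lapply(.SD, sd), .SDcols = c("Age", "Height")]
```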

Wooldridge Package

The wooldridge package contains data sets from Jeffrey M. Wooldridge’s textbook “Introductory Econometrics: A Modern Approach”. Any MindTap computer-based assignment will use one of these data sets. Rather than downloading the .RData file, you can access any wooldridge data set as in the code below.

# Load the wooldridge package
pacman::p_load("wooldridge")  

# Load the wage1 data set from the Wooldridge package as a data.table
dt <- as.data.table(wage1)

# Display the first few rows of the data set
head(dt) 
# Display the last 10 rows of the data set
tail(dt, 10)

Data Analysis with the Wooldridge Package

# Display summary statistics of the wage1 data set
summary(dt)
##       wage             educ           exper           tenure      
##  Min.   : 0.530   Min.   : 0.00   Min.   : 1.00   Min.   : 0.000  
##  1st Qu.: 3.330   1st Qu.:12.00   1st Qu.: 5.00   1st Qu.: 0.000  
##  Median : 4.650   Median :12.00   Median :13.50   Median : 2.000  
##  Mean   : 5.896   Mean   :12.56   Mean   :17.02   Mean   : 5.105  
##  3rd Qu.: 6.880   3rd Qu.:14.00   3rd Qu.:26.00   3rd Qu.: 7.000  
##  Max.   :24.980   Max.   :18.00   Max.   :51.00   Max.   :44.000  
##     nonwhite          female          married           numdep     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :1.0000   Median :1.000  
##  Mean   :0.1027   Mean   :0.4791   Mean   :0.6084   Mean   :1.044  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:2.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :6.000  
##       smsa           northcen         south             west       
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.000   Median :0.0000   Median :0.0000  
##  Mean   :0.7224   Mean   :0.251   Mean   :0.3555   Mean   :0.1692  
##  3rd Qu.:1.0000   3rd Qu.:0.750   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.000   Max.   :1.0000   Max.   :1.0000  
##     construc          ndurman          trcommpu           trade       
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.04563   Mean   :0.1141   Mean   :0.04373   Mean   :0.2871  
##  3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##     services         profserv         profocc          clerocc      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1008   Mean   :0.2586   Mean   :0.3669   Mean   :0.1673  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##     servocc           lwage            expersq          tenursq       
##  Min.   :0.0000   Min.   :-0.6349   Min.   :   1.0   Min.   :   0.00  
##  1st Qu.:0.0000   1st Qu.: 1.2030   1st Qu.:  25.0   1st Qu.:   0.00  
##  Median :0.0000   Median : 1.5369   Median : 182.5   Median :   4.00  
##  Mean   :0.1407   Mean   : 1.6233   Mean   : 473.4   Mean   :  78.15  
##  3rd Qu.:0.0000   3rd Qu.: 1.9286   3rd Qu.: 676.0   3rd Qu.:  49.00  
##  Max.   :1.0000   Max.   : 3.2181   Max.   :2601.0   Max.   :1936.00
# Calculate the mean and standard deviation of the wage column
mean_wage <- dt[, mean(wage)]  
sd_wage <- dt[, sd(wage)]  

# Display the mean of wage
print(mean_wage) 
## [1] 5.896103
# Display the standard deviation of wage
print(sd_wage) 
## [1] 3.693086
# Calculate the covariance between wage and education
cov_wage_educ <- dt[, cov(wage, educ)]

# Display the covariance between wage and education
cov_wage_educ  
## [1] 4.150864
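The sample covariance is the sum of the products of each variable's deviations from its mean, divided by n - 1; the correlation rescales it by both standard deviations so it always lies between -1 and 1. The sketch below checks both formulas on R's built-in mtcars data (wt and mpg), used only because it ships with R; the same lines work with dt$wage and dt$educ.

```r
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

# Sample covariance by hand: sum of products of deviations over n - 1
cov_by_hand <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
all.equal(cov_by_hand, cov(x, y))                  # TRUE

# Correlation is covariance divided by the product of standard deviations
all.equal(cov(x, y) / (sd(x) * sd(y)), cor(x, y))  # TRUE
```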

Linear Regression

Use linear regression to analyze the relationship between wage and education, controlling for experience (exper) and tenure (tenure).

# Create a linear model with wage as the dependent variable and educ, exper, and tenure as the independent variables
model <- lm(wage ~ educ + exper + tenure, data = dt)  

# Print the summary of the linear regression model
summary(model)  
## 
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6068 -1.7747 -0.6279  1.1969 14.6536 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.87273    0.72896  -3.941 9.22e-05 ***
## educ         0.59897    0.05128  11.679  < 2e-16 ***
## exper        0.02234    0.01206   1.853   0.0645 .  
## tenure       0.16927    0.02164   7.820 2.93e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared:  0.3064, Adjusted R-squared:  0.3024 
## F-statistic: 76.87 on 3 and 522 DF,  p-value: < 2.2e-16
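lm() computes ordinary least squares estimates, which solve the normal equations (X'X)b = X'y, where X includes a column of ones for the intercept. As a sketch, the code below reproduces lm()'s coefficients with matrix algebra, using the built-in mtcars data as a stand-in so it runs without the wooldridge package; with wage1 you would use wage, educ, exper, and tenure instead.

```r
# Regress mpg on weight and horsepower with lm()
model <- lm(mpg ~ wt + hp, data = mtcars)

# Build the design matrix: a column of ones, then the regressors
X <- cbind(1, mtcars$wt, mtcars$hp)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y
b <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(); unname() drops the coefficient names
all.equal(as.vector(b), unname(coef(model)))   # TRUE
```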
# Colors used for plot (https://www.rapidtables.com/web/color/RGB_Color.html)
color_1 <- "#00FF00"
color_2 <- "#ADD8E6"

# Create a scatter plot of wage vs education with a regression line using ggplot2
ggplot(wage1, aes(x = educ, y = wage)) +  # Initialize the plot with education on the x-axis and wage on the y-axis
  geom_point(color = color_1) +  # Add the data points to the plot
  geom_smooth(method = "lm", col = color_1) +  # Add a regression line to the plot (regression of wage on educ)
  labs(title = "Wage vs Education",  # Add a title to the plot
       x = "Education (years)",  # Label the x-axis
       y = "Wage") +  # Label the y-axis
  theme(
    panel.background = element_rect(fill = "black", color = "black"),
    plot.background = element_rect(fill = "black", color = "black"),
    panel.grid.major = element_line(color = "gray"),
    panel.grid.minor = element_line(color = "gray"),
    axis.text = element_text(color = color_1, size = 15, family = "Arial"),
    axis.title = element_text(color = color_2, size = 25, family = "Arial"),
    plot.title = element_text(hjust = 0.5, color = color_2, size = 30, family = "Arial", face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, color = color_1, size = 25)
  )