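To install R on Windows:
1. Click Download R for Windows.
2. Click base.
3. Click the Download R X.X.X for Windows link (where X.X.X is the latest version number).
4. Run the downloaded .exe file and follow the installation instructions.
To install R on macOS:
1. Click Download R for macOS.
2. Click the .pkg file link for the latest version compatible with your macOS version.
3. Open the downloaded .pkg file and follow the installation instructions.
To install R on Linux, click Download R for Linux.
To install RStudio, go to RStudio Desktop. Then:
Windows: Click the Download RStudio for Windows button, run the downloaded .exe file, and follow the installation instructions.
macOS: Click the Download RStudio for macOS button, open the downloaded .dmg file, drag RStudio to your Applications folder, and open it.
Linux: Click the Download RStudio for Linux button.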
After installing RStudio, open the application. By default it uses a bright white theme. I don’t want you guys to go blind, so follow these steps if you wish to have a cooler theme:
1. Click Tools at the top of your screen.
2. Click Global Options...
3. Click Appearance.
From here you can tinker with the settings to see what you like. I use the Modern RStudio theme, set zoom to 90%, set the editor font to Cascadia Code, leave text rendering at default, have the editor and help panel font sizes at 9, and use a custom theme I made, which I have under the Content>R and RStudio section in D2L. You can use any theme you wish (Cobalt is a good one), but to use my custom theme, follow these instructions:
1. Go to Content>R and RStudio in D2L.
2. Download My_RStudio_Theme.rstheme.
3. Open the Appearance window in RStudio by following the steps above and click Add.
4. Find the My_RStudio_Theme.rstheme file and select it.
5. Click Apply and then OK.
Furthermore, you can position your console, environment, etc. differently in RStudio by selecting Pane Layout on the left once in the Global Options... window.
.R Scripts
You will write your code in a .R script. To create a .R script, follow these instructions:
1. Click File in the top-left corner.
2. Select New File from the dropdown menu.
3. Select R Script from the submenu.
You should now see a blank script editor where you can write your R code. Save your script with a .R extension by clicking File > Save As... and giving your file a name ending with .R. You can also save your .R file by pressing Ctrl + S (Windows/Linux) or Cmd + S (Mac).
In this blank .R file, you will write your code. To execute code line by line, place your cursor on the line you want to run and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to execute the current line.
Alternatively, you can highlight a block of code and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to run the selected code. You can also run a line or block of code by clicking the Run button at the top right of your .R script pane.
To run the entire .R script, follow these steps:
1. Open your .R script in RStudio.
2. Click the Source button at the top of the script editor. This will run the entire script. Alternatively, you can use the shortcut Ctrl + Shift + S (Windows/Linux) or Cmd + Shift + S (Mac) to source the entire script.
I always start my .R script with the following lines of code:
# Clear environment, plot pane, and console
rm(list = ls())
graphics.off()
cat("\014")
The code clears all objects from your environment, closes any open plots in your plot pane, and clears your console. Essentially, you start working with a clean slate.
The R console is where the code you run from your .R script, along with its results, is displayed. You can also write code directly into the R console. This is useful if you want to quickly run a line of code without putting it in your .R script.
To write code in the R console, follow these instructions:
1. Click on the Console within RStudio.
2. Type your code after the > symbol.
3. Press Enter.
Upon opening RStudio, you can navigate to the Help tab to get assistance. Once you find and click on the Help tab, search print. This will bring up information on the print() function in R. You can do this for any function.
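The same documentation can also be reached from code rather than the Help tab. Here is a minimal sketch using base R's help(), the ? shortcut, and args(); it assumes nothing beyond a default R installation:

```r
# Open the documentation page for print() (the page the Help tab search shows)
help("print")

# The ? shortcut is equivalent to help()
?print

# Show a function's arguments directly in the console
args(round)
```

Typing ?print in the console is often quicker than clicking through the Help tab once you know the function's name.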
More importantly, I highly recommend using ChatGPT when you write code. I always have ChatGPT open when I am coding. It makes your coding experience much more efficient, almost always giving you the correct, or close to the correct, answer to any coding question you have. Click ChatGPT to get the answers to your coding questions.
R has a rich ecosystem of packages that extend its functionality. Packages in R are collections of functions, data, and compiled code that are bundled together for easy distribution and use. They provide additional capabilities beyond the base R installation, such as advanced statistical techniques, data manipulation tools, and graphical capabilities. To use functions from packages, follow these steps:
To install packages, you can use the install.packages()
function which comes included with R. To install and manage packages
more easily, you can install the package pacman with the
following code:
# If pacman is not already installed, then install the pacman package
if (!require(pacman)) install.packages("pacman")
After running the code, if you are prompted to select a CRAN Mirror, please scroll down and choose any option from the U.S.
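If you would rather not be prompted for a mirror, one option is to set the CRAN repository before installing. This is only a sketch and not part of the setup above; it assumes the https://cloud.r-project.org address, which redirects to a nearby mirror, is acceptable in place of a hand-picked U.S. mirror:

```r
# Set the CRAN repository for this R session so install.packages() does not ask for a mirror
# (assumption: the cloud.r-project.org redirect is acceptable instead of a specific U.S. mirror)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which repository will be used
getOption("repos")
```

The setting only lasts for the current R session, so it would need to be run again after restarting RStudio.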
pacman provides a suite of functions for installing, loading, and managing R packages. Any package passed to pacman's p_load() function is installed if it is missing and then loaded into your current R session.
# Use the pacman function p_load to load in packages needed for this document
pacman::p_load(ggplot2, data.table)
The double colon in pacman:: allows us to use functions without loading the pacman package first. All packages can do this, but you typically want to load packages with p_load and then use their functions like normal. You can load packages one at a time via the require() function, but I encourage you to use pacman::p_load() and pass it all the packages you will be using in your .R script. Here, I passed the ggplot2 and data.table packages since I plan on using them in this tutorial. data.table is a great package for efficient data manipulation, while ggplot2 has extremely powerful plotting capabilities.
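To see the double colon at work without installing anything, the sketch below uses sd() from the stats package, which ships with R. It is only an illustration: the package::function() form and the plain call give the same result once a package is loaded.

```r
# Call sd() by naming its package explicitly
explicit_result <- stats::sd(c(2, 4, 6))

# Call sd() directly (stats is loaded automatically when R starts)
plain_result <- sd(c(2, 4, 6))

# Both calls return the same value
print(explicit_result)
print(plain_result)
identical(explicit_result, plain_result)
```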
The working directory is the folder where R reads and saves files by
default. You can set the working directory using the
setwd() function.
# Setting the working directory
setwd("C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial")
# Get the working directory
getwd()
## [1] "C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial"
Upon setting the working directory, you can read in data from a .csv or .xlsx file by running the next code, as long as those files are contained in the current working directory.
# Read in .csv file from working directory
csv_data <- read.csv("file.csv")
# Read in .xlsx file from working directory
pacman::p_load(readxl)
xlsx_data <- read_excel("file.xlsx")  # read_excel() is the readxl function that reads .xlsx files
If you have a very large dataset as a .csv file, you can use data.table’s fread() and fwrite() functions to read data from your working directory into R and write data from R to your working directory, respectively, much faster than base R’s read.csv() and write.csv().
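The read and write steps can be tried without an existing data file. This sketch uses only base R, a temporary folder as a stand-in for your course folder, and a made-up file name (example_file.csv):

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Use a temporary folder as the working directory (a stand-in for your course folder)
setwd(tempdir())

# Create a tiny example data set and write it to the working directory
example_data <- data.frame(id = 1:3, score = c(90, 85, 70))
write.csv(example_data, "example_file.csv", row.names = FALSE)

# A bare file name is looked up relative to the working directory
csv_data <- read.csv("example_file.csv")
print(csv_data)

# Clean up: delete the example file and restore the original working directory
file.remove("example_file.csv")
setwd(old_wd)
```

Because the file name has no folder in front of it, R looks for it in whatever folder getwd() reports, which is why setting the working directory first matters.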
# Read in .csv file from working directory fast
dt <- fread("file.csv")
# Write datatable to working directory fast
fwrite(dt, "output_file.csv")  # output_file.csv is a placeholder name for the file to create
In R, you can assign values to variables using the <- operator. R supports various data types, including numeric, character, and logical. The name to the left of the <- operator is a variable that stores the value(s) given to the right of the <- operator. When you name variables, you typically want to separate words with underscores _ or use camelcase. An example of camelcase is myNewVariable.
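As a quick check on the data types mentioned above, class() reports the type of whatever a variable holds. The variable names in this sketch are made up for illustration:

```r
# One variable of each basic type (hypothetical names)
num_value <- 5.5
chr_value <- "Hello"
lgl_value <- TRUE

# class() reports the data type stored in each variable
class(num_value)
class(chr_value)
class(lgl_value)

# Variable names are case-sensitive: num_value and Num_Value would be different variables
exists("num_value")
exists("Num_Value")
```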
# x is assigned the value 10
x <- 10  
# Show the value of x
paste("x is", x)
## [1] "x is 10"
# y is assigned the value 5.5
y <- 5.5
# Show the value of y
paste("y is", y)
## [1] "y is 5.5"
# Show the value of x + y
paste("x + y is", x + y)
## [1] "x + y is 15.5"
# Show the value of x * y
paste("x * y is", x * y)
## [1] "x * y is 55"
# Show the value of x / y rounded to two decimal places
paste("x / y is", round(x / y, 2))
## [1] "x / y is 1.82"
# Show the value of ln(x) rounded to two decimal places
paste("ln(x) is", round(log(x), 2))
## [1] "ln(x) is 2.3"
# Reassign x to 5 and show its value
x <- 5
paste("x is now", x)
## [1] "x is now 5"
# Assigning a character value to a variable
z <- "Hello, World!"
# Show what z holds
paste("Will says", z)
## [1] "Will says Hello, World!"
# Assigning a logical value to a variable
flag <- TRUE
# Show what is assigned to flag
paste("flag is", flag)
## [1] "flag is TRUE"
If statements are used for decision making. They allow you to execute certain sections of code based on conditions.
# Assign x and y to numeric values
x <- 10
y <- 8
# Basic if-else statement
if (x > y) {
  print("x is greater than y")  # This will be executed if x is greater than y
} else {
  print("x is not greater than y")  # This will be executed if x is less than or equal to y
}
## [1] "x is greater than y"
# Use the & (and) operator to check if both conditions hold
if (x > 9 & y > 9) {
  print("x and y are greater than 9")  # This will be executed if x and y are greater than 9
} else {
  print("x or y is less than or equal to 9")  # This will be executed if x or y is less than or equal to 9
}
## [1] "x or y is less than or equal to 9"
# Use the | (or) operator to check if at least one condition holds
if (x > 9 | y > 9) {
  print("x or y is greater than 9")  # This will be executed if x or y is greater than 9
} else {
  print("x and y are less than or equal to 9")  # This will be executed if x and y are less than or equal to 9
}
## [1] "x or y is greater than 9"
# Assign a score of 85 to the variable score
score <- 85  
# If-else example
if (score >= 90) {
  grade <- "A"  # Assign grade A if score is 90 or above
} else if (score >= 80) {
  grade <- "B"  # Assign grade B if score is 80 or above
} else if (score >= 70) {
  grade <- "C"  # Assign grade C if score is 70 or above
} else if (score >= 60) {
  grade <- "D"  # Assign grade D if score is 60 or above
} else {
  grade <- "F"  # Assign grade F if score is below 60
}
# Show what grade is assigned to
paste("Grade assigned is", grade)
## [1] "Grade assigned is B"
Loops allow you to execute code repeatedly.
For loops are used for iterating over a sequence.
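The sequence does not have to be a run of numbers. As a small sketch, here is a for loop over a made-up character vector (fruits is a hypothetical name), once over the elements themselves and once over their positions using seq_along():

```r
# A hypothetical character vector to iterate over
fruits <- c("apple", "banana", "cherry")

# Loop over the elements themselves
for (fruit in fruits) {
  print(fruit)
}

# Loop over the positions 1, 2, 3 with seq_along() so the index is available too
numbered <- character(length(fruits))
for (i in seq_along(fruits)) {
  numbered[i] <- paste(i, fruits[i])
}
print(numbered)
```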
# Print numbers 1 to 5 using a for loop
for (i in 1:5) 
{
  print(i)  # Print the value of i in each iteration
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Calculate the sum of the first 10 natural numbers
total <- 0  # Initialize the running total to 0 (named total so it does not mask base R's sum() function)
for (i in 1:10) 
{
  total <- total + i  # Add the value of i to total in each iteration
}
# Print the total
print(total)
## [1] 55
While loops continue to execute as long as the condition is true.
# Initialize i to 1
i <- 1            
# Print numbers 1 to 5 using a while loop. The loop stops once i reaches 6
while (i <= 5) 
{
  # Print the value of i
  print(i)     
  
  # Increment i by 1
  i <- i + 1      
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Assign the value 5 to n
n <- 5
# Initialize the running product to 1 (named fact so it does not mask base R's factorial() function)
fact <- 1
# Initialize i to 1
i <- 1
# Calculate the factorial of n using a while loop
while (i <= n) 
{
  # Multiply fact by i in each iteration
  fact <- fact * i
  
  # Increment i by 1
  i <- i + 1
}
# Print the factorial
print(fact)
## [1] 120
A vector is a sequence of data elements of the same basic type. You can create vectors using the c() function.
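Once a vector exists, square brackets pick out elements and arithmetic is applied to every element at once. A short sketch with a made-up vector (values is a hypothetical name):

```r
# A hypothetical numeric vector
values <- c(10, 20, 30, 40, 50)

# Pick out the second element by position
second_value <- values[2]

# Keep only the elements that satisfy a condition
large_values <- values[values > 25]

# Arithmetic is applied element by element
doubled <- values * 2

# Count the number of elements
n_values <- length(values)

print(second_value)
print(large_values)
print(doubled)
print(n_values)
```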
# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5) 
numbers
## [1] 1 2 3 4 5
# Creating a character vector
characters <- c("a", "b", "c")
characters
## [1] "a" "b" "c"
# Creating a logical vector
logicals <- c(TRUE, FALSE, TRUE)
logicals
## [1]  TRUE FALSE  TRUE
# Calculate the sum of the elements in the numbers vector
sum_numbers <- sum(numbers)
# Calculate the mean of the elements in the numbers vector
mean_numbers <- mean(numbers)
# Find the maximum element in the numbers vector
max_numbers <- max(numbers)
# Print the sum of the numbers vector
print(sum_numbers)
## [1] 15
# Print the mean of the numbers vector
print(mean_numbers)
## [1] 3
# Print the maximum element in the numbers vector
print(max_numbers)
## [1] 5
data.table is a high-performance version of data.frame. It provides fast and memory-efficient data manipulation capabilities.
You can create and print a data.table with the following code.
# Creating a data table
dt <- data.table(
  
  # Create a column named Name with elements "John", "Jane", and "Doe"
  Name = c("John", "Jane", "Doe"),  
  
  # Create a column named Age with elements 23, 25, and 28
  Age = c(23, 25, 28),  
  
  # Create a column named Height with elements 180, 165, and 170
  Height = c(180, 165, 170)  
)
# Display the data table
dt
You can add new columns to a data.table.
# Adding a new column named Weight with values 70, 55, 60
dt[, Weight := c(70, 55, 60)]  
# Adding a column named BMI based on existing columns Weight and Height
dt[, BMI := Weight / (Height / 100)^2]  
# Display the updated data table
dt
Subsetting allows you to select specific rows and columns from a
data.table.
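Inside the square brackets, the part before the comma picks rows and the part after the comma picks columns, so both can be done in one call. A minimal sketch, assuming the data.table package is already installed and using a made-up table named people:

```r
# Load data.table (assumes the package is already installed)
library(data.table)

# A small hypothetical table, separate from the dt used elsewhere in this tutorial
people <- data.table(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Height = c(180, 165, 170)
)

# Rows go before the comma, columns after it:
# keep rows where Age is greater than 24, and only the Name and Height columns
older_people <- people[Age > 24, .(Name, Height)]
print(older_people)
```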
# Select rows where the Age column is greater than 24
dt_subset <- dt[Age > 24]  
# Print the subset data table 
dt_subset
# Select only the Name and Height columns
dt_selected_columns <- dt[, .(Name, Height)]  
# Print the data table with selected columns 
dt_selected_columns
Filtering allows you to select rows that meet certain criteria.
# Select rows where the BMI column is greater than 20
dt_filtered <- dt[BMI > 20]  
# Display the filtered data table
dt_filtered
Data analysis involves calculating summary statistics and exploring relationships between variables.
You can calculate various summary statistics using
data.table.
# Calculate the mean of the Age column
mean_age <- dt[, mean(Age)]  
# Calculate the standard deviation of the Height column
sd_height <- dt[, sd(Height)]  
# Calculate summary statistics for Age and Height
summary_stats <- dt[, .(
  
  # Calculate the mean of the Age column
  Mean_Age = mean(Age),  
  
  # Calculate the standard deviation of the Age column
  SD_Age = sd(Age),  
  
  # Calculate the mean of the Height column
  Mean_Height = mean(Height),  
  
  # Calculate the standard deviation of the Height column
  SD_Height = sd(Height)  
)]
# Display the summary statistics
print(summary_stats)
##    Mean_Age   SD_Age Mean_Height SD_Height
##       <num>    <num>       <num>     <num>
## 1: 25.33333 2.516611    171.6667  7.637626
# Use the summary function to get summary stats for all variables
summary(dt)
##      Name                Age            Height          Weight     
##  Length:3           Min.   :23.00   Min.   :165.0   Min.   :55.00  
##  Class :character   1st Qu.:24.00   1st Qu.:167.5   1st Qu.:57.50  
##  Mode  :character   Median :25.00   Median :170.0   Median :60.00  
##                     Mean   :25.33   Mean   :171.7   Mean   :61.67  
##                     3rd Qu.:26.50   3rd Qu.:175.0   3rd Qu.:65.00  
##                     Max.   :28.00   Max.   :180.0   Max.   :70.00  
##       BMI       
##  Min.   :20.20  
##  1st Qu.:20.48  
##  Median :20.76  
##  Mean   :20.86  
##  3rd Qu.:21.18  
##  Max.   :21.60  
The wooldridge package contains data sets from Jeffrey M. Wooldridge’s textbook “Introductory Econometrics: A Modern Approach”. Any MindTap computer-based assignment will use one of these data sets. Rather than downloading the .RData file, you can access any wooldridge data set as I do in the code below.
# Load the wooldridge package
pacman::p_load("wooldridge")  
# Load the wage1 data set from the Wooldridge package as a data.table
dt <- as.data.table(wage1)
# Display the first few rows of the data set
head(dt)
# Display the last 10 rows of the data set
tail(dt, 10)
# Display summary statistics of the wage1 data set
summary(dt)
##       wage             educ           exper           tenure      
##  Min.   : 0.530   Min.   : 0.00   Min.   : 1.00   Min.   : 0.000  
##  1st Qu.: 3.330   1st Qu.:12.00   1st Qu.: 5.00   1st Qu.: 0.000  
##  Median : 4.650   Median :12.00   Median :13.50   Median : 2.000  
##  Mean   : 5.896   Mean   :12.56   Mean   :17.02   Mean   : 5.105  
##  3rd Qu.: 6.880   3rd Qu.:14.00   3rd Qu.:26.00   3rd Qu.: 7.000  
##  Max.   :24.980   Max.   :18.00   Max.   :51.00   Max.   :44.000  
##     nonwhite          female          married           numdep     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :1.0000   Median :1.000  
##  Mean   :0.1027   Mean   :0.4791   Mean   :0.6084   Mean   :1.044  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:2.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :6.000  
##       smsa           northcen         south             west       
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.000   Median :0.0000   Median :0.0000  
##  Mean   :0.7224   Mean   :0.251   Mean   :0.3555   Mean   :0.1692  
##  3rd Qu.:1.0000   3rd Qu.:0.750   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.000   Max.   :1.0000   Max.   :1.0000  
##     construc          ndurman          trcommpu           trade       
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.04563   Mean   :0.1141   Mean   :0.04373   Mean   :0.2871  
##  3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##     services         profserv         profocc          clerocc      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1008   Mean   :0.2586   Mean   :0.3669   Mean   :0.1673  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##     servocc           lwage            expersq          tenursq       
##  Min.   :0.0000   Min.   :-0.6349   Min.   :   1.0   Min.   :   0.00  
##  1st Qu.:0.0000   1st Qu.: 1.2030   1st Qu.:  25.0   1st Qu.:   0.00  
##  Median :0.0000   Median : 1.5369   Median : 182.5   Median :   4.00  
##  Mean   :0.1407   Mean   : 1.6233   Mean   : 473.4   Mean   :  78.15  
##  3rd Qu.:0.0000   3rd Qu.: 1.9286   3rd Qu.: 676.0   3rd Qu.:  49.00  
##  Max.   :1.0000   Max.   : 3.2181   Max.   :2601.0   Max.   :1936.00
# Calculate the mean and standard deviation of the wage column
mean_wage <- dt[, mean(wage)]  
sd_wage <- dt[, sd(wage)]  
# Display the mean of wage
print(mean_wage)
## [1] 5.896103
# Display the standard deviation of wage
print(sd_wage)
## [1] 3.693086
# Calculate the covariance between wage and education
cov_wage_educ <- dt[, cov(wage, educ)]
# Display the covariance between wage and education
cov_wage_educ
## [1] 4.150864
Use linear regression to analyze the relationship between wage and education, controlling for experience and tenure.
# Create a linear model with wage as the dependent variable and educ, exper, and tenure as the independent variables
model <- lm(wage ~ educ + exper + tenure, data = dt)  
# Print the summary of the linear regression model
summary(model)
## 
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6068 -1.7747 -0.6279  1.1969 14.6536 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.87273    0.72896  -3.941 9.22e-05 ***
## educ         0.59897    0.05128  11.679  < 2e-16 ***
## exper        0.02234    0.01206   1.853   0.0645 .  
## tenure       0.16927    0.02164   7.820 2.93e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared:  0.3064, Adjusted R-squared:  0.3024 
## F-statistic: 76.87 on 3 and 522 DF,  p-value: < 2.2e-16
# Colors used for plot (https://www.rapidtables.com/web/color/RGB_Color.html)
color_1 <- "#00FF00"
color_2 <- "#ADD8E6"
# Create a scatter plot of wage vs education with a regression line using ggplot2
ggplot(wage1, aes(x = educ, y = wage)) +  # Initialize the plot with education on the x-axis and wage on the y-axis
  geom_point(color = color_1) +  # Add the data points to the plot
  geom_smooth(method = "lm", col = color_1) +  # Add a regression line to the plot (regression of wage on educ)
  labs(title = "Wage vs Education",  # Add a title to the plot
       x = "Education (years)",  # Label the x-axis
       y = "Wage") +  # Label the y-axis
  theme(
    panel.background = element_rect(fill = "black", color = "black"),
    plot.background = element_rect(fill = "black", color = "black"),
    panel.grid.major = element_line(color = "gray"),
    panel.grid.minor = element_line(color = "gray"),
    axis.text = element_text(color = color_1, size = 15, family = "Arial"),
    axis.title = element_text(color = color_2, size = 25, family = "Arial"),
    plot.title = element_text(hjust = 0.5, color = color_2, size = 30, family = "Arial", face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, color = color_1, size = 25)
  )
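Once a model has been fitted with lm(), its pieces can be pulled out and reused. The sketch below is only an illustration: it uses the mtcars data set that ships with R as a stand-in for wage1 so that it runs without the wooldridge package, and it checks that in a one-regressor model the slope equals the covariance of x and y divided by the variance of x.

```r
# mtcars ships with R; it stands in for wage1 so this runs without the wooldridge package
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the estimated intercept and slope
coefs <- coef(fit)
print(coefs)

# In a regression with one independent variable, the slope equals cov(x, y) / var(x)
slope_by_hand <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
print(slope_by_hand)

# Predict mpg for a hypothetical car with wt = 3 (weight in thousands of pounds)
predicted_mpg <- predict(fit, newdata = data.frame(wt = 3))
print(predicted_mpg)
```

The same functions, coef() and predict(), work on the wage model above; only the variable names in newdata would change.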