1. Click Download R for Windows.
2. Click base.
3. Click the Download R X.X.X for Windows link (where X.X.X is the latest version number).
4. Run the downloaded .exe file and follow the installation instructions.
1. Click Download R for macOS.
2. Click the .pkg file link for the latest version compatible with your macOS version.
3. Open the downloaded .pkg file and follow the installation instructions.
On Linux, click Download R for Linux.
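Next, go to the RStudio Desktop download page. Then, follow the steps for your operating system.
1. Click the Download RStudio for Windows button.
2. Run the downloaded .exe file and follow the installation instructions.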
1. Click the Download RStudio for macOS button.
2. Open the downloaded .dmg file, drag RStudio to your Applications folder, and open it.
On Linux, click the Download RStudio for Linux button.
After installing RStudio, open the application. By default, it uses a bright white theme. I don’t want you guys to go blind, so follow these steps if you would like a cooler theme:
1. Click Tools at the top of your screen.
2. Click Global Options...
3. Click Appearance.
From here, you can tinker with the settings to see what you like. I use the Modern RStudio theme, set the zoom to 90%, set the editor font to Cascadia Code, leave text rendering at its default, set the editor and help panel font sizes to 9, and use a custom editor theme I made, which you can find under the Content > R and RStudio section in D2L. You can use any editor theme you wish (Cobalt is a good one), but to use my custom theme, follow these instructions:
1. Go to Content > R and RStudio in D2L.
2. Download My_RStudio_Theme.rstheme.
3. Open the Appearance window in RStudio by following the steps above, then click Add.
4. Find the My_RStudio_Theme.rstheme file and select it.
5. Click Apply and then OK.
You can also arrange your console, environment, and other panes differently by selecting Pane Layout on the left side of the Global Options... window.
.R Scripts
You will write your code in a .R script. To create a .R script, follow these instructions:
1. Click File in the top-left corner.
2. Select New File from the dropdown menu.
3. Select R Script from the submenu.
You should now see a blank script editor where you can write your R code. Save your script by clicking File > Save As... and giving your file a name ending with .R. You can also save your .R file by pressing Ctrl + S (Windows/Linux) or Cmd + S (Mac).
You will write your code in this blank .R file. To execute the current line, press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac). Alternatively, you can highlight a block of code and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac) to run the selected code. You can also run a line or block of code by clicking the Run button at the top right of your .R script pane.
To run the entire .R script, follow these steps:
1. Open the .R script in RStudio.
2. Click the Source button at the top of the script editor. This runs the entire script.
Alternatively, you can use the shortcut Ctrl + Shift + S (Windows/Linux) or Cmd + Shift + S (Mac) to source the entire script.
I always start my .R script with the following lines of code:
# Clear environment, plot pane, and console
rm(list = ls())
graphics.off()
cat("\014")
The code clears all objects from your environment, removes any existing plots from your plot pane, and clears out your console. Essentially, you start working with a clean slate.
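The clean-up line rm(list = ls()) has one subtlety worth seeing for yourself. ls() does not list objects whose names start with a dot, so those objects survive the clean-up. The sketch below uses made-up object names (a, b, .hidden) to show this. If you also want hidden objects gone, rm(list = ls(all.names = TRUE)) removes them too.

```r
# Hypothetical example objects; the last name starts with a dot
a <- 1
b <- "text"
.hidden <- TRUE

# ls() skips dot-prefixed names unless all.names = TRUE
ls()                   # "a" "b" in a fresh session
ls(all.names = TRUE)   # also includes ".hidden"

# Remove everything that ls() reports
rm(list = ls())

ls()                   # character(0)
exists(".hidden")      # TRUE, because ls() never listed it
```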
The R console is where the code you run from your .R script, along with its results, is displayed. You can also write code directly in the R console. This is handy when you want to quickly run a line of code without adding it to your .R script.
To write code in the R console, follow these instructions:
1. Click the Console tab within RStudio.
2. Type your code after the > symbol.
3. Press Enter.
You can also get assistance from the Help tab in RStudio. Click the Help tab and search for print to bring up the documentation for the print() function. You can do this for any function. Typing ?print in the console opens the same page.
More importantly, I highly recommend using ChatGPT when you write code. I always have ChatGPT open when I am coding. It makes coding much more efficient and usually gives you a correct, or close to correct, answer to your coding questions.
R has a rich ecosystem of packages that extend its functionality. Packages in R are collections of functions, data, and compiled code that are bundled together for easy distribution and use. They provide additional capabilities beyond the base R installation, such as advanced statistical techniques, data manipulation tools, and graphical capabilities. To use functions from a package, you first install it and then load it.
To install packages, you can use the install.packages() function, which comes with R. To install and manage packages more easily, you can install the pacman package with the following code:
# If pacman is not already installed, then install the pacman package
if(!require(pacman)) install.packages("pacman")
After running the code, if you are prompted to select a CRAN mirror, scroll down and choose any option from the U.S.
pacman provides a suite of functions for installing, loading, and managing R packages. Any packages passed to its p_load() function are automatically installed (if they are not already installed) and loaded into your current R session.
# Use the pacman function p_load to load in packages needed for this document
pacman::p_load(ggplot2, data.table)
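To make it less mysterious what p_load() does for you, here is a rough base-R sketch of the same idea. The function name load_packages is made up for this example and is not part of pacman. The real p_load() does more than this, so treat the sketch as an illustration rather than a replacement.

```r
# Hypothetical helper (not part of pacman) that roughly mimics p_load():
# install each package if it is missing, then attach it
load_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() quietly returns FALSE when a package is not installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name stored in pkg
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage with packages that ship with R, so nothing needs to be downloaded
load_packages(c("stats", "splines"))
```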
The double colon in pacman:: lets us use a package’s functions without loading the package first. This works for any installed package, but you typically want to load packages with p_load() and then use their functions as normal. You can load packages one at a time with the require() or library() function, but I encourage you to use pacman::p_load() and pass it all the packages you will be using in your .R script. Here, I passed the ggplot2 and data.table packages since I plan on using them in this tutorial. data.table is a great package for efficient data manipulation, while ggplot2 has extremely powerful plotting capabilities.
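Here is the double-colon syntax on its own. The example uses the tools package, which comes with every R installation, so nothing extra needs to be installed. The name file.csv is just a string here, and no file is read.

```r
# Call functions from the tools package (it ships with R) without attaching it
tools::file_ext("file.csv")             # "csv"
tools::file_path_sans_ext("file.csv")   # "file"

# The same pattern works for any installed package
stats::median(c(3, 1, 2))               # 2
```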
The working directory is the folder where R reads and saves files by default. You can set the working directory using the setwd() function.
# Set the working directory (replace this path with a folder on your own computer)
setwd("C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial")
# Get the working directory
getwd()
## [1] "C:/Users/wbras/OneDrive/Desktop/UA/Fall_2024/ECON_418-518/ECON_418-518_R_Tutorial"
Once the working directory is set, you can read in data from a .csv or .xlsx file with the code below, as long as those files are in the current working directory.
# Read in .csv file from working directory
csv_data <- read.csv("file.csv")
# Read in .xlsx file from working directory
pacman::p_load(readxl)
xlsx_data <- read_excel("file.xlsx")
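If you want to try reading a file before you have real data, the sketch below writes a tiny made-up data set (example_df, with invented columns id and score) to file.csv and reads it back with read.csv(). It uses tempdir() as the working directory so nothing in your own folders is changed. Setting row.names = FALSE stops write.csv() from adding an extra column of row numbers.

```r
# Use a temporary folder as the working directory so your own files are untouched
old_wd <- setwd(tempdir())   # setwd() returns the previous working directory

# Write a small made-up data set to file.csv inside the working directory
example_df <- data.frame(id = 1:3, score = c(90, 85, 77))
write.csv(example_df, "file.csv", row.names = FALSE)

# Because file.csv is in the working directory, the bare file name is enough
csv_data <- read.csv("file.csv")
print(csv_data)

# Switch back to the original working directory
setwd(old_wd)
```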
If you have a very large dataset stored as a .csv file, you can use data.table’s fread() and fwrite() functions to read data from your working directory into R and to write data from R to your working directory, respectively. Both are much faster than base R’s read.csv() and write.csv().
# Read in .csv file from working directory fast
dt <- fread("file.csv")
# Write data.table to working directory fast (the output file name is a placeholder)
fwrite(dt, "file_out.csv")
In R, you assign values to variables using the <- operator. R supports various data types, including numeric, character, and logical. The name to the left of the <- operator is a variable that stores the value(s) on the right of the operator. When you name variables, you typically want to either separate words with underscores (_) or use camel case. An example of camel case is myNewVariable.
# x is assigned the value 10
x <- 10
# Show the value of x
paste("x is", x)
## [1] "x is 10"
# y is assigned the value 5.5
y <- 5.5
# Show the value of y
paste("y is", y)
## [1] "y is 5.5"
# Show the value of x + y
paste("x + y is", x + y)
## [1] "x + y is 15.5"
# Show the value of x * y
paste("x * y is", x * y)
## [1] "x * y is 55"
# Show the value of x / y rounded to two decimal places
paste("x / y is", round(x / y, 2))
## [1] "x / y is 1.82"
# Show the value of ln(x) rounded to two decimal places
paste("ln(x) is", round(log(x), 2))
## [1] "ln(x) is 2.3"
# Reassign x to 5 and show its value
x <- 5
paste("x is now", x)
## [1] "x is now 5"
# Assigning a character value to a variable
z <- "Hello, World!"
# Show what z holds
paste("Will says", z)
## [1] "Will says Hello, World!"
# Assigning a logical value to a variable
flag <- TRUE
# Show what is assigned to flag
paste("flag is", flag)
## [1] "flag is TRUE"
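To check which data type a variable holds, you can use class(). The sketch below covers the numeric, character, and logical types from above, plus integers. The variable names in the second part are only there to show the two naming styles.

```r
# class() reports what type of value a variable holds
class(10)        # "numeric"
class(10L)       # "integer" (the L suffix stores a whole number as an integer)
class("Hello")   # "character"
class(TRUE)      # "logical"

# Both naming styles are valid; pick one and use it consistently
my_new_variable <- 3   # words separated by underscores
myNewVariable <- 3     # camel case
my_new_variable == myNewVariable   # TRUE
```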
If statements are used for decision making. They allow you to execute certain sections of code based on conditions.
# Assign x and y to numeric values
x <- 10
y <- 8
# Basic if-else statement
if (x > y) {
print("x is greater than y") # This will be executed if x is greater than y
} else {
print("x is not greater than y") # This will be executed if x is less than or equal to y
}
## [1] "x is greater than y"
# Use the & (and) operator to check if both conditions hold
if (x > 9 & y > 9) {
print("x and y are greater than 9") # This will be executed if x and y are greater than 9
} else {
print("x or y is less than or equal to 9") # This will be executed if x or y is less than or equal to 9
}
## [1] "x or y is less than or equal to 9"
# Use the | (or) operator to check if at least one condition holds
if (x > 9 | y > 9) {
print("x or y is greater than 9") # This will be executed if x or y is greater than 9
} else {
print("x and y are less than or equal to 9") # This will be executed if x and y are less than or equal to 9
}
## [1] "x or y is greater than 9"
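The conditions above use & and |, which work fine for single values. R also has the double forms && and ||, which are often recommended inside if () because they expect exactly one TRUE/FALSE value on each side and skip the right-hand side when the left side already settles the result. The sketch below reuses x = 10 and y = 8. The stop() call is there only to show that the right-hand side never runs.

```r
x <- 10
y <- 8

# & and | compare element by element, so they also work on vectors
c(TRUE, FALSE) & c(TRUE, TRUE)    # TRUE FALSE
c(TRUE, FALSE) | c(FALSE, FALSE)  # TRUE FALSE

# && and || stop early: the right side is skipped when the left side decides the answer
if (x > 9 && y > 9) "both" else "not both"            # "not both"
if (x > 9 || stop("never evaluated")) "at least one"  # "at least one"
```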
# Assign a score of 85 to the variable score
score <- 85
# If-else example
if (score >= 90) {
grade <- "A" # Assign grade A if score is 90 or above
} else if (score >= 80) {
grade <- "B" # Assign grade B if score is 80 or above
} else if (score >= 70) {
grade <- "C" # Assign grade C if score is 70 or above
} else if (score >= 60) {
grade <- "D" # Assign grade D if score is 60 or above
} else {
grade <- "F" # Assign grade F if score is below 60
}
# Show the assigned grade
paste("Grade assigned is", grade)
## [1] "Grade assigned is B"
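If you need to grade many scores, one option is to wrap the if-else chain in a function so you can reuse it. The name assign_grade is hypothetical, and the cutoffs are the same as in the example above. Because if-else is itself an expression in R, the function simply returns whichever letter its branch produces.

```r
# Hypothetical helper that reuses the if-else chain above
assign_grade <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 80) {
    "B"
  } else if (score >= 70) {
    "C"
  } else if (score >= 60) {
    "D"
  } else {
    "F"
  }
}

assign_grade(85)   # "B"
assign_grade(92)   # "A"
assign_grade(59)   # "F"
```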
Loops allow you to execute code repeatedly.
For loops are used for iterating over a sequence.
# Print numbers 1 to 5 using a for loop
for (i in 1:5)
{
print(i) # Print the value of i in each iteration
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Calculate the sum of the first 10 natural numbers
total <- 0 # Initialize total to 0 (naming it sum would reuse the name of base R's sum() function)
for (i in 1:10)
{
total <- total + i # Add the value of i to total in each iteration
}
# Print the total
print(total)
## [1] 55
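A for loop can also step through the elements of an existing vector. seq_along() is a safe way to generate the positions, since it returns an empty sequence for an empty vector, whereas 1:length(x) would give c(1, 0). The prices vector below is made up. For simple totals, R's vectorized sum() does the same job in one line.

```r
# Hypothetical prices; loop over the positions of the vector's elements
prices <- c(4.5, 3.0, 2.5)
total <- 0
for (i in seq_along(prices)) {   # seq_along(prices) gives 1, 2, 3
  total <- total + prices[i]
}
print(total)   # 10

# Both this loop and the sum of 1 to 10 above have vectorized one-liners
sum(prices)    # 10
sum(1:10)      # 55
```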
While loops continue to execute as long as the condition is true.
# Initialize i to 1
i <- 1
# Print numbers 1 to 5 using a while loop; the loop stops once i reaches 6
while (i <= 5)
{
# Print the value of i
print(i)
# Increment i by 1
i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
# Assign the value 5 to n
n <- 5
# Initialize fact to 1 (naming it factorial would reuse the name of base R's factorial() function)
fact <- 1
# Initialize i to 1
i <- 1
# Calculate the factorial of a number using a while loop
while (i <= n)
{
# Multiply fact by i in each iteration
fact <- fact * i
# Increment i by 1
i <- i + 1
}
# Print the factorial
print(fact)
## [1] 120
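A while loop can also run until a condition checked inside the loop tells it to stop, using break. This sketch, with hypothetical variable names n and fact, finds the smallest n whose factorial exceeds 1000. It then checks the earlier result of 120 against base R's factorial() and prod().

```r
# Find the smallest n whose factorial exceeds 1000
n <- 1
fact <- 1
while (TRUE) {
  fact <- fact * n   # fact now holds n!
  if (fact > 1000) {
    break            # exit the loop as soon as the condition is met
  }
  n <- n + 1
}
print(n)      # 7, because 6! = 720 and 7! = 5040

# Check the earlier result of 120 with base R
factorial(5)  # 120
prod(1:5)     # 120
```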
A vector is a sequence of data elements of the same basic type. You can create vectors using the c() function.
# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5)
numbers
## [1] 1 2 3 4 5
# Creating a character vector
characters <- c("a", "b", "c")
characters
## [1] "a" "b" "c"
# Creating a logical vector
logicals <- c(TRUE, FALSE, TRUE)
logicals
## [1] TRUE FALSE TRUE
# Calculate the sum of the elements in the numbers vector
sum_numbers <- sum(numbers)
# Calculate the mean of the elements in the numbers vector
mean_numbers <- mean(numbers)
# Find the maximum element in the numbers vector
max_numbers <- max(numbers)
# Print the sum of the numbers vector
print(sum_numbers)
## [1] 15
# Print the mean of the numbers vector
print(mean_numbers)
## [1] 3
# Print the maximum element in the numbers vector
print(max_numbers)
## [1] 5
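Beyond summary functions, you will often need specific elements of a vector. The sketch below shows indexing, which starts at 1 in R, and element-wise arithmetic. It also shows what happens when you mix types in c(): R coerces everything to one common type, which is why a vector always holds elements of the same basic type.

```r
numbers <- c(1, 2, 3, 4, 5)

# Indexing starts at 1 in R
numbers[2]             # 2
numbers[c(1, 5)]       # 1 5
numbers[-1]            # 2 3 4 5 (a negative index drops that element)
numbers[numbers > 3]   # 4 5 (a logical condition keeps matching elements)

# Arithmetic is applied element by element
numbers * 2            # 2 4 6 8 10

# Mixing types makes c() coerce every element to one common type
mixed <- c(1, "a", TRUE)
class(mixed)           # "character"
mixed                  # "1" "a" "TRUE"
```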
data.table is a high-performance extension of data.frame. It provides fast and memory-efficient data manipulation capabilities.
You can create and print a data.table with the following code.
# Creating a data table
dt <- data.table(
# Create a column named Name with elements "John", "Jane", and "Doe"
Name = c("John", "Jane", "Doe"),
# Create a column named Age with elements 23, 25, and 28
Age = c(23, 25, 28),
# Create a column named Height with elements 180, 165, and 170
Height = c(180, 165, 170)
)
# Display the data table
dt
You can add new columns to a data.table with the := operator, which modifies the table in place.
# Adding a new column named Weight with values 70, 55, 60
dt[, Weight := c(70, 55, 60)]
# Adding a column named BMI based on existing columns Weight and Height
dt[, BMI := Weight / (Height / 100)^2]
# Display the updated data table
dt
Subsetting allows you to select specific rows and columns from a data.table.
# Select rows where the Age column is greater than 24
dt_subset <- dt[Age > 24]
# Print the subset data table
dt_subset
# Select only the Name and Height columns
dt_selected_columns <- dt[, .(Name, Height)]
# Print the data table with selected columns
dt_selected_columns
Filtering allows you to select rows that meet certain criteria.
# Select rows where the BMI column is greater than 20
dt_filtered <- dt[BMI > 20]
# Display the filtered data table
dt_filtered
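A data.table is indexed as dt[rows, columns], so a row filter and a column selection can also be done in one step. This sketch assumes the data.table package is installed. It rebuilds the same example table, including the Weight and BMI columns, so it runs on its own.

```r
library(data.table)

# Rebuild the example table, including the Weight and BMI columns added above
dt <- data.table(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Height = c(180, 165, 170),
  Weight = c(70, 55, 60)
)
dt[, BMI := Weight / (Height / 100)^2]

# Rows with Age above 24, keeping only the Name and Height columns
dt[Age > 24, .(Name, Height)]

# Combine conditions with & (and) or | (or)
dt[BMI > 20 & Age < 27]
```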
Data analysis involves calculating summary statistics and exploring relationships between variables.
You can calculate various summary statistics using data.table.
# Calculate the mean of the Age column
mean_age <- dt[, mean(Age)]
# Calculate the standard deviation of the Height column
sd_height <- dt[, sd(Height)]
# Calculate summary statistics for Age and Height
summary_stats <- dt[, .(
# Calculate the mean of the Age column
Mean_Age = mean(Age),
# Calculate the standard deviation of the Age column
SD_Age = sd(Age),
# Calculate the mean of the Height column
Mean_Height = mean(Height),
# Calculate the standard deviation of the Height column
SD_Height = sd(Height)
)]
# Display the summary statistics
print(summary_stats)
## Mean_Age SD_Age Mean_Height SD_Height
## <num> <num> <num> <num>
## 1: 25.33333 2.516611 171.6667 7.637626
# Use the summary function to get summary stats for all variables
summary(dt)
## Name Age Height Weight
## Length:3 Min. :23.00 Min. :165.0 Min. :55.00
## Class :character 1st Qu.:24.00 1st Qu.:167.5 1st Qu.:57.50
## Mode :character Median :25.00 Median :170.0 Median :60.00
## Mean :25.33 Mean :171.7 Mean :61.67
## 3rd Qu.:26.50 3rd Qu.:175.0 3rd Qu.:65.00
## Max. :28.00 Max. :180.0 Max. :70.00
## BMI
## Min. :20.20
## 1st Qu.:20.48
## Median :20.76
## Mean :20.86
## 3rd Qu.:21.18
## Max. :21.60
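sd() computes the sample standard deviation, which divides by n - 1 rather than n. Writing the formula out for the Age values reproduces the SD_Age value of 2.516611 reported earlier. The variable names ages and sd_manual are only for this illustration.

```r
# Age values from the example data.table
ages <- c(23, 25, 28)
n <- length(ages)

# Sample standard deviation: square root of the sum of squared deviations over (n - 1)
sd_manual <- sqrt(sum((ages - mean(ages))^2) / (n - 1))

print(sd_manual)   # 2.516611
print(sd(ages))    # 2.516611
```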
The wooldridge package contains the data sets from Jeffrey M. Wooldridge’s textbook “Introductory Econometrics: A Modern Approach”. Every MindTap computer-based assignment uses one of these data sets. Rather than downloading the .RData file, you can access any wooldridge data set as in the code below.
# Load the wooldridge package
pacman::p_load("wooldridge")
# Load the wage1 data set from the Wooldridge package as a data.table
dt <- as.data.table(wage1)
# Display the first few rows of the data set
head(dt)
# Display the last 10 rows of the data set
tail(dt, 10)
# Display summary statistics of the wage1 data set
summary(dt)
## wage educ exper tenure
## Min. : 0.530 Min. : 0.00 Min. : 1.00 Min. : 0.000
## 1st Qu.: 3.330 1st Qu.:12.00 1st Qu.: 5.00 1st Qu.: 0.000
## Median : 4.650 Median :12.00 Median :13.50 Median : 2.000
## Mean : 5.896 Mean :12.56 Mean :17.02 Mean : 5.105
## 3rd Qu.: 6.880 3rd Qu.:14.00 3rd Qu.:26.00 3rd Qu.: 7.000
## Max. :24.980 Max. :18.00 Max. :51.00 Max. :44.000
## nonwhite female married numdep
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :1.0000 Median :1.000
## Mean :0.1027 Mean :0.4791 Mean :0.6084 Mean :1.044
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :6.000
## smsa northcen south west
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.000 Median :0.0000 Median :0.0000
## Mean :0.7224 Mean :0.251 Mean :0.3555 Mean :0.1692
## 3rd Qu.:1.0000 3rd Qu.:0.750 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
## construc ndurman trcommpu trade
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.04563 Mean :0.1141 Mean :0.04373 Mean :0.2871
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## services profserv profocc clerocc
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1008 Mean :0.2586 Mean :0.3669 Mean :0.1673
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## servocc lwage expersq tenursq
## Min. :0.0000 Min. :-0.6349 Min. : 1.0 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.: 1.2030 1st Qu.: 25.0 1st Qu.: 0.00
## Median :0.0000 Median : 1.5369 Median : 182.5 Median : 4.00
## Mean :0.1407 Mean : 1.6233 Mean : 473.4 Mean : 78.15
## 3rd Qu.:0.0000 3rd Qu.: 1.9286 3rd Qu.: 676.0 3rd Qu.: 49.00
## Max. :1.0000 Max. : 3.2181 Max. :2601.0 Max. :1936.00
# Calculate the mean and standard deviation of the wage column
mean_wage <- dt[, mean(wage)]
sd_wage <- dt[, sd(wage)]
# Display the mean of wage
print(mean_wage)
## [1] 5.896103
# Display the standard deviation of wage
print(sd_wage)
## [1] 3.693086
# Calculate the covariance between wage and education
cov_wage_educ <- dt[, cov(wage, educ)]
# Display the covariance between wage and education
cov_wage_educ
## [1] 4.150864
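cov() computes the sample covariance. Its size depends on the units of both variables (wage and educ are measured in different units), so the correlation is often easier to interpret. The correlation rescales the covariance by both standard deviations, which keeps it between -1 and 1. The short vectors x and y below are made up and are not from wage1. They show both formulas written out next to the built-in cov() and cor().

```r
# Small made-up vectors (not from wage1) to illustrate the formulas
x <- c(10, 12, 12, 14, 16)
y <- c(4.0, 5.5, 5.0, 7.0, 8.5)
n <- length(x)

# Sample covariance: sum of the products of deviations from the means, divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance divided by the product of the two standard deviations
cor_manual <- cov_manual / (sd(x) * sd(y))

all.equal(cov_manual, cov(x, y))   # TRUE
all.equal(cor_manual, cor(x, y))   # TRUE
```

With the tutorial's data, you could call dt[, cor(wage, educ)] in the same way as the covariance above.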
Use linear regression to analyze the relationship between wage and education, controlling for experience and tenure.
# Create a linear model with wage as the dependent variable and educ, exper, and tenure as the independent variables
model <- lm(wage ~ educ + exper + tenure, data = dt)
# Print the summary of the linear regression model
summary(model)
##
## Call:
## lm(formula = wage ~ educ + exper + tenure, data = dt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6068 -1.7747 -0.6279 1.1969 14.6536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.87273 0.72896 -3.941 9.22e-05 ***
## educ 0.59897 0.05128 11.679 < 2e-16 ***
## exper 0.02234 0.01206 1.853 0.0645 .
## tenure 0.16927 0.02164 7.820 2.93e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.084 on 522 degrees of freedom
## Multiple R-squared: 0.3064, Adjusted R-squared: 0.3024
## F-statistic: 76.87 on 3 and 522 DF, p-value: < 2.2e-16
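The estimates lm() reports are the OLS solution beta_hat = (X'X)^(-1) X'y, where X holds a column of ones plus the regressors. The sketch below checks this on R's built-in mtcars data, chosen only because it needs no extra packages, by regressing mpg on wt and hp. The object names model_mt, X, y, and beta_hat are made up. lm() itself uses a QR decomposition, which is numerically more stable than forming X'X, but the estimates agree.

```r
# Fit a regression on the built-in mtcars data: mpg on weight and horsepower
model_mt <- lm(mpg ~ wt + hp, data = mtcars)

# Design matrix X (intercept column of 1s plus wt and hp) and outcome vector y
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# OLS by solving the normal equations (X'X) beta = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Manual estimates next to the coefficients reported by lm()
cbind(manual = beta_hat, lm = coef(model_mt))

# Standard errors can be pulled from the summary's coefficient table
summary(model_mt)$coefficients[, "Std. Error"]
```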
# Colors used for plot (https://www.rapidtables.com/web/color/RGB_Color.html)
color_1 <- "#00FF00"
color_2 <- "#ADD8E6"
# Create a scatter plot of wage vs education with a regression line using ggplot2
ggplot(wage1, aes(x = educ, y = wage)) + # Initialize the plot with education on the x-axis and wage on the y-axis
geom_point(color = color_1) + # Add the data points to the plot
geom_smooth(method = "lm", col = color_1) + # Add a regression line to the plot (regression of wage on educ)
labs(title = "Wage vs Education", # Add a title to the plot
x = "Education (years)", # Label the x-axis
y = "Wage") + # Label the y-axis
theme(
panel.background = element_rect(fill = "black", color = "black"),
plot.background = element_rect(fill = "black", color = "black"),
panel.grid.major = element_line(color = "gray"),
panel.grid.minor = element_line(color = "gray"),
axis.text = element_text(color = color_1, size = 15, family = "Arial"),
axis.title = element_text(color = color_2, size = 25, family = "Arial"),
plot.title = element_text(hjust = 0.5, color = color_2, size = 30, family = "Arial", face = "bold"),
plot.subtitle = element_text(hjust = 0.5, color = color_1, size = 25)
)