R Basics
(Updated 2-19-2025)
This article lists the basics of the R language. Please let me know if there is anything else I should include!
- General Maintenance
- Data Types
- Converting character variable into class date
- Simple Plotting
- Data Frame Manipulation
General Maintenance
If there is a new package that I don’t yet have installed on my computer, I can do:
install.packages("plotly")
To update all installed packages without being prompted, I do:
update.packages(ask=FALSE)
To load a library, I do:
library(lubridate)
pacman installs and loads packages, which is much easier than the standard R routine.
pacman::p_load(pacman, dplyr, GGally, ggplot2, ggthemes, ggvis, 
    httr, lubridate, plotly, rio, rmarkdown, shiny, stringr, tidyr)
To unload packages, type
p_unload(all)
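Before calling p_load(), it can help to see which packages are not installed yet. The following is a minimal base-R sketch; the vector wanted is only an illustrative list, not something from my setup.

```r
# Packages we would like to use (illustrative list, adjust as needed)
wanted <- c("stats", "utils", "lubridate", "plotly")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need install.packages()
missing_pkgs <- wanted[!is_installed]
print(missing_pkgs)

# Uncomment to install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```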
To check the current working directory:
getwd()
To set the working directory:
setwd("/path/to/my/directory")
Data Types
Check data type
str(my.data)
Convert character variable to numeric
dataset$prop_camp <- as.numeric(dataset$prop_camp)
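Two things are worth checking when converting to numeric: strings that are not numbers silently become NA (with a warning), and factors convert to their internal level codes rather than their labels. A small sketch with made-up vectors:

```r
# Character values that are not numbers become NA (R also issues a warning)
x <- c("1.5", "2", "abc")
x_num <- suppressWarnings(as.numeric(x))
print(x_num)                         # 1.5, 2, NA

# For a factor, as.numeric() returns the level codes, not the labels
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                 # 1, 2, 1
print(as.numeric(as.character(f)))   # 10, 20, 10
```

If the column is a factor, going through as.character() first is the safer route.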
Converting Character Variable to Factor
dataset$cookstove_assigned2 <- factor(dataset$cookstove_assigned2, 
                                      levels = c("Already users", 
                                                 "Intervention group",
                                                 "Waitlisted controls"),
                                      ordered=TRUE)
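To see what ordered=TRUE buys, here is a small sketch on a made-up vector that uses the same three levels. The value "Unknown" is hypothetical and is there to show that anything not listed in levels becomes NA.

```r
groups <- c("Waitlisted controls", "Already users",
            "Intervention group", "Unknown")

f <- factor(groups,
            levels = c("Already users",
                       "Intervention group",
                       "Waitlisted controls"),
            ordered = TRUE)

# Values that are not among the levels are turned into NA
print(is.na(f))                   # FALSE FALSE FALSE TRUE

# Ordered factors support comparisons that follow the level order
print(f < "Waitlisted controls")  # FALSE TRUE TRUE NA

# Counts per level, including the NA created above
print(table(f, useNA = "ifany"))
```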
Converting character variable into class date
If my date variable is stored as a string such as “2023-06-13”, I can convert it to class Date with ymd() from the lubridate library:
library(lubridate)
df$Date <- ymd(df$Date)
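For reference, base R can do the same conversion with as.Date() when lubridate is not available. A minimal sketch with two made-up date strings:

```r
date_strings <- c("2023-06-13", "2023-06-20")

# Parse year-month-day strings into class Date
dates <- as.Date(date_strings, format = "%Y-%m-%d")
print(class(dates))                      # "Date"

# Once the variable is a Date, date arithmetic works
print(as.numeric(dates[2] - dates[1]))   # 7 (days)
print(dates[1] + 30)                     # "2023-07-13"

# Pull out a component as text
print(format(dates[1], "%Y"))            # "2023"
```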
Simple Plotting
Boxplot
plot(iris$Species, iris$Petal.Width)
Because iris$Species is a factor, plot() draws a boxplot of Petal.Width for each species.
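The same kind of figure can be requested explicitly with boxplot() and a formula, which also makes it easy to label the axes. This is a sketch; the labels and title are my own choices.

```r
# Boxplot of petal width by species, with explicit labels
boxplot(Petal.Width ~ Species, data = iris,
        xlab = "Species", ylab = "Petal width (cm)",
        main = "Petal width by species")

# plot = FALSE returns the numbers behind the boxes without drawing
bp <- boxplot(Petal.Width ~ Species, data = iris, plot = FALSE)
print(bp$names)        # group labels, in plotting order
print(bp$stats[3, ])   # row 3 of the five-number summary is the median
```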
Data Frame Manipulation
Read in data
Recode empty cells as missing values (NA) during data import. import() comes from the rio package:
df_midline <- import("midline_04032021.csv", na.strings="")
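The effect of na.strings is easiest to see on a tiny file. The sketch below uses base R's read.csv() rather than rio, and writes a made-up three-row CSV to a temporary file first.

```r
# Write a small CSV with empty cells to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,prop_camp,village",
             "1,0.5,A",
             "2,,B",
             "3,0.2,"), tmp)

# Default import: the empty text cell stays an empty string
df_default <- read.csv(tmp)
print(as.character(df_default$village))

# With na.strings = "", the empty text cell becomes NA
df_na <- read.csv(tmp, na.strings = "")
print(is.na(df_na$village))

unlink(tmp)  # clean up the temporary file
```

Empty cells in numeric columns are read as NA either way; the option matters mostly for text columns.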
Rows
Remove rows with missing values (NA or NaN) in a given column
df_midline <- df_midline[!is.na(df_midline$prop_camp), ]
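To drop rows that have a missing value in any column rather than in one chosen column, complete.cases() is an option. A sketch on the built-in airquality data:

```r
print(nrow(airquality))   # rows before filtering

# Keep rows where one specific column is not missing
ozone_ok <- airquality[!is.na(airquality$Ozone), ]
print(nrow(ozone_ok))

# Keep rows with no missing value in any column
all_ok <- airquality[complete.cases(airquality), ]
print(nrow(all_ok))
```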
Check number of rows in data frame
nrow(dataset)
Select rows with certain conditions
df[df$unique_id=="157-00045552",]
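Conditions can be combined with & and |. One catch is that a condition evaluating to NA produces an all-NA row when used directly inside [ ]. The sketch below uses a made-up data frame and shows two ways around that.

```r
df <- data.frame(unique_id = c("a", "b", "c", "d"),
                 age       = c(25, NA, 40, 31),
                 treated   = c(TRUE, TRUE, FALSE, TRUE))

# Direct logical indexing: the NA in age yields an extra all-NA row
with_na <- df[df$treated & df$age > 30, ]
print(nrow(with_na))

# which() keeps only the positions where the condition is TRUE
clean <- df[which(df$treated & df$age > 30), ]
print(clean)

# subset() also drops rows where the condition is NA
clean2 <- subset(df, treated & age > 30)
```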
Reorder rows by Sepal.Length in ascending order and Petal.Length in descending order
my_data %>% arrange(Sepal.Length, desc(Petal.Length))
Reorder rows by Sepal.Length in descending order. Use the function desc():
my_data %>% arrange(desc(Sepal.Length))
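If dplyr is not loaded, base R's order() gives the same kind of sort. In this sketch a minus sign in front of a numeric column requests descending order for that column.

```r
# Ascending Sepal.Length, then descending Petal.Length within ties
sorted <- iris[order(iris$Sepal.Length, -iris$Petal.Length), ]
print(head(sorted, 3))

# Descending Sepal.Length only
sorted_desc <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
print(head(sorted_desc, 3))
```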
Find unique values:
unique(df$col)
Columns
Multiply two columns
df$c <- df$a * df$b
Change column type from character to numeric
df_midline <- transform(df_midline, 
                        employment_woman = as.numeric(employment_woman))
Change column type from integer to categorical
mydata$COR <- as.factor(mydata$COR)
Rename columns (the syntax is new_name = old_name; assign the result to keep the change)
my_data %>% 
  rename(
    sepal_length = Sepal.Length,
    sepal_width = Sepal.Width
    )
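A base-R way to rename columns is to assign into names(). This sketch works on a copy of iris, so the built-in data set is left alone.

```r
my_data <- iris

# Replace a name by matching on the old name
names(my_data)[names(my_data) == "Sepal.Length"] <- "sepal_length"
names(my_data)[names(my_data) == "Sepal.Width"]  <- "sepal_width"

print(names(my_data))
```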
Reorder columns by position (here, swapping the first two columns)
df <- df[, c(2, 1, 3, 4)]
Drop columns
df <- subset(mydata, select = -c(x, z))
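When the columns to drop are stored as strings, for example in a variable, matching on names() is a convenient alternative. A sketch with a made-up three-column data frame:

```r
mydata <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# Column names to drop, held in a character vector
drop_cols <- c("x", "z")

# drop = FALSE keeps the result a data frame even if one column remains
df <- mydata[, !(names(mydata) %in% drop_cols), drop = FALSE]
print(names(df))   # "y"
```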
Select unique values from column
unique(df$column)
Data frames
Examine a Data Frame in R with 7 Basic Functions:
- dim(): shows the dimensions of the data frame by row and column
- str(): shows the structure of the data frame
- summary(): provides summary statistics on the columns of the data frame
- colnames(): shows the name of each column in the data frame
- head(): shows the first 6 rows of the data frame
- tail(): shows the last 6 rows of the data frame
- View(): shows a spreadsheet-like display of the entire data frame
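Here is how the functions in the list above look when applied to the built-in iris data. View() is left commented out because it needs an interactive session.

```r
print(dim(iris))        # 150 rows, 5 columns
str(iris)               # type and first values of each column
print(summary(iris))    # summary statistics per column
print(colnames(iris))   # column names
print(head(iris))       # first 6 rows by default
print(tail(iris, 3))    # last 3 rows; the default is 6
# View(iris)            # spreadsheet-like viewer, interactive only
```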
Check for missing values in a dataframe:
sapply(airquality, function(x) sum(is.na(x)))
Combine two cross sectional data sets into a panel data set
library(dplyr)
data1 %>%
  bind_rows(data2) %>%
  arrange(ID, Yr)
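To make the stacking step concrete, here is a base-R sketch with two made-up cross sections. The column income is hypothetical; ID and Yr follow the names used above. Note that rbind() requires both data frames to have the same columns, whereas bind_rows() fills columns missing from one of them with NA.

```r
# Two cross sections of the same units, one per year
data1 <- data.frame(ID = c(1, 2), Yr = 2020, income = c(10, 20))
data2 <- data.frame(ID = c(1, 2), Yr = 2021, income = c(12, 19))

# Stack them, then sort by unit and year to get the panel layout
panel <- rbind(data1, data2)
panel <- panel[order(panel$ID, panel$Yr), ]
rownames(panel) <- NULL

print(panel)
```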
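Fill in a time-fixed variable across waves: for each unique_id that appears twice, copy the non-missing value of cookstove_assigned2 to the row where it is missing
df <- df %>%
  group_by(unique_id) %>%
  # Note: ifelse() drops factor attributes, so run this before converting to factor
  mutate(cookstove_assigned2 
         = ifelse(n()==2, cookstove_assigned2[!is.na(cookstove_assigned2)],
                  cookstove_assigned2)) %>%
  ungroup()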
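The same idea can be sketched in base R with ave(), which applies a function within each group and returns a vector of the original length. The data frame and the helper fill_first() are made up for illustration, and the sketch assumes the variable really is constant within a unit, so the first non-missing value is copied to every row of that unit.

```r
df <- data.frame(unique_id = c(1, 1, 2, 2, 3),
                 wave      = c(1, 2, 1, 2, 1),
                 group     = c("A", NA, NA, "B", "C"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: spread the first non-missing value over the group
fill_first <- function(v) {
  ok <- v[!is.na(v)]
  if (length(ok) == 0) v else rep(ok[1], length(v))
}

# Apply the helper separately within each unique_id
df$group <- ave(df$group, df$unique_id, FUN = fill_first)
print(df)
```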
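Model-building
Split the data into a predictor matrix x and an outcome vector y. Here the outcome is assumed to sit in column 12:
x <- as.matrix(data[-12])  # all columns except the 12th
y <- data[, 12]            # the 12th column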
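The same split can be tried on the built-in mtcars data, where I treat the first column (mpg) as the outcome instead of column 12. The linear model at the end is only there to show that x and y plug straight into a fitting function.

```r
data <- mtcars             # 32 rows, 11 columns

x <- as.matrix(data[-1])   # predictors: every column except the first
y <- data[, 1]             # outcome: the first column (mpg)

print(dim(x))              # 32 rows, 10 predictor columns
print(length(y))

# A matrix on the right-hand side adds one coefficient per column
fit <- lm(y ~ x)
print(length(coef(fit)))   # intercept plus 10 slopes
```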
Logit panel regression
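clogit() from the survival package fits a conditional (fixed-effects) logit; strata(unique_id) groups the observations that belong to the same unit. See here for a reference.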
result_1 <- clogit(resp_stovetype_n ~ indep_var + strata(unique_id), data = df)
If you think some of the variation is due to overall time trends or other time-series patterns (reference here), you should add time dummies to the model. Be aware that the likelihood maximization may fail to converge once time dummies are added.
# factor(time_period) enters one dummy per period as a regressor.
# A second strata() term would instead stratify on unit-by-period cells.
result_1 <- clogit(resp_stovetype_n ~ indep_var + factor(time_period) 
    + strata(unique_id), data = df)
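For a runnable illustration, the sketch below simulates a small two-period panel and fits a conditional logit with and without a period dummy. Every variable name and the data-generating process are made up. Entering the time effect as factor(period) in the formula is one way to add time dummies, not necessarily the only one.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n_units <- 200

# Simulated two-period panel (all names here are hypothetical)
d <- data.frame(
  id     = rep(seq_len(n_units), each = 2),
  period = rep(1:2, times = n_units)
)
unit_effect <- rnorm(n_units)[d$id]   # unobserved unit-level heterogeneity
d$x <- rnorm(nrow(d))
d$y <- rbinom(nrow(d), 1, plogis(unit_effect + d$x))

# Conditional logit: strata(id) conditions out the unit effects
fit_base <- clogit(y ~ x + strata(id), data = d)

# Same model with a period dummy entered as a regressor
fit_time <- clogit(y ~ x + factor(period) + strata(id), data = d)

print(coef(fit_base))
print(coef(fit_time))
```

Units whose outcome is the same in both periods do not contribute to the conditional likelihood, so the effective sample is smaller than the number of rows.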
This version and this version can handle factor variables.
Save data
write.csv(df, 'df.csv', row.names = FALSE)
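A quick round trip shows what row.names = FALSE does. This sketch writes a made-up data frame to a temporary folder and reads it back.

```r
df <- data.frame(id = 1:3, score = c(1.5, NA, 3))

# Save without row names and read the file back
tmp <- file.path(tempdir(), "df.csv")
write.csv(df, tmp, row.names = FALSE)
back <- read.csv(tmp)
print(all.equal(df, back))   # TRUE: same columns and values

# With the default row.names = TRUE, the row names come back as an extra column
tmp2 <- file.path(tempdir(), "df_rownames.csv")
write.csv(df, tmp2)
back2 <- read.csv(tmp2)
print(names(back2))

unlink(c(tmp, tmp2))  # clean up
```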