The Federal Reserve Bank of St. Louis hosts one of the most expansive depositories of economic data in the United States (US). The Federal Reserve Economic Data, better known as FRED, includes “hundreds of thousands of economic data timeseries from scores of national, international, public and private sources”.
There are multiple ways to access and download these data. Users can visit FRED’s website, search for a dataset, and download it in different formats. The website also allows users to download graphs in different formats. The Federal Reserve Bank of St. Louis has developed multiple video tutorials to help users learn how to use FRED’s many features.
In this post, I will use R’s fredr package, developed by Sam Boysel, to access and download the FRED’s data. Once downloaded, we can analyze and plot the data in R. Is this technique easier than using the FRED’s data portal or using a spreadsheet application to analyze and graph these data? It depends on users’ level of comfort with R and the tidyverse’s many packages for data manipulation and graphing. I think that learning how to use R to access these data is less time-consuming and that the resulting research will be easier to reproduce and replicate.
Before using the fredr package, we first need to open an account with FRED and request an API key. Opening an account is straightforward. Once an account has been established, we can go to the website’s “My Account” tab, click on the “API Keys” option, and complete the form. It is a simple process.
Working with the fredr Package:
If you have not downloaded and installed R and RStudio on your computer, follow these instructions. If you have done so already, open RStudio, create a new project, and install the tidyverse, fredr, and scales packages if they are not installed yet. The last of these packages works with ggplot2 (which is part of the tidyverse) to convert numbers shown in scientific notation into other values or formats.
install.packages("fredr")
install.packages("tidyverse")
install.packages("scales")
Once the installations are complete, you will then load the libraries.
library(fredr)
library(tidyverse)
library(scales)
To access the FRED’s databases, we first have to load the API key. This alphanumeric key should not be shared with other people.
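The key is registered with the package's fredr_set_key function. The following is a minimal sketch rather than the only way to do it. It assumes the key has already been saved in an environment variable named FRED_API_KEY (for example, in a .Renviron file), which keeps the key out of any script that might be shared.

```r
# Look up the key from the environment instead of typing it into the script.
# Assumption: the key was stored beforehand as FRED_API_KEY (e.g., in ~/.Renviron).
api_key <- Sys.getenv("FRED_API_KEY")
key_found <- nzchar(api_key)

if (key_found && requireNamespace("fredr", quietly = TRUE)) {
  # Register the key for the current R session.
  fredr::fredr_set_key(api_key)
} else {
  message("No key registered: set FRED_API_KEY and install fredr first.")
}
```

If you prefer, you can pass the key directly as a quoted string to fredr_set_key, as long as the script is never shared.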
One of the challenges of working with the FRED’s database is finding the correct time-series data. Luckily, Boysel’s package includes a handy search function that produces a dataframe with the ID as well as other information we will need to analyze and graph the data.
For example, let’s assume I am writing a paper on Puerto Rico and I want to know how many people are working in the island’s manufacturing sector. So the first step is to use the following search function.
pr1 <- fredr_series_search_text("Puerto Rico and manufacturing")
Note that we could also have passed a single search term to this function.
To see the results, we use the view function, view(pr1), which opens a new window in RStudio with all the results. At the top of this window, you can find a small magnifying glass icon, which allows us to search for keywords in this spreadsheet. For example, we could type the word “employ” to show only the time-series datasets that are related to employment.
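The same keyword filter can also be written in code, since the search results are an ordinary dataframe. The sketch below uses a small made-up dataframe, toy_results, in place of real search results. Its IDs and titles are invented for illustration, and it assumes the results have id and title columns.

```r
# Toy stand-in for a search result; the ids and titles are made up for illustration.
toy_results <- data.frame(
  id = c("TOY001", "TOY002", "TOY003"),
  title = c("All Employees: Manufacturing in Toyland",
            "Manufacturing Earnings in Toyland",
            "Unemployment Rate in Toyland"),
  stringsAsFactors = FALSE
)

# Keep the rows whose title contains "employ", ignoring case.
employ_hits <- toy_results[grepl("employ", toy_results$title, ignore.case = TRUE), ]
print(employ_hits$id)
```

With real search results, toy_results would be replaced by the dataframe returned by the search function (pr1 in this post).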
The Series ID for what I am looking for is “SMS72000003000000001”. As noted above, the columns include some important information that you will need for your analysis. For example, the unit value is “thousands of persons”. The data is collected on a monthly basis, starting in 1990 and ending in 2022.
Once you find the correct dataset, copy the ID and paste it into the following function.
mfg <- fredr_series_observations(series_id = "SMS72000003000000001")
head(mfg)
The head function shows us the first six rows of our dataframe.
  date       series_id            value realtime_start realtime_end
  <date>     <chr>                <dbl> <date>         <date>
1 1990-01-01 SMS72000003000000001 159.5 2022-07-23     2022-07-23
2 1990-02-01 SMS72000003000000001 161.5 2022-07-23     2022-07-23
3 1990-03-01 SMS72000003000000001 160.9 2022-07-23     2022-07-23
4 1990-04-01 SMS72000003000000001 161.1 2022-07-23     2022-07-23
5 1990-05-01 SMS72000003000000001 159.6 2022-07-23     2022-07-23
6 1990-06-01 SMS72000003000000001 158.5 2022-07-23     2022-07-23
At this point, we need to analyze the structure of the data. The first column is the date and “<date>” signifies that the data has been formatted as a date. The third column is the value, which represents the number of people employed in the manufacturing sector.
We can also observe that the data is structured using the following tidy data principles (Hadley Wickham, 2014):
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Given the structure of our dataframe, we do not need to restructure the data. Let’s use ggplot2 to generate a quick line graph.
ggplot(data = mfg, aes(x = date, y = value)) +
  geom_line(color = "red") +
  theme_bw()
This line graph is a good start. But with a few lines of code, we can add a few new layers to improve its functionality and look. To understand ggplot2’s “grammar of graphics”, read this tutorial. Otherwise, here is the new code and the plot.
ggplot(mfg, aes(x = date, y = value)) +
  geom_line(color = "red") +
  theme_light() +
  labs(title = "Number of People Working in Puerto Rico's Manufacturing Sector (1990-2022)",
       y = "Thousands of Persons",
       x = "",
       caption = "FRED ID: SMS72000003000000001") +
  theme(text = element_text(family = "serif"),
        plot.title = element_text(size = 12, face = "bold"),
        axis.title = element_text(size = 8, face = "italic"),
        plot.caption = element_text(size = 6, face = "italic"),
        plot.title.position = "plot")
This graph shows that the number of people employed in this sector has decreased dramatically in the last 22 years. But it also shows that the sector seems to be hiring more people after 2020.
Plotting Multiple Lines in One Graph
Once we understand how to use this package, it is very easy to access several of FRED’s time-series datasets and plot different lines in a graph.
For this example, I want to compare the US unemployment rate to the United Kingdom’s unemployment rate. As we did above, we first search for all the datasets that include the word “unemployment” in their title.
unemp <- fredr_series_search_text("unemployment")
view(unemp)
In the view function, we can see that the ID for the US unemployment rate is “UNRATE” and using the search function in the window we can see there are two options for the United Kingdom’s rate of unemployment. I will be using “AURUKM” because like “UNRATE” the data is collected on a monthly basis and they are both “seasonally adjusted”.
Let’s get the data for the US first.
us_unemp <- fredr_series_observations(
  series_id = "UNRATE",
  observation_start = as.Date("1990-01-01"),
  observation_end = as.Date("2016-12-01")
)
It is worth noting that UNRATE goes back to 1948 and that AURUKM starts in 1855! I will limit my request to the period from January 1990 to December 2016, because the dataset for the United Kingdom stops there. The function includes several arguments, such as observation_start and observation_end, that allow us to easily subset the data.
Let’s get the data for the United Kingdom.
uk_unemp <- fredr_series_observations(
  series_id = "AURUKM",
  observation_start = as.Date("1990-01-01"),
  observation_end = as.Date("2016-12-01")
)
In RStudio, to check that the function is downloading the correct data, you can use the view function or the head function, as we did above.
Before plotting these lines, we need to merge both dataframes using the rbind function after we add a new column to both dataframes. Note that these columns will have the same heading: “country”.
us_unemp$country <- "United States"
uk_unemp$country <- "United Kingdom"
df_all <- rbind(us_unemp, uk_unemp)
It is worth noting that we can use the rbind function because both dataframes have the same number of columns and the columns have the same names. If not, combining both dataframes would have required more wrangling.
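To make this requirement concrete, here is a small sketch with two made-up dataframes. The names a and b and all of the values are invented for illustration. It checks that the column names match before stacking the rows.

```r
# Toy dataframes; the values are invented and only illustrate the mechanics.
a <- data.frame(date = as.Date(c("2000-01-01", "2000-02-01")), value = c(4.0, 4.1))
b <- data.frame(date = as.Date(c("2000-01-01", "2000-02-01")), value = c(5.8, 5.7))

# Add the same new column to both dataframes.
a$country <- "Country A"
b$country <- "Country B"

# rbind() is safe only when both dataframes share the same column names.
same_columns <- identical(names(a), names(b))
stopifnot(same_columns)

# Stack the rows of b underneath the rows of a.
stacked <- rbind(a, b)
print(table(stacked$country))
```

If the check fails, the columns need to be renamed or dropped until both dataframes line up.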
Let’s use ggplot2 to plot these countries’ unemployment rates.
ggplot(data = df_all, aes(x = date, y = value, color = country)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(linewidth = 0.75) +
  theme_light() +
  labs(title = "Comparing the United Kingdom's and the United States' Unemployment Rate (1990-2016)",
       y = "Percent",
       x = "",
       caption = "FRED IDs: UNRATE & AURUKM") +
  theme(text = element_text(family = "serif"),
        plot.title = element_text(size = 12, face = "bold"),
        axis.title = element_text(size = 8, face = "italic"),
        plot.caption = element_text(size = 6, face = "italic"),
        plot.title.position = "plot")
Running a Simple Linear Regression:
A few weeks ago, many news outlets were reporting US consumers’ growing pessimism regarding the economy. These news stories cited the University of Michigan Consumer Sentiment Index’s weakening scores, which hit the lowest level since 2013 in June 2022. Consumers’ opinions of the economy’s future seem to be shaped by rising inflation.
The FRED is a depository for this index. Using the fredr package, let’s access these sentiment scores from January 1990 to May 2022, which is the last entry the FRED has in its collection. Let’s also get US unemployment data for this time period and see whether there is any relationship between consumer sentiments and the unemployment rate. In other words, we will run a simple regression between these two variables. This analysis recreates Sam Shum’s tutorial on working with FRED data.
The first step will be to download the data into R and then restructure it. The ID for the Consumer Sentiment Index is “UMCSENT”.
y <- fredr_series_observations(
  series_id = "UMCSENT",
  observation_start = as.Date("1990-01-01"),
  observation_end = as.Date("2022-05-01")
)
x <- fredr_series_observations(
  series_id = "UNRATE",
  observation_start = as.Date("1990-01-01"),
  observation_end = as.Date("2022-05-01")
)
df_reg <- cbind(x, y)
We can use the cbind function here because the x and y dataframes have exactly the same number of rows, one per month from January 1990 to May 2022, in the same order. cbind pairs rows by position, not by date, so if these dataframes had different numbers of observations we would not be able to join them this way.
Merging these two dataframes has created one big problem. The new dataframe (i.e., df_reg) has two columns named “date” and another two columns named “value”. We need to give these columns unique names, because many functions, including dplyr’s select, do not work reliably on columns that share a name.
There are many, many ways to change the column names, but I think this is the most straightforward. Note the function colnames requires we first call the dataframe and then we specify the column number. To get the column number, use the view function to study df_reg more closely.
Once I renamed the columns, I used dplyr’s select function to include in df_reg the variables that I need for the regression.
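# Column positions assume cbind(x, y), where x (UNRATE) and y (UMCSENT) have five columns each.
colnames(df_reg)[6] <- "Date"     # second date column (from UMCSENT); column 1 keeps the name "date"
colnames(df_reg)[8] <- "umscent"  # value column of the sentiment index
colnames(df_reg)[3] <- "unrate"   # value column of the unemployment rate
df_reg <- df_reg %>% select(date, umscent, unrate)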
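Because cbind matches rows by position, it only works when both downloads cover exactly the same dates. As a hedged alternative, base R's merge function joins the two dataframes on the date column itself. The sketch below uses two made-up dataframes, x_toy and y_toy, with invented values and one missing month to show the difference.

```r
# Toy stand-ins for two downloads; the values are invented.
x_toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  value = c(4.0, 4.1, 4.0)
)
y_toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01")),  # one month is missing here
  value = c(112.0, 111.3)
)

# Give each value column a unique name before joining.
names(x_toy)[names(x_toy) == "value"] <- "unrate"
names(y_toy)[names(y_toy) == "value"] <- "umscent"

# merge() matches rows on the shared date column; dates without a match are dropped by default.
joined <- merge(x_toy, y_toy, by = "date")
print(joined)
```

Renaming the columns before the join also avoids the duplicated column names described above.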
To run a regression, we use the lm function. “lm” stands for “linear model”.
model <- lm(formula = umscent ~ unrate, data = df_reg)
summary(model)
The summary function allows us to see the regression’s output.
lm(formula = umscent ~ unrate, data = df_reg)

Residuals:
    Min      1Q  Median      3Q     Max
-38.945  -3.304   1.037   5.571  26.332

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 114.1704     1.7266   66.12   <2e-16 ***
unrate       -4.6736     0.2823  -16.55   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.582 on 387 degrees of freedom
Multiple R-squared: 0.4146, Adjusted R-squared: 0.4131
F-statistic: 274.1 on 1 and 387 DF, p-value: < 2.2e-16
To better understand the relationship between these two variables, we can use ggplot2 to plot the results of our simple regression.
ggplot(data = df_reg, aes(x = unrate, y = umscent)) +
  geom_point() +
  theme_light() +
  labs(title = "The US Unemployment Rate Versus the Consumer Sentiment Index",
       y = "University of Michigan Consumer Sentiment Index",
       x = "US Unemployment Rate (%)",
       caption = "FRED IDs: UNRATE & UMCSENT") +
  theme(text = element_text(family = "serif"),
        plot.title = element_text(size = 12, face = "bold"),
        axis.title = element_text(size = 8, face = "italic"),
        plot.caption = element_text(size = 6, face = "italic"),
        plot.title.position = "plot") +
  geom_smooth(method = "lm")
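So this regression and graph show that higher unemployment goes hand in hand with weaker consumer sentiment: each one-percentage-point increase in the unemployment rate is associated with a drop of about 4.7 points in the index. Put differently, as consumers’ confidence erodes, the unemployment rate tends to be higher. This is an association between the two variables; the regression alone does not tell us which one drives the other.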
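To read the coefficients more concretely, the short sketch below plugs two unemployment rates into the fitted line, using the intercept and slope reported in the regression output above. The helper function predict_sentiment is a name made up for this illustration.

```r
# Coefficients copied from the regression output above.
intercept <- 114.1704
slope <- -4.6736

# Fitted sentiment score at a given unemployment rate (in percent).
predict_sentiment <- function(unrate) intercept + slope * unrate

# The gap between the fitted values at 4% and 5% unemployment equals the slope.
fitted_4 <- predict_sentiment(4)
fitted_5 <- predict_sentiment(5)
print(c(fitted_4, fitted_5, fitted_5 - fitted_4))
```

With the fitted model in memory, predict(model, newdata = data.frame(unrate = 5)) should give the same fitted value without copying the coefficients by hand.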
The Federal Reserve Bank of St. Louis’ FRED is probably one of the best repositories of US economic data. Its collection also includes data from other countries. This is a vital resource for students and scholars. While users can access and download these data through the FRED’s data portal and analyze the data using spreadsheet applications or more specialized statistical software packages, in this post, I explain why we should use Boysel’s fredr package for R to download, analyze, and graph these data.
Using this package is not too difficult. The more challenging part of this workflow is the restructuring of the data into a tidy format. But once we understand how to do this, we can repurpose our code to execute other analyses using the FRED’s data, saving us some time, while also helping us produce research that is reproducible and replicable.
fredr is not the only R package that can access FRED data. The tidyquant package, developed by Matt Dancho, will also allow you to use your API Key to download FRED data. I will make a tutorial using this package in the future.
Want to learn more about the fredr package?
See the video produced by Linnar Felkl in early 2020. He also put together a blog post that I found useful.
Tyler Ransom, an economics professor at the University of Oklahoma, has shared via his GitHub page some R code that allows users to download several datasets for all the 50 US states at the same time. It really is a very nice shortcut.
About the author:
Carlos L. Yordán is an Associate Professor of International Relations at Drew University. He is also the director of the Semester on the United Nations.