
Analyzing Recent Unemployment Data

Using Some Charts And Statistics To Better Understand What's Going On In The Job Market Right Now

Photo by The New York Public Library on Unsplash

If you’ve been following economics news at all lately, then you’ve probably seen this chart on initial unemployment claims:

Initial unemployment claims (Source: Federal Reserve Bank of St. Louis)

That’s a massive increase that makes the 2008 Great Financial Crisis (when the financial system nearly collapsed) look like a molehill. I can imagine three different reactions to seeing this chart:

  1. Oh snap, record high by a mile! This must be the Great Depression 2.0.
  2. This is all temporary; once the pandemic curve flattens, everything will be just fine.
  3. What are initial unemployment claims again?

Let’s answer the last question first. Initial unemployment claims (sometimes called new jobless claims) are the total number of people who filed for unemployment benefits for the first time in a given week. As you can imagine, they are highly correlated with layoffs: when there are lots of layoffs and/or firings, many of those who recently lost their jobs file an initial claim for unemployment benefits.

The right answer probably lies somewhere in between reactions 1 and 2. Yes, the initial unemployment claims are shockingly high. But, keep in mind the following points:

  • Initial claims are notoriously volatile (it usually helps to look at the rolling 4-week average instead of just the most recent number). Also, it’s helpful to look at continuing claims (people who remain on unemployment benefits).
  • Initial claims are not a net number: they count the new people who file for unemployment, but they do not net out the critical other side of the equation, the number of people who found jobs and rolled off unemployment benefits. For example, if 800,000 people filed initial claims last week and 200,000 filed this week, that does not mean there are 1,000,000 new unemployed people in the U.S. This week’s 200,000 filings tell us nothing about what happened to the 800,000 people who filed last week (some might have found jobs, left the labor force, etc.).
  • The current crisis is uniquely sharp and acute: economies in many major cities went from functioning normally to shut down in a matter of days. There was none of the "let’s wait and see if we’re in a recession or not before we start to lay off people" attitude that companies typically have. Rather, from the day the shutdown started, it was clear that the economy would take a significant hit. Thus, this downturn is also unique from a data perspective. Usually there is a shock to the system (mortgage defaults, rocketing oil prices, etc.), and then everyone waits to see how that shock gradually reverberates through the job market. This time everything is in reverse: the pandemic forced almost everyone to stop working, and now we wait to see how all the reduced hours, furloughs, and layoffs will go on to damage global economic growth.
  • The CARES Act (the U.S. government’s recently passed stimulus package) opened the door for gig economy workers, freelancers, and the otherwise self-employed to access unemployment benefits. Previously, they were not eligible. So the base of people eligible for benefits has been temporarily increased by a large amount.
  • Furloughed workers who experience a drastic decrease in pay are also eligible for unemployment benefits under the CARES Act. This also increases the base of people eligible compared to past recessions. Also, these people are still technically employed and can resume working (and getting paid) pretty seamlessly once lockdown concludes.
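
To make the first point concrete, here is a minimal sketch of the 4-week rolling average mentioned above. The weekly numbers are made up for illustration (they are not the published claims figures), and `rolling_mean` is just a helper name I chose; in practice you would probably reach for something like pandas' `Series.rolling(4).mean()` instead.

```python
def rolling_mean(values, window=4):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical weekly initial claims, in thousands (illustrative only).
weekly_claims = [210, 215, 205, 220, 280, 3300, 6600]
smoothed = rolling_mean(weekly_claims)

for raw, avg in zip(weekly_claims, smoothed):
    print(raw, avg)
```

The smoothed series reacts to a spike more slowly than the raw weekly number, which is exactly why it is less noisy but also less timely.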

So while the initial claims data portends bad news for the unemployment rate and the economy, it’s probably not quite as bad as it looks. Let’s use some statistics to try to figure out how the recent data impacts the unemployment rate (which is an incredibly important driver of economic outcomes).

And if you’re wondering, "If they already publish the unemployment rate, why do we need to model it?", good question. It’s because the unemployment rate data is released monthly and is generally slower moving than the unemployment claims data. So we want to see whether the claims data (published every Thursday) can give us an advance reading of where the unemployment rate is likely to be when it’s finally released.

Unemployment Claims Vs. The Unemployment Rate

Let’s start by eyeballing the relationship between unemployment claims and the unemployment rate. There are two types of claims data: initial and continuing. The former, as explained above, is restricted to new filings, in other words, people who were laid off very recently. The latter counts the number of people who remain on unemployment benefits past the first week. Of the two, initial claims are more timely, while continuing claims are less volatile and more closely tied to the unemployment rate. You can think of it this way: initial claims happen first, and then some portion of them flows into continuing claims, which form the base (but not all) of the unemployed population.
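
One way to picture that flow is a toy stock-flow sketch. To be clear, this is my own simplification, not how the Department of Labor computes anything: the `exit_rate` and `take_up` parameters and all of the numbers are hypothetical, and the sketch ignores the reporting lag between the two series.

```python
def simulate_continuing(initial_claims, start_continuing, exit_rate, take_up):
    """Toy model: continuing claims as a stock fed by initial claims.

    exit_rate: assumed share of existing claimants who roll off each week.
    take_up:   assumed share of new filers who stay on benefits.
    """
    continuing = start_continuing
    path = []
    for new_filings in initial_claims:
        # Some existing claimants leave (new job, benefits exhausted, etc.)...
        continuing = continuing * (1 - exit_rate)
        # ...and a share of this week's new filers joins the stock.
        continuing += new_filings * take_up
        path.append(continuing)
    return path


# Hypothetical weekly initial claims, in thousands (illustrative only).
initial = [200, 200, 3000, 6000, 5000]
path = simulate_continuing(initial, start_continuing=1700,
                           exit_rate=0.1, take_up=0.8)
print([round(level) for level in path])
```

Because people flow out as well as in, the stock of continuing claims ends up well below the starting level plus the sum of all initial claims, which is the "not a net number" caveat in action.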

Why Do We Care So Much About The Unemployment Rate?

Consumer spending is approximately 70% of GDP. It’s the engine that drives every other part of the U.S. economy. But we can only spend if we are employed. Without jobs, spending goes down, causing declines in corporate profits (less demand for goods and services), taxes (putting pressure on government budgets), and asset prices (fewer buyers of stocks and houses). It also creates a vicious cycle where increases in the number of people who are unemployed weaken the economy, which in turn causes even more unemployment.

I’ve plotted the unemployment rate below. As expected, it increases during recessions (gray shading). Prior to the pandemic, the unemployment rate was near all-time lows before increasing sharply in March.

The unemployment rate (Source: Federal Reserve Bank of St. Louis)

Some Plots And Figures

Let’s take a look at how the two types of claims relate to the unemployment rate. First, let’s draw a scatter plot of initial claims vs. the unemployment rate. I exclude the recent spike in initial claims so that we can see what the relationship usually looks like:

Initial claims vs. the unemployment rate (without the spike)

There is an approximately linear relationship. Now let’s add the recent spike. Lending credibility to the argument that what we are seeing is once in a lifetime, adding in the last 2 weeks’ data points (the orange dots) breaks our scatter plot.

Initial claims vs. the unemployment rate (with spike)

Even if we tried to use the blue dots (and the best fit line between initial claims and the unemployment rate that they imply) to infer a predicted unemployment rate, the result wouldn’t make sense. It would be greater than 100%, which is not possible.

So how do we reconcile the recent data with the historical (pre-spike) statistical relationship? My guess is that the "real-time true" unemployment rate is currently very high, perhaps even higher than during the depths of the 2008 recession. But it’s not as astronomically high as the initial claims data would have us believe, because of the aforementioned impact of the CARES Act (which opened unemployment benefits to a much broader set of people, including folks who are merely furloughed), the inherent volatility of the initial claims data, and the potential for employment to snap back (if the lockdown ended just like that next week, then a significant portion of this week’s initial claimants of unemployment benefits would return to work).

It’s important to keep in mind that not all unemployment is created equal. Unemployment resulting from a temporary but acute reduction in demand (like right now) is less bad than unemployment resulting from the permanent bankruptcy of companies or entire industries (those jobs are more or less gone forever). When we try to take our measure of unemployment, we want to capture the latter (the long term type) more than the former. And I would guess that as of right now, a big portion of the initial claims data is the highly temporary type. Of course, if the lockdown persists and more businesses fail because of it, then what currently looks like temporary unemployment will increasingly become long term or even permanent.

Going back to the data, I would argue that continuing unemployment claims present a truer picture of actual unemployment than initial claims. It’s a count of people on unemployment benefits who applied at least 2 weeks prior. That means the data is a bit lagged (by 1 week relative to initial claims), which is normally not so good, as we generally prefer timely data. But it’s great for our purposes here, as we are skeptical about both the initial claims numbers’ volatility and how much they will actually flow into the unemployment rate. Let’s plot the scatter. As before, the pre-spike data is in blue and the most recent 2 weeks’ data is in orange.

Continuing claims vs. the unemployment rate (with spike)

The orange dots are still outliers, but significantly less so than with initial claims. That makes sense as continuing claims is kind of like a lagged and somewhat smoothed version of initial claims. We can work with this.

Interestingly, this plot hints that there may be more than one trend line in the data (each with its own slope). In other words, some other variable out there (probably inflation or interest rates) defines the regime that we’re in, and the regime defines the beta (slope) between continuing claims and the unemployment rate. In my next post, where I will build a more in-depth model of the unemployment rate, I promise to explore this further. But today, we just want to make a guess at the real-time true unemployment rate using continuing claims.


Forecasting Forward

We can use a simple linear trend extrapolation (single-variable linear regression) to see what the current unemployment rate should be. We don’t actually want to compare the current month’s unemployment rate with the current month’s claims data. Rather, we want to compare next month’s unemployment rate (the series shifted forward by 1 month) with the current claims data. Remember that continuing claims (and initial claims as well) imply a future unemployment rate. For example, the people underlying April 2020’s claims data (already published) are a big driver of what the unemployment rate will be when it’s finally published in a few weeks at the beginning of May. Lending credence to this hypothesis is the fact that continuing unemployment claims are slightly more correlated with next month’s unemployment rate (0.82) than with the concurrent month’s (0.80).
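
The concurrent vs. next-month comparison can be sketched as follows. The two monthly series are made up purely to show the mechanics, so the numbers this prints will not match the 0.82 and 0.80 quoted above; `pearson` is a small helper written out by hand so the snippet has no dependencies (with pandas you would typically combine `shift` and `corr`).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical monthly data (illustrative only).
claims = [1.7, 1.8, 2.0, 2.6, 3.4, 3.1, 2.7, 2.3]  # continuing claims, millions
unrate = [3.6, 3.7, 3.8, 4.2, 5.1, 5.9, 5.5, 5.0]  # unemployment rate, %

# Same month: pair claims[t] with unrate[t].
same_month = pearson(claims, unrate)

# Next month: pair claims[t] with unrate[t + 1] by dropping the last
# claims value and the first unemployment value.
next_month = pearson(claims[:-1], unrate[1:])

print(f"concurrent: {same_month:.2f}, next month: {next_month:.2f}")
```

Note that the lagged comparison loses one observation, since the most recent claims value has no "next month" unemployment rate to pair with yet. That unpaired value is the one we want to forecast from.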

So we can set up the following linear regression to forecast the unemployment rate using continuing claims:

Y = B0 + B1 * (This Month’s Continuing Claims)

Where Y = Next Month’s Unemployment Rate
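
Here is a minimal sketch of fitting B0 and B1 by ordinary least squares and producing a one-month-ahead forecast. The data is hypothetical again, so the forecast it prints is not the article’s estimate, and `fit_line` is a hand-rolled helper rather than the exact code behind the charts.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical monthly data (illustrative only).
claims = [1.7, 1.8, 2.0, 2.6, 3.4, 3.1, 2.7, 2.3]  # continuing claims, millions
unrate = [3.6, 3.7, 3.8, 4.2, 5.1, 5.9, 5.5, 5.0]  # unemployment rate, %

# Pair this month's claims with NEXT month's unemployment rate.
x = claims[:-1]
y = unrate[1:]
b0, b1 = fit_line(x, y)

# The latest claims value has no matching rate yet; forecast it.
forecast = b0 + b1 * claims[-1]
print(f"B0={b0:.2f}, B1={b1:.2f}, forecast={forecast:.1f}%")
```

One caveat worth keeping in mind: a straight line fitted on calm periods is only trustworthy near the range of claims it was fitted on, which is the same reason the initial claims fit earlier produced an impossible rate.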

Let’s draw the scatter plot of future (next month’s) unemployment against the current month’s continuing claims (in blue). To the plot, we also add our prediction (in red), calculated using the linear regression equation above. Our simple model predicts that the May 2020 print of the unemployment rate will be 13.6%.

Unemployment Rate Prediction

13.6% is a pretty high number. If the model is right, it’s certainly the highest rate of unemployment that we’ve seen in the past 50 years. But it’s still significantly less than the 25% unemployment that the U.S. experienced during the depths of the Great Depression. So as bad as things are, comparisons to the Great Depression look hyperbolic for now.

Unemployment rate and prediction

Conclusion

I actually hope to be proven wrong. 13.6% unemployment is no joke and if the lockdown keeps going longer than expected, more indebted businesses will fail, and more people will lose their jobs. That’s why the government and Federal Reserve are trying so hard to preserve the current economic status quo. It’s significantly less costly to save a business (by giving it money now) than to let it go under and hope it gets replaced by a new one. This is most likely true even after accounting for money wasted on mismanaged businesses that probably should be allowed to default. Of course, this type of spending can’t go on forever, so it’s imperative that we flatten the curve before policy makers run out of money.

Next time, we will build a fancier regression model and dive deeper into the code. Until then, cheers and stay safe everyone!

