HELP! We’ve Been HECS’d

A statistical simulation of the Universities Accord recommendations

Murray Gillin
Towards Data Science

--

Introduction

In Australia, students can afford tertiary education through a government loan scheme known as the Higher Education Loan Program (HELP). To ensure the value of the loans is not eroded by inflation, they are indexed annually based on the Consumer Price Index. Graduates begin to repay the loan when their post-tax income exceeds ~$51k, under a stepped repayment rate that reaches a maximum of 10% when earning above ~$151k. The most recent indexation rate was 7.1%, the highest since the 1990s, making many people more aware of how these student debts creep upward.

Photo by Edwin Andrade on Unsplash

I’m one of these people, having completed a Bachelor with Honours after leaving school and later retrained with a Master's program during COVID. For most of my 20s I wasn’t earning enough to reach the compulsory repayment threshold, so over time the debt crept up. The addition of a Master's degree (~$40k) then pushed the debt to a new level. Whilst I’m now taking steps to clear the debt and recover the non-productive deductions being made, you can imagine the burden this places on many graduates.

The Universities Accord Review has made the following proposed changes to how the debts are repaid (Recommendation 16).

That to reduce the long-term financial costs of studying for students, the Australian Government make student contributions fairer and better reflective of lifetime benefits that students will gain from studying and reduce the burden of HELP loans by introducing fairer and simpler indexation and repayment arrangements.

This should involve:

A. reducing student contributions to address the most significant impacts of the Job-ready Graduates (JRG) package starting with students in humanities, other society and culture, communications and human movement, and moving toward a student contribution system based on projected potential lifetime earnings

B. reducing the financial burden of repayment on low-income earners and limiting disincentives to work additional hours by moving to a system of HELP repayment based on marginal rates

C. reducing repayment times by changing the timing of indexation for HELP loans so that amounts withheld for compulsory repayment can be accounted for before indexation is applied

D. ensuring that growth in HELP loans does not outpace growth in wages by setting the HELP indexation rate to the lower of the Consumer Price Index (CPI) and the Wage Price Index (WPI)

E. reviewing bank lending practices to ensure banks recognise that HELP loans are not like other types of loans and are not treated in a way that unduly limits peoples’ borrowing capacity for home loans.

Let’s unpack these recommendations, specifically those relating to the indexation and repayment of the student loan, and frame these as questions of statistics.

Point B indicates that the current repayment rates create possible disincentives for low-income earners. This, whilst a compassionate view, neglects the long-term impact of a protracted debt position. The debt compounds each year, independently of an individual's capacity to pay. We’ll examine the impact of repayment rates on clearance year and review how other countries index or apply interest to student loans.

Point C is an interesting case, and at face value seems fair: as with any other debt, deductions made from post-tax salaries should be credited against the principal before indexation is applied. We can run simulations to this effect and calculate the change in the distribution of debt clearance years across students.
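To make the timing question concrete, here is a minimal one-year sketch of the two orderings. The balance and repayment figures are hypothetical and chosen only for illustration; the 7.1% rate is the indexation figure mentioned in the introduction.

```r
# Hypothetical figures for illustration only (not taken from the simulation)
debt      <- 30000   # opening HELP balance
repayment <- 3000    # compulsory repayment withheld during the year
rate      <- 0.071   # indexation rate

# Current approach: index the opening balance, then credit the repayment
index_then_pay <- debt * (1 + rate) - repayment

# Proposed approach: credit the repayment first, then index what is left
pay_then_index <- (debt - repayment) * (1 + rate)

# The gap is the indexation that would have been charged on the repayment itself
saving <- index_then_pay - pay_then_index
c(index_then_pay = index_then_pay, pay_then_index = pay_then_index, saving = saving)
```

In a single year the saving is simply the repayment multiplied by the indexation rate, so its size depends on how large the repayments are to begin with.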

Current and Proposed Effect of Indexation and Repayments

Point D is a curious attempt to limit debt growth to the lower of CPI or WPI. Here we can run a simple statistical exercise to measure how often, historically, the CPI has been lower than the WPI.
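As a sketch of what the "lower of CPI and WPI" rule does mechanically, the snippet below applies it to a handful of annual rates. The values are made up purely for illustration and are not ABS figures; the column names are likewise assumptions.

```r
# Illustrative annual rates in percent; made-up values, NOT ABS figures
rates <- data.frame(
  year = 2019:2023,
  cpi  = c(1.8, 0.9, 3.5, 7.8, 4.1),
  wpi  = c(2.2, 1.4, 2.3, 3.3, 4.2)
)

# Proposed indexation rate: the lower of the two indices in each year
rates$indexation <- pmin(rates$cpi, rates$wpi)

# Share of years in which CPI is already the lower index (the cap changes nothing)
share_cpi_lower <- mean(rates$cpi < rates$wpi)

# Average reduction relative to CPI-only indexation, in percentage points
mean_reduction <- mean(rates$cpi - rates$indexation)

rates
```

The rule only helps in years where WPI falls below CPI, which is why the frequency of that event matters more than the size of any single gap.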

We will address these questions in two parts:

  • Will students benefit from moving between CPI and WPI to minimise the indexation rate?
  • How do student loan trajectories differ under three scenarios: the current state, the proposed change to indexation timing noted above, and a flat 10% repayment rate across all income levels above the current minimum threshold? Simulating these scenarios allows us to assess whether the distribution of clearance years differs between them.

Will students benefit from applying the smallest of CPI and WPI?

This is one of the key recommendations designed to reduce indexation's impact on students. It is logically flawed as salary growth and level are independent of the debt level. However, let us take it at face value and evaluate if there is any reasonable difference where doing so will slow the growth of debt vs CPI alone. Figures have been sourced and reviewed for WPI and CPI, both made available under CC BY 4.0 thanks to the Australian Bureau of Statistics.

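library(tidyverse)   # readr, dplyr, tidyr, purrr, forcats, ggplot2, lubridate
library(brms)        # Bayesian regression models
library(tidybayes)   # add_epred_draws(), compare_levels()
library(ggdist)      # stat_halfeye(), theme_ggdist()

wpi_df <- readr::read_csv("All sector WPI, quarterly and annual movement (%), seasonally adjusted (a).csv", 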
col_types = cols(`Quarterly (%)` = col_skip()),
skip = 1) %>%
rename(month_year = ...1,
value = `Annual (%)`) %>%
drop_na() %>%
mutate(month_year = lubridate::my(month_year),
rate = 'wpi')

cpi_df <- read_csv("All groups CPI, Australia, quarterly and annual movement (%).csv",
col_types = cols(`Change from previous quarter (%)` = col_skip()),
skip = 1) %>%
rename(month_year = ...1,
value = `Annual change (%)`) %>%
drop_na() %>%
mutate(month_year = lubridate::my(month_year),
rate = 'cpi')

rates_df <- bind_rows(cpi_df, wpi_df) %>%
filter(month_year >= '2014-03-01' & month_year <= '2023-12-01') %>%
mutate(rate = as_factor(rate))

rates_df %>%
ggplot(aes(month_year, value, color = rate)) +
geom_point() +
geom_line() +
theme_ggdist() +
scale_color_viridis_d(begin = 0.3, end = 0.7) +
labs(x = 'Date', y = 'Index Value', color = 'Rate', title = 'Comparison of Annual CPI and WPI')
Comparison of Annual CPI and WPI by Quarter (Image by Author)

We can see that the variance of CPI is greater than that of WPI, but over time, is there a real difference between the means of the two underlying distributions?

We fit a Bayesian model to 10 years of historical data, take draws from the expected values of the posterior distributions then perform a difference in means assessment, using Bayesian methods.

indices <- 
brm(
bf(value ~ rate + 0,
sigma ~ rate + 0),
data = rates_df,
prior = c(prior(normal(2, 2), class = 'b')),
family = gaussian,
iter = 2000, chains = 4, seed = 246, cores = 4, sample_prior = 'yes'
)

new_df <- tibble(rate = c('wpi', 'cpi'))

new_df %>%
add_epred_draws(indices) %>%
compare_levels(.epred, rate, comparison = list(c('cpi', 'wpi'))) %>%
ggplot(aes(.epred, fill = after_stat(x > 0))) +
stat_halfeye() +
geom_vline(xintercept = 0, lty = 2) +
theme_ggdist() +
scale_fill_manual(values = c("gray80", "skyblue")) +
labs(y = 'Density', x = 'Difference in Posterior Means', title = 'Difference in Posterior Means of WPI and CPI',
subtitle = "80% of Density is Greater than 0\nApplying ROPE of 10%, Difference is Negligible", fill = 'Value Greater Than 0')
Difference in Posterior Mean Distribution of CPI and WPI (Image by Author)

Based on the posterior distributions for WPI and CPI, the mean CPI exceeds the mean WPI by an average of 0.3bps. 80% of the posterior distribution of differences is greater than 0, and within a ROPE of 10% we can consider this difference negligible. Shifting between WPI and CPI will at best slightly reduce the impact of inflation, but in the long term it won’t materially help students overcome these debts.

Simulating Indexation and Repayment Scenarios of Students

Let us continue to assess the other elements of the recommendation, namely indexation timing and compulsory repayment rates.

Outline of Key Simulated Assumptions

To simulate student outcomes we need reasonable prior knowledge of the key variables.

To simulate graduate salaries we’ve taken a fairly broad view of using the median graduate salary of $68,000 and applying a lognormal distribution featured below. This way we capture most of the graduate salaries around this point, but allow for much higher starting salaries that can be available in some sectors. Similarly, for debt, we’ve taken the current average student debt and applied a reasonably right-skewed distribution to capture the broad range of debts.

For the indexation rate, we’ve simply taken the mean of the quarterly annual CPI figures across 10 years and assumed it follows a normal distribution. Similarly, we’ve taken a conservative but positive view on salary growth by assuming an annual increase of around 3% that follows a log-normal distribution, which allows some graduates to climb faster because of promotions or new job opportunities.
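The distribution parameters used in the simulation code further down can be sanity-checked against these stated assumptions with the log-normal quantile function. This is only a quick check, using the parameter values as they appear in that code; the variable names here are my own.

```r
# Starting salary: median should sit at the $68,000 graduate median
salary_median <- qlnorm(0.5, meanlog = log(68000), sdlog = log(1.34))
salary_range  <- qlnorm(c(0.025, 0.975), meanlog = log(68000), sdlog = log(1.34))

# Starting debt: median of the right-skewed debt distribution
debt_median <- qlnorm(0.5, meanlog = 10.2, sdlog = 0.5)

# Salary growth above 1: median and mean of the log-normal increment
growth_median <- qlnorm(0.5, meanlog = -3.5, sdlog = 0.6)
growth_mean   <- exp(-3.5 + 0.6^2 / 2)

list(salary_median = salary_median, salary_range = salary_range,
     debt_median = debt_median, growth_median = growth_median,
     growth_mean = growth_mean)
```

The median growth increment is exp(-3.5), roughly 3%, while the mean is slightly higher because of the right skew.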

Simulated Variable Distributions (Image by Author)
Assumptions of Simulated Variables (Image by Author)
index <- 1:50000 #simulate 50000 students

year <- 0:19 #over 20 years

calculate_salary <- function(previous, basevalue, multiplier) {
coalesce(basevalue, multiplier * previous)
}

set.seed(246)

base_df <-
crossing(index, year) %>%
group_by(index) %>% #salary and growth varies by individual
mutate(salary_0 = rlnorm(1, meanlog = log(68000), sdlog = log(1.34)),
debt_0 = rlnorm(1, meanlog = 10.2, sdlog = 0.5)) %>%
group_by(year) %>% #indexation rate applies uniformly across all indices each year
mutate(indexation_rate = rnorm(1, mean = 0.027, sd = 0.012),
indexation_rate = round(indexation_rate, 3)) %>%
group_by(index, year) %>%
mutate(salary_growth = rlnorm(1, meanlog = -3.5, sdlog = 0.6) + 1,
salary_growth = round(salary_growth, 3)) %>%
group_by(index) %>%
mutate(salary_0 = if_else(year > 0, NA_real_, salary_0),
salary_1 = accumulate2(salary_0, salary_growth[-1], calculate_salary),
salary_1 = case_when(salary_1 <= 18200 ~ salary_1, #calculate post-tax income; brackets are contiguous so no salary falls in a gap
salary_1 <= 45000 ~ salary_1 - (salary_1-18200)*(0.19),
salary_1 <= 120000 ~ salary_1 - 5092 - (salary_1-45000)*(0.325),
salary_1 <= 180000 ~ salary_1 - 29467 - (salary_1-120000)*(0.37),
TRUE ~ salary_1 - 51667 - (salary_1-180000)*(0.45)
))

The above code snippet sets up the simulation of 50,000 students over 20 years. We use our custom function calculate_salary and purrr::accumulate2 to iteratively calculate the salary increase given the sampled growth distribution, and then the post-tax income. Note also that each candidate draws from their own growth distribution, whereas indexation applies to all students at the same rate each year.
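The accumulation step is easier to see on a single student. The sketch below uses base R's Reduce() as a stand-in for the purrr call, with a hypothetical starting salary and hypothetical growth multipliers.

```r
start_salary <- 68000                 # hypothetical starting salary
growth <- c(1.03, 1.05, 1.02, 1.10)   # hypothetical yearly growth multipliers

# Each step multiplies the previous year's salary by that year's multiplier,
# keeping every intermediate value (year 0 through year 4)
salary_path <- Reduce(function(previous, multiplier) previous * multiplier,
                      growth, start_salary, accumulate = TRUE)

salary_path
```

The result is the same as start_salary * cumprod(c(1, growth)); the accumulate form is used in the simulation because it generalises to steps that are not a simple product.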

calculate_remaining_debt <- function(principal, payment, interest_rate) {
interest = principal * interest_rate
remaining_debt = principal + interest - payment
remaining_debt = ifelse(remaining_debt < 0, 0, remaining_debt)
remaining_debt
}

set.seed(246)

df <- base_df %>% mutate(
repayment_rate = case_when(
salary_1 < 51550 ~ 0.0, # Repayment Rates Post-Tax Income; each bound is the start of the next band, so fractional incomes cannot fall between bands
salary_1 < 59519 ~ 0.01,
salary_1 < 63090 ~ 0.02,
salary_1 < 66876 ~ 0.025,
salary_1 < 70889 ~ 0.03,
salary_1 < 75141 ~ 0.035,
salary_1 < 79650 ~ 0.04,
salary_1 < 84430 ~ 0.045,
salary_1 < 89495 ~ 0.05,
salary_1 < 94866 ~ 0.055,
salary_1 < 100558 ~ 0.06,
salary_1 < 106591 ~ 0.065,
salary_1 < 112986 ~ 0.07,
salary_1 < 119765 ~ 0.075,
salary_1 < 126951 ~ 0.08,
salary_1 < 134569 ~ 0.085,
salary_1 < 142643 ~ 0.09,
salary_1 < 151201 ~ 0.095,
TRUE ~ 0.1),
repayment = salary_1 * repayment_rate,
debt_1 = accumulate(2:n(), .init = first(debt_0),
~ calculate_remaining_debt(.x, repayment[.y], indexation_rate[.y])
),
repayment = if_else(debt_1 == 0, 0, repayment),
debt_paid = if_else(debt_1 == 0, 'y', 'n'))

df <-
df %>%
group_by(index) %>%
mutate(clearance_year = if_else(lag(debt_1, default = first(debt_1)) > 0 & debt_1 == 0, 1, 0),
clearance_cum = cumsum(clearance_year)) %>%
filter(clearance_cum == 0 | clearance_year == 1) %>%
select(1:11) %>%
mutate(is_paid = if_else(debt_paid == 'y', 1, 0),
debt = round(debt_1, 2),
salary = round(salary_1, 2),
scenario = 'INDEXATION PRE PAYMENT; PROGRESSIVE PAYMENT RATE',
scenario_l = 'A',
group = if_else(max(is_paid) == 1, 'paid', 'unpaid'))

# Repeat Again with Different Calculation of Debt, Indexation Post Repayment

calculate_remaining_debt <- function(principal, payment, interest_rate) {
remaining_debt = principal - payment + (principal-payment) * interest_rate
remaining_debt = ifelse(remaining_debt < 0, 0, remaining_debt)
remaining_debt
}



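# Scenario B input: rebuild the progressive-rate repayments from base_df so that debt_1
# uses the redefined calculate_remaining_debt. This step is not shown in the listing and is
# reconstructed here to mirror the scenario A code; the bands match the ladder used there.
thresholds <- c(51550, 59519, 63090, 66876, 70889, 75141, 79650, 84430, 89495,
94866, 100558, 106591, 112986, 119765, 126951, 134569, 142643, 151201)
rates <- c(0, 0.01, 0.02, seq(0.025, 0.1, by = 0.005))

df2 <- base_df %>% mutate(
repayment_rate = rates[findInterval(salary_1, thresholds) + 1],
repayment = salary_1 * repayment_rate,
debt_1 = accumulate(2:n(), .init = first(debt_0),
~ calculate_remaining_debt(.x, repayment[.y], indexation_rate[.y])
),
repayment = if_else(debt_1 == 0, 0, repayment),
debt_paid = if_else(debt_1 == 0, 'y', 'n'))

df2 <-
df2 %>%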
group_by(index) %>%
mutate(clearance_year = if_else(lag(debt_1, default = first(debt_1)) > 0 & debt_1 == 0, 1, 0),
clearance_cum = cumsum(clearance_year)) %>%
filter(clearance_cum == 0 | clearance_year == 1) %>%
select(1:11) %>%
mutate(is_paid = if_else(debt_paid == 'y', 1, 0),
debt = round(debt_1, 2),
salary = round(salary_1, 2),
scenario = 'INDEXATION AFTER PAYMENT; PROGRESSIVE PAYMENT RATE',
scenario_l = 'B',
group = if_else(max(is_paid) == 1, 'paid', 'unpaid'))

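# Last Scenario Applies a Flat Rate of 10% Repayment Above Minimum Threshold
# Note: calculate_remaining_debt was redefined above, so run as listed this scenario indexes after payment;
# re-run the first definition beforehand if indexation before payment is intended, as the scenario label states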

df3 <- base_df %>%
mutate(repayment_rate = case_when(
salary_1 < 51550 ~ 0.0,
salary_1 >= 51550 ~ 0.1),
repayment = salary_1 * repayment_rate,
debt_1 = accumulate(2:n(), .init = first(debt_0),
~ calculate_remaining_debt(.x, repayment[.y], indexation_rate[.y])
),
repayment = if_else(debt_1 == 0, 0, repayment),
debt_paid = if_else(debt_1 == 0, 'y', 'n'))

df3 <-
df3 %>%
group_by(index) %>%
mutate(clearance_year = if_else(lag(debt_1, default = first(debt_1)) > 0 & debt_1 == 0, 1, 0),
clearance_cum = cumsum(clearance_year)) %>%
filter(clearance_cum == 0 | clearance_year == 1) %>%
select(1:11) %>%
mutate(is_paid = if_else(debt_paid == 'y', 1, 0),
debt = round(debt_1, 2),
salary = round(salary_1, 2),
scenario = 'INDEXATION PRE PAYMENT; FLAT PAYMENT RATE',
scenario_l = 'C',
group = if_else(max(is_paid) == 1, 'paid', 'unpaid'))
Sample Dataframe (Image by Author)

We then combine the three scenarios into a single view for later analysis. Below we visualise the first 9 students in our dataset.

df4 <- bind_rows(df, df2, df3)

df4 %>%
filter(index <= 9) %>%
ggplot(aes(year, debt, color = scenario)) +
geom_point() +
geom_line() +
facet_wrap(~index) +
theme_ggdist() +
scale_color_brewer(palette = "Dark2") +
theme(legend.position = 'bottom', legend.direction = 'vertical') +
labs(x = 'Year', y = 'Debt', title = 'Scenarios of First Nine Students')

Let’s unpack this. We’ve simulated the three scenarios noted above. The reason we’ve simulated a 10% repayment rate is that this is what New Zealand does for student debt, starting from ~NZD 25,000. Some might say this is quite punitive; however, NZ does not charge interest or apply indexation unless graduates leave the country to work. I thought this would be a reasonable scenario to scope out. The above visualization shows the debt trajectory under each scenario for the first 9 students in our dataset; we’ve simulated 50,000 students in total.

Index 4 is a great example of how salary and debt indexation are conditionally independent. In the current state, no repayments were made against the debt for the first 14 years: it wasn’t until year 15 that the graduate had a post-tax salary above the ~$51,000 threshold for compulsory repayments. The debt grew by nearly $22,000 in that period, and it is not cleared within the 20-year horizon.

The low progressive rates mean that this student may never pay off the debt, as indexation will continue to outpace the current rate of repayment each year. Which is the better outcome: a graduate who pays off their student debt, or one who carries this burden for the rest of their professional life, with increments scraped from their salary at values less than the effect of indexation?
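The condition for a balance to fall in a given year can be written down directly: the repayment has to exceed that year's indexation. The helper below is a hypothetical illustration of that check (the function name and the example figures are assumptions, not part of the simulation).

```r
# Debt falls in a given year only if the repayment exceeds that year's indexation
debt_shrinks <- function(debt, income, repayment_rate, indexation_rate) {
  repayment  <- income * repayment_rate
  indexation <- debt * indexation_rate
  repayment > indexation
}

# Hypothetical graduate: $45,000 debt, $55,000 post-tax income, 2.7% indexation
low_rate  <- debt_shrinks(45000, 55000, repayment_rate = 0.01, indexation_rate = 0.027)
flat_rate <- debt_shrinks(45000, 55000, repayment_rate = 0.10, indexation_rate = 0.027)

c(low_rate = low_rate, flat_rate = flat_rate)
```

At the 1% band the repayment is $550 against $1,215 of indexation, so the balance grows despite the deduction; at a flat 10% the repayment is $5,500 and the balance falls.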

I think this highlights a key point missing from the review: what problem or problems are we solving for? In my opinion, the HELP system must enable graduates to pay off their debt as soon as reasonably possible.

Modelling Years to Debt Clearance

Now that we have a dataset with simulations for all students, we can assess the distribution in years to debt clearance across each scenario.

df4 %>% 
filter(group == 'paid') %>%
group_by(index, scenario) %>%
summarise(debt_clearance = max(year)) %>%
ggplot(aes(debt_clearance, fill = scenario)) +
geom_histogram(binwidth = 1) +
facet_grid(~scenario) +
theme_ggdist() +
scale_fill_brewer(palette = "Dark2", aesthetics = c('color', 'fill')) +
theme(legend.position = 'none') +
labs(x = 'Years to Debt Clearance', y = 'Count', title = 'Distribution of Years to Debt Clearance by Scenario')
Distribution of Years to Debt Clearance by Scenario (Image by Author)

What is becoming quite clear is that changing the indexation point makes little difference, whereas setting a repayment rate well above the indexation rate enables students to pay off their debt much sooner. Let’s complete a Bayesian ANOVA to get a sense of the differences in posterior mean years. Given we are dealing with count data, below we fit three models, varying the likelihood and the number of parameters.
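One quick way to gauge whether a Poisson likelihood is likely to be adequate before fitting anything is to compare the variance of the counts with their mean. The sketch below does this on synthetic counts; the data are generated only to illustrate the check, and the mu and size values are assumptions rather than estimates from the simulation.

```r
set.seed(246)

# Synthetic clearance years, generated purely for illustration
clearance_years <- rnbinom(2000, mu = 8, size = 3)

# Under a Poisson model the variance equals the mean, so a ratio well above 1
# points towards a negative binomial likelihood instead
dispersion_ratio <- var(clearance_years) / mean(clearance_years)
dispersion_ratio
```

The same two-line check can be run per scenario on the simulated clearance years; it does not replace the LOO comparison, but it indicates what to expect from it.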

set.seed(246)

# Take Sample of Total Dataframe
count_df_sample <- df4 %>%
filter(group == 'paid') %>%
group_by(index, scenario, scenario_l) %>%
summarise(debt_clearance = max(year)) %>%
group_by(scenario_l) %>%
slice_sample(n = 4000)

# Poisson Likelihood
m1a <- brm(
debt_clearance ~ scenario_l + 0,
data = count_df_sample,
family = poisson,
prior = c(prior(gamma(9, 1), class = 'b', lb = 0)),
chains = 4, iter = 2000, cores = 4, threads = threading(2)
) %>%
add_criterion(c('loo', 'waic'), moment_match = T)

# Negative Binomial Likelihood w/ Pooling
m1b <- brm(
debt_clearance ~ scenario_l + 0,
data = count_df_sample,
family = negbinomial,
prior = c(prior(gamma(9, 1), class = 'b', lb = 0)),
chains = 4, iter = 2000, cores = 4, threads = threading(2)
) %>%
add_criterion(c('loo', 'waic'), moment_match = T)

# Establish Prior for Non-Pooling Negative Binomial

prior <- get_prior(
bf(debt_clearance ~ scenario_l + 0,
shape ~ scenario_l + 0),
data = count_df_sample,
family = negbinomial,
prior = c(prior(gamma(9, 1), class = 'b', lb = 0))) %>%
as_tibble() %>%
mutate(prior = if_else(class == 'b' & dpar == 'shape' & coef == '', 'gamma(9, 1)', prior),
lb = if_else(class == 'b' & dpar == 'shape' & coef == '', '0', lb),
prior = if_else(class == 'b' & dpar == '' & coef == '', 'gamma(6, 1)', prior),
lb = if_else(class == 'b' & dpar == '' & coef == '', '0', lb)) %>%
as.brmsprior()

# Negative Binomial Likelihood w/o Pooling

m1c <- brm(
bf(debt_clearance ~ scenario_l + 0,
shape ~ scenario_l + 0),
data = count_df_sample,
family = negbinomial,
prior = prior,
chains = 4, iter = 2000, cores = 4, threads = threading(2)
) %>%
add_criterion(c('loo', 'waic'), moment_match = T)

loo_compare(m1a, m1b, m1c) %>% print(simplify = F)
Output from LOO Comparison (Image by Author)

Of the three models we’ve created, the negative binomial model without pooling shows the best out-of-sample predictive power. For the sake of our ANOVA, we’ll use this model. For reference, scenario A is Indexation Pre, Progressive; B is Indexation Post, Progressive; and C is Indexation Pre, Flat Rate.

new_df <- tibble(scenario_l = c('A', 'B', 'C'))

m1c %>%
tidybayes::epred_draws(new_df, ndraws = 4000, seed = 111) %>%
compare_levels(.epred, scenario_l) %>%
ggplot(aes(.epred, scenario_l, fill = scenario_l)) +
stat_halfeye() + # draws are samples, so the sample-based geom is used, as in the CPI/WPI plot
geom_vline(xintercept = 0, linetype = 'dashed') +
theme_ggdist() +
scale_fill_brewer(palette = "Dark2") +
labs(x = 'Difference in Posterior Means', y = 'Scenario', title = 'Differences in Posterior Mean Between Each Scenario', fill = 'Scenario')

The above tells us two things. Firstly, changing the timing of indexation has a negligible impact on the maturity of HELP debts. Secondly, a 10% flat rate of repayment decreases the time to pay off the debt by an average of 4 years.

It also follows that the shorter the life of the debt, the smaller the effect of compounding on its value. Graduates then regain the portion of their salaries otherwise directed to repayments, giving them an effective post-tax pay increase.
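The effect of the repayment rate on clearance time can be reproduced for a single graduate without any randomness. This is a deterministic sketch under assumed values (fixed 2.7% indexation, 3% income growth, indexation applied before repayment); years_to_clear is a hypothetical helper, not part of the simulation code.

```r
# Returns the first year in which the balance reaches zero, or NA if the
# debt is still outstanding at the end of the horizon
years_to_clear <- function(debt, income, repayment_rate,
                           indexation_rate = 0.027, growth = 1.03, horizon = 20) {
  for (year in seq_len(horizon)) {
    repayment <- income * repayment_rate
    # Index the balance, then credit the repayment; floor at zero
    debt <- max(debt * (1 + indexation_rate) - repayment, 0)
    if (debt == 0) return(year)
    income <- income * growth
  }
  NA_integer_
}

# Hypothetical graduate with a $30,000 debt and $60,000 post-tax income
slow <- years_to_clear(30000, 60000, repayment_rate = 0.02)
fast <- years_to_clear(30000, 60000, repayment_rate = 0.10)

c(slow = slow, fast = fast)
```

With these inputs the 2% rate does not clear the debt within 20 years, while the flat 10% rate clears it within the first decade.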

Concluding Remarks

This analysis has quantitatively evaluated the recommendations put forth by the Universities Accord Review in a wholly transparent and repeatable manner.

Our analysis has sought to provide what we must consider a best-case situation of alternative approaches to the status quo. Our simulations are optimistic, but not unrealistic views on debt and salary growth over 20 years.

Firstly, we assessed whether or not swapping between CPI and WPI would be worthwhile in the long term. Based on our modelling, we expect a negligible long-term difference from shifting between the two indices for the calculation of indexation, amounting to a mean difference of 0.3bps in favour of CPI.

Secondly, we simulated debt and salary trajectories for 50,000 graduates over 20 years under the three scenarios above, moving the indexation point and increasing the repayment rate. We then used this data to assess any difference in mean years to debt clearance and noted that changing the indexation point will not have a tangible effect, whereas increasing the repayment rate to a flat 10% represents a viable option to enable graduates to pay off their debt faster. We can express these statements in terms of our DAG: by increasing repayments, we observed a faster decrease in debt levels. Also, the impact of indexation is independent of salary, and as such any attempt to mitigate its impact based on income levels ignores the relationship below.

Causal Diagram for HELP Debt Maintenance (Image by Author)

It becomes a case of more sacrifice up front in exchange for avoiding, over time, the compounding effect that indexation has on principal debt levels. This scenario replicates to an extent what occurs in NZ: a flat, broad rate that applies at an even lower minimum threshold.

It is incumbent on the authors of the Universities Accord Review to discuss the implications of their recommendations. The report makes implied causal statements without validation or appropriate simulations. You cannot in one sentence seek to alleviate the financial pressures of expensive tertiary education and then fail to offer costed solutions for how students can, through compulsory and voluntary payments, clear their debts in a more timely fashion.

The Federal Government, to its credit, has said HELP debts are under review, but given the recommendations in the report and the simulations we’ve conducted, I doubt that any of these will materially improve graduate financial outcomes.

--

Analytics Program Manager at Amazon Australia | Passionate Data Analyst and ML Enthusiast | Join the Adventure https://mmgillin.medium.com/membership