A closer look into the Spanish railway passenger transportation pricing

Salva Rocher
Towards Data Science
Aug 6, 2019


Introduction

As someone who lives and works in a Spanish city 400 km away from home, I have found that the train is the most convenient way to travel back and forth. As a frequent user, I have grown baffled by the pricing pattern when buying tickets: sometimes prices stay at their usual levels, while at other times they fall well outside them.

This doubt led me to formulate the following questions:

“Do train ticket prices really change over the days?”

And if so,

“Is there an optimal moment to buy them?”

Data

In this project, only Renfe’s long-distance routes were considered.

The dataset comes from a Renfe scraping procedure carried out by thegurus.tech, in which the prices of trains departing on the sampled routes were checked several times a day, in a loop.

The trains whose prices were checked span about 3 months, from April 12th to July 7th, 2019.

Data source: https://thegurus.tech/posts/2019/05/renfe-idea/

Data wrangling

After spending some time getting to know the data, the natural next step is to clean the raw data and transform it into a form that is ready for analysis and can therefore shed light on the main questions. The code for this step can be found in my GitHub repo for this project.

In more detail, the cleaning and transformation tasks include: creating new columns such as route, departure date, departure time, days to departure and an identifier for a particular train departing on a given day at a given time; changing column formats so that calculations and further transformations can be done on them; reducing the number of categories of some categorical variables, such as train type or ticket class; dropping unneeded columns; and deciding what to do with invalid rows (those with null values).
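A minimal sketch of these wrangling steps in plain Python follows. It is not the code from the repo: the field names (`insert_date`, `start_date`, `origin`, `destination`, `train_class`, `price`), the class mapping and the sample rows are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical raw rows; the field names are assumptions, not the dataset's confirmed schema
RAW_ROWS = [
    {"insert_date": "2019-04-12 10:00:00", "origin": "MADRID", "destination": "BARCELONA",
     "start_date": "2019-05-30 08:30:00", "train_class": "Turista", "price": "58.15"},
    {"insert_date": "2019-04-20 18:30:00", "origin": "MADRID", "destination": "BARCELONA",
     "start_date": "2019-05-30 08:30:00", "train_class": "Preferente", "price": "88.95"},
    {"insert_date": "2019-05-01 09:15:00", "origin": "MADRID", "destination": "BARCELONA",
     "start_date": "2019-05-30 08:30:00", "train_class": "Turista", "price": ""},
]

# Hypothetical mapping that collapses ticket classes into fewer categories
CLASS_MAP = {"Turista": "economy", "Turista Plus": "economy", "Preferente": "first"}

FMT = "%Y-%m-%d %H:%M:%S"


def clean(rows):
    out = []
    for row in rows:
        # Rows with a missing price are treated as invalid and skipped
        if not row["price"]:
            continue
        checked = datetime.strptime(row["insert_date"], FMT)
        departs = datetime.strptime(row["start_date"], FMT)
        route = f"{row['origin']}-{row['destination']}"
        out.append({
            "route": route,
            "departure_date": departs.date(),
            "departure_time": departs.time(),
            # One identifier per train: route plus departure date and time
            "train_id": f"{route}_{departs:%Y-%m-%d %H:%M}",
            # Calendar days between the price check and the departure
            "days_to_departure": (departs.date() - checked.date()).days,
            "ticket_class": CLASS_MAP.get(row["train_class"], "other"),
            "price": float(row["price"]),
        })
    return out
```

Skipping rows without a price is only one way to handle invalid rows; imputing or flagging them would be alternatives.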

Results

Code for all the plots and results presented in this section can also be found here.

All routes together

There are a few interesting things to comment on in this graph. On the one hand, it answers the main questions.

1) The price does change over the days. Moreover, there is a clear aggregate trend: the price goes up as the departure day gets closer.

2) Once we accept that there are relevant variations, is there an optimal moment to buy the tickets? From the aggregated graph, the optimal moment to buy is as soon as the tickets go on sale, and in any case between 50 and 60 days before departure.
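The aggregated curve and its optimal moment can be sketched as below. This assumes the curve is the mean price per number of days to departure (the article does not say whether the mean or the median was plotted), and the `days_to_departure` and `price` fields and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def price_curve(rows):
    """Average price for each number of days to departure (mean aggregation is an assumption)."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r["days_to_departure"]].append(r["price"])
    return {d: mean(prices) for d, prices in sorted(buckets.items())}


def cheapest_day(curve):
    """Days to departure with the lowest average price."""
    return min(curve, key=curve.get)


# Hypothetical observations, for illustration only
sample = [
    {"days_to_departure": 55, "price": 40.0},
    {"days_to_departure": 55, "price": 44.0},
    {"days_to_departure": 30, "price": 50.0},
    {"days_to_departure": 2, "price": 80.0},
]
curve = price_curve(sample)
```

Averaging all routes together mixes routes with different price levels, so a per-route curve is a useful complement to the aggregate one.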

On the other hand, it allows us to identify 3 main stages. First, between 60 and 42 days before departure, we find the lowest range of prices. Then comes a period between 42 and 13 days before departure in which prices vary very little, around 6–7% altogether. Lastly, there is the stage in which the ticket price increases by the day.
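One way to quantify how flat the middle stage is would be the relative range of the average price inside the window, (max - min) / min. This definition is an assumption, since the article does not state how the 6–7% figure was computed, and the curve below is invented.

```python
def relative_range(curve, low, high):
    """(max - min) / min of the average price for low <= days_to_departure <= high."""
    window = [p for d, p in curve.items() if low <= d <= high]
    return (max(window) - min(window)) / min(window)


# Hypothetical average prices keyed by days to departure
curve = {60: 45.0, 42: 48.0, 30: 50.0, 20: 51.0, 13: 51.36, 5: 70.0, 0: 85.0}
print(f"{relative_range(curve, 13, 42):.1%}")  # prints 7.0%
```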

Let’s break it down by route.

Ticket price evolution on a high-demand route

The Madrid-Barcelona round route has been selected as an example of a high-demand route. Its pattern is similar to the one we observed when taking all available routes together. Prices drop after the tickets are released for sale, close to 50 days before departure. Again, we can identify the 3 stages: first, between 60 and 40 days before departure, the lowest range of prices; then a period between 40 and 12 days before departure in which prices vary very little, around 6–7%; and lastly, the stage in which the ticket price increases by the day.
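If "round route" is read as pooling both directions (an assumption), an order-independent route key does the job, as in the sketch below. The helper names and rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def round_route(origin, destination):
    """Order-independent key, so that A-B and B-A fall into the same round route."""
    return "-".join(sorted((origin, destination)))


def curves_by_round_route(rows):
    """Average price by days to departure, computed separately for each round route."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in rows:
        key = round_route(r["origin"], r["destination"])
        buckets[key][r["days_to_departure"]].append(r["price"])
    return {
        key: {d: mean(p) for d, p in sorted(by_day.items())}
        for key, by_day in buckets.items()
    }


# Hypothetical observations in both directions
route_rows = [
    {"origin": "MADRID", "destination": "BARCELONA", "days_to_departure": 50, "price": 40.0},
    {"origin": "BARCELONA", "destination": "MADRID", "days_to_departure": 50, "price": 46.0},
    {"origin": "MADRID", "destination": "SEVILLA", "days_to_departure": 50, "price": 30.0},
]
curves = curves_by_round_route(route_rows)
```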

Ticket price evolution on a non-high-demand route

This time, Renfe sets an initial price, but since demand at that price falls short, the price starts to dip until it reaches its lowest point between 50 and 40 days before departure. From then on, the price evolution is much more volatile: it keeps rising and falling as the departure day approaches, but always along an increasing trend.

The intermediate stage seen in the previous plots, in which the price stays almost constant for about a month, does not really appear here.

The last point I would highlight is that the range of price variation is much narrower than for a high-demand route.
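The two properties described for this route, frequent rises and falls around an overall upward trend, can be checked with two simple measures: the number of direction switches in the day-to-day average price, and the least-squares slope of price against days to departure. Both metrics are illustrative choices rather than the ones used in the article, and the curves are made up.

```python
def direction_changes(curve):
    """Count how often the day-to-day movement of the average price switches between up and down."""
    days = sorted(curve, reverse=True)  # from far before departure towards departure day
    prices = [curve[d] for d in days]
    # Ignore steps where the price did not move at all
    moves = [b - a for a, b in zip(prices, prices[1:]) if b != a]
    return sum(1 for m1, m2 in zip(moves, moves[1:]) if (m1 > 0) != (m2 > 0))


def trend_slope(curve):
    """Least-squares slope of average price against days to departure.

    A negative slope means the price tends to rise as departure approaches.
    """
    xs = list(curve)
    ys = [curve[x] for x in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Hypothetical curves: days to departure -> average price
volatile = {40: 30.0, 30: 34.0, 20: 32.0, 10: 37.0, 5: 35.0, 0: 41.0}
smooth = {40: 30.0, 30: 31.0, 20: 33.0, 10: 36.0, 0: 45.0}
```

A volatile but rising curve shows many direction changes together with a negative slope, while a smooth one shows none.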

Price evolution across all routes in the dataset

We can also break down the price evolution by trip/train characteristics.

Day of the week effect

The price evolution pattern is very similar regardless of the day of the week on which the train departs. However, Friday and Sunday tickets are more expensive on average than those for trains departing on the other days. Monday tickets are the ones whose price rises the most in the stage close to departure. Saturday sits at the opposite end of the spectrum: its price increase is the least pronounced.
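A sketch of both comparisons follows: the average price per departure weekday, and the relative price increase over the last stage per weekday. Taking 12 days as the start of the last stage and comparing exact day counts are assumptions; the field names and rows are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_price_by_weekday(rows):
    """Average price per departure weekday (Monday=0 ... Sunday=6)."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r["departure_date"].weekday()].append(r["price"])
    return {wd: mean(p) for wd, p in sorted(buckets.items())}


def last_stage_increase(rows, threshold=12):
    """Per weekday, relative change from the mean price `threshold` days out to the mean price on departure day."""
    start, end = defaultdict(list), defaultdict(list)
    for r in rows:
        wd = r["departure_date"].weekday()
        if r["days_to_departure"] == threshold:
            start[wd].append(r["price"])
        elif r["days_to_departure"] == 0:
            end[wd].append(r["price"])
    # Only weekdays observed at both ends of the last stage can be compared
    return {wd: mean(end[wd]) / mean(start[wd]) - 1 for wd in sorted(start.keys() & end.keys())}


# Hypothetical rows: 2019-06-10 is a Monday, 2019-06-08 a Saturday
week_rows = [
    {"departure_date": date(2019, 6, 10), "days_to_departure": 12, "price": 50.0},
    {"departure_date": date(2019, 6, 10), "days_to_departure": 0, "price": 75.0},
    {"departure_date": date(2019, 6, 8), "days_to_departure": 12, "price": 50.0},
    {"departure_date": date(2019, 6, 8), "days_to_departure": 0, "price": 55.0},
]
```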

Departure time effect

The boxplots shown earlier revealed a slight difference in price when broken down by departure time window: evening trains had fewer cheap prices than trains in the other time windows. We ventured that this could be because evening trains generally have higher demand than the rest; therefore, their prices, which Renfe sets higher, do not dip to adjust for a lower initial demand, as happens with the other windows.

The graph above backs this theory up. In fact, evening trains reach their lowest price around a month before departure, which results in a different optimal purchase moment.
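Finding the optimal purchase moment per departure time window amounts to computing one price curve per window and taking its minimum, as in the sketch below. The window boundaries (before 12:00, 12:00 to 18:00, after 18:00), field names and rows are hypothetical, since the article does not give its cut-off hours.

```python
from collections import defaultdict
from datetime import time
from statistics import mean


def time_window(t):
    """Hypothetical departure time windows; the real cut-off hours are not stated."""
    if t < time(12):
        return "morning"
    if t < time(18):
        return "afternoon"
    return "evening"


def cheapest_day_by_window(rows):
    """For each departure time window, the days to departure with the lowest average price."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in rows:
        buckets[time_window(r["departure_time"])][r["days_to_departure"]].append(r["price"])
    result = {}
    for window, by_day in buckets.items():
        curve = {d: mean(p) for d, p in by_day.items()}
        result[window] = min(curve, key=curve.get)
    return result


# Hypothetical rows for one morning train and one evening train
window_rows = [
    {"departure_time": time(8, 30), "days_to_departure": 55, "price": 40.0},
    {"departure_time": time(8, 30), "days_to_departure": 30, "price": 45.0},
    {"departure_time": time(19, 0), "days_to_departure": 55, "price": 70.0},
    {"departure_time": time(19, 0), "days_to_departure": 30, "price": 60.0},
]
```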

Ticket class effect

The general patterns already discussed hold for both ticket classes. What I would highlight is that, since we know economy tickets sell out first, this explains why their price rise as the departure date approaches (the “last stage”) is so smooth. First-class tickets, by contrast, are in higher demand in the last days before departure, which is why their prices rise more sharply than those of economy-class tickets.

Checking whether there are any intraday pricing differences

Having assessed interday price patterns, let’s focus for a moment on intraday price evolution, to see whether such patterns exist as well.

The lines continuously interweave and overlap, which suggests that there are no differences in intraday pricing across time windows for a given number of days to departure. The only point that sticks out is the drop in evening prices at the lower end of the graph (few or zero days to departure). Zooming in, we can see that the drop occurs on the day of departure itself.

To check whether that drop should be considered relevant, we first need to know how many price-check runs are done in total and how many are done per time window, in order to rule out potential dependency effects.

The price checks are done round the clock in roughly equal proportions. Let’s see whether this holds when zooming in on the last day, which is when the drop takes place.

The proportionality no longer holds: price checks done on the day of departure are unevenly distributed across time windows. In particular, the number of evening price checks is much lower than in the other time windows.
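The balance check can be sketched by comparing the share of checks per time window over all checks with the share over same-day checks only. The four 6-hour windows, the `check_hour` field and the rows are hypothetical.

```python
from collections import Counter


def check_window(hour):
    """Hypothetical split of the check hour (0-23) into four 6-hour windows."""
    return ("night", "morning", "afternoon", "evening")[hour // 6]


def window_shares(checks):
    """Share of price checks that fall in each time window."""
    counts = Counter(check_window(c["check_hour"]) for c in checks)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


# Hypothetical price checks: hour of the check and days to departure of the checked train
checks = [
    {"check_hour": 2, "days_to_departure": 5},
    {"check_hour": 8, "days_to_departure": 5},
    {"check_hour": 14, "days_to_departure": 5},
    {"check_hour": 20, "days_to_departure": 5},
    {"check_hour": 9, "days_to_departure": 0},
    {"check_hour": 15, "days_to_departure": 0},
    {"check_hour": 16, "days_to_departure": 0},
    {"check_hour": 21, "days_to_departure": 0},
]
overall = window_shares(checks)
same_day = window_shares([c for c in checks if c["days_to_departure"] == 0])
```

If the same-day shares differ markedly from the overall ones, averages per window on that day rest on very different sample sizes.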

Since the imbalance is so large, the few trains whose price was checked in the evening on the day of departure may well be of a particular type that is cheaper on average. In a nutshell, there are no relevant intraday price differences.
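The composition argument can be illustrated with a toy example: if the few evening checks happen to cover a cheaper train type, the evening average drops even though no train is priced differently by time of day. The train types "A" and "B" and all prices below are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, *keys):
    """Average price for each combination of the given fields."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[tuple(r[k] for k in keys)].append(r["price"])
    return {k: mean(v) for k, v in buckets.items()}


# Hypothetical same-day checks: the single evening check happens to be the cheaper type "B"
same_day_rows = [
    {"window": "morning", "train_type": "A", "price": 100.0},
    {"window": "morning", "train_type": "A", "price": 100.0},
    {"window": "morning", "train_type": "B", "price": 60.0},
    {"window": "evening", "train_type": "B", "price": 60.0},
]
by_window = mean_by(same_day_rows, "window")
by_window_and_type = mean_by(same_day_rows, "window", "train_type")
```

Grouped by window alone, the evening looks much cheaper; grouped by window and train type, type B costs the same in both windows.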

Conclusions

We have found relevant price movements for the same train (route, departure day and departure time) as the departure date approaches.

In particular, for high-demand routes, the optimal moment to buy is between 50 and 60 days before departure. For non-high-demand routes, the optimal moment is between 40 and 50 days before departure. When the train departs in the evening, the optimal moment is around 30 days before departure.

In any case, if for any reason the ticket cannot be purchased that far in advance, there is no need to worry, since the price barely varies until 12 days before departure (just 6–7% on average).

Past that threshold, however, tickets get more expensive day by day, all the way up to the day the train departs.

Lastly, we have not found any intraday pattern in Renfe’s pricing system: for a given number of days to departure, it makes no difference at what time of day the ticket is bought.
