Hands-on Tutorials

Winning the Kaggle Google Brain — Ventilator Pressure Prediction

On LSTMs, Transformers and PID controllers…

Gilles Vandewiele
Towards Data Science
13 min read · Nov 12, 2021


An artificial mechanical lung (from Wikimedia)

On the 4th of November, we came out on top of 2650 teams in the Google Brain — Ventilator Pressure Prediction competition, organized on Kaggle. In this blog post, I would like to take you through the journey that bagged us the victory. A more condensed version of this write-up can be found on the Kaggle forum, and we have published code for our LSTM + CNN Transformer model and PID matching.

Winning eternal glory and $2500! Screenshot by Author

An email from Kha…

I was a little burnt out after finishing my PhD and after a short but unpleasant work experience. As a result, I was taking a bit of a hiatus from Kaggle in 2021, as you can see from my activity log. I participated a little in the “Rock, Paper, Scissors” contest and had a go at the “Indoor Location & Navigation” challenge.

Screenshot by Author

On the 23rd of September, Yves and I received an email from Kha. He had spotted an interesting time series competition on Kaggle that would only last one month. The competition appealed to me as I am quite familiar with time series data and also a big fan of shorter competitions. Moreover, I know that when Kha, Yves and I team up, good things often follow! I replied to Kha that I would be interested in joining, but that I had some things to take care of in real life first before I could really dive into it. One of those things was preparing and giving a presentation on the third place solution Kha, Yves and I had obtained in 2020 in the “Liverpool Ion Switching” competition. Kha and Yves started working on the problem and made some good progress. Some time later, on the 6th of October, I joined them.

Brief problem statement

The goal of this competition was to predict the pressure within a mechanical lung at any given time step, based on how much the inspiratory solenoid valve is opened (u_in; a value between 0 and 100). We were given roughly 75000 training breaths and 50000 test breaths. Every breath consisted of an inspiratory phase and an expiratory phase, indicated by the provided u_out. Our submissions were only scored on the inspiratory phase (i.e. where u_out is equal to 0). Moreover, the breaths were recorded on lungs with different resistance and compliance properties, which were encoded in the provided R and C variables.

An example of 1 training breath. We are provided with u_in, u_out, R & C and are asked to predict the pressure. Image by Author
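For reference, here is a minimal sketch of how the data can be iterated breath by breath. The column names are those provided by the competition; the file name and loop body are illustrative:

import pandas as pd

# Each breath_id groups the time series of a single breath:
# time_step, u_in, u_out, R, C and (for the training data) pressure.
train = pd.read_csv("train.csv")
for breath_id, breath in train.groupby("breath_id"):
    inspiratory = breath[breath["u_out"] == 0]  # only this phase is scored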

Deep learning struggle

From the public notebooks that were published in this competition, it was clear that LSTMs, or any kind of temporal deep learning model, would reign supreme. All three of us quickly tried adding features or making small modifications to these pipelines, but to no avail… It was clear that deep learning wasn’t our forte. We then had a switch in mindset: “instead of trying to create one model, fitted on all training data, that is then used to make predictions for all test data, can’t we focus on breaths individually and see whether we can model those to find patterns?”

Yves, who has a background in physics, looked at PID controllers and quickly found a nice and simple post-processing trick that gave us a +0.004 boost on the LB, which is quite significant. Thanks to this post-processing trick, we were able to climb the leaderboard with a blend of slightly modified public notebooks, landing us a spot just outside the gold zone. Yves’ post-processing trick is rather simple to explain (but not that easy to find) and is based on the theory of PID controllers.

We could come up with a very simple formula: pressure = kt - (u_in / kp), with kt and kp parameters of the algorithm that have to be tuned, u_in the input data and pressure the output we want to predict. The grid to search these parameters over was mentioned in a paper written by the organizers, but it can easily be discovered without the paper by simply applying linear regression to each breath in the training data (where we have both u_in and pressure).
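To illustrate that last point, here is a minimal sketch of such a per-breath regression. The function name is ours and the input is assumed to be a DataFrame holding the inspiratory part of one training breath; this is not taken from our actual pipeline:

import numpy as np

def fit_kt_kp(breath):
    # Fit pressure ≈ intercept + slope * u_in with least squares; comparing
    # with pressure = kt - u_in / kp gives kt = intercept and kp = -1 / slope.
    u_in = breath["u_in"].values
    pressure = breath["pressure"].values
    slope, intercept = np.polyfit(u_in, pressure, deg=1)
    return intercept, -1.0 / slope  # (kt, kp)

Collecting these per-breath estimates over the training data reveals the small discrete grid of kt and kp values used below.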

Now the question is: how do we know how good a certain set of parameters is for a test breath, given that we do not have the pressures to calculate an error? For this, we can make use of the discrete nature of the pressures. There are only 950 unique pressure values, with equal distances between them. As such, if we fill in two candidate parameters kt and kp and the calculated pressure values align perfectly with these 950 unique pressure values, we have a match. This matching worked for roughly 2.5% of the data.

import numpy as np

# MIN_PRESSURE is the smallest of the 950 unique pressure values and DIFF_PRESSURE
# the constant step between them, both computed from the training data.
KP_GRID = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
           2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
KT_GRID = [10, 15, 20, 25, 30, 35]

def match_breath(breath):
    u_in = breath['u_in'].values
    inspiratory_len = int(np.sum(1 - breath['u_out']))

    u_in = u_in[1:inspiratory_len]
    breath_mask = (u_in > 0) & (u_in < 100)

    if np.sum(breath_mask) < 2:
        return None, None, None, None

    for kt in KT_GRID:
        for kp in KP_GRID:
            preds = kt - u_in / kp
            # Express the predictions in units of the pressure grid; a perfect match
            # means every masked prediction lands exactly on an integer grid point.
            preds_round = (preds - MIN_PRESSURE) / DIFF_PRESSURE
            tune_error = np.round(preds_round[breath_mask]) - preds_round[breath_mask]
            if np.sum(np.abs(tune_error)) < 1e-6:
                return preds, kt, kp, breath_mask

    return None, None, None, None

Send in the reinforcements!

After discovering the post-processing trick, we were not able to make any more significant improvements. We slowly saw our rank dropping on the LB, and it was clear that we would need someone with more deep learning experience in order to obtain a good result in this competition. We messaged Shujun, who was just below the gold zone at that time. I knew that Shujun was extremely strong in deep learning, as he had earned a solo gold medal with a unique and strong architecture in the “OpenVaccine — Covid mRNA” competition, in which I had participated too. Boy did he NOT disappoint.

Shujun’s deep learning architecture was a combination of LSTM cells, 1D convolutions, and transformers. The LSTMs were necessary because of the target pressure’s heavy dependence on previous time steps. Convolutions in conjunction with transformers are a good combination to model global dependencies while making up for transformers’ inability to capture local interactions. The network was rather deep, and we ran into some gradient issues. Therefore, a new module called ResidualLSTM was created, which adds a feed-forward network (FFN) after the LSTM and connects the input of the LSTM to the output of the FFN with a residual connection. The model works on the raw data, in addition to a few lag and difference features, a cumulative sum of u_in, and one-hot-encoded R & C features.

The LSTM + CNN Transformer architecture. Image by Co-Author Shujun.
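The description above only sketches the ResidualLSTM block; a minimal PyTorch interpretation could look as follows. The hidden size, bidirectionality and layer normalization are illustrative choices of ours, not taken from the actual winning code:

import torch
import torch.nn as nn

class ResidualLSTM(nn.Module):
    """LSTM followed by a feed-forward network, with a residual connection
    from the block input to the FFN output (illustrative sizes)."""
    def __init__(self, d_model, hidden=256, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True, bidirectional=True)
        self.ffn = nn.Sequential(
            nn.Linear(2 * hidden, 2 * hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(2 * hidden, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        out, _ = self.lstm(x)
        out = self.ffn(out)
        return self.norm(x + out)  # residual connection around LSTM + FFN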

We ensembled our current blend with Shujun’s blend of different runs of his model and applied our PP trick. Boom, we were in the gold!

Image by Author

We tried suggesting some ideas for Shujun to implement in his pipeline, and also tried adding some features to the pipeline (as his pipeline was leveraging features). But again, we were not successful.

A few days later, we felt that we needed another unique deep learning pipeline for our blend. This is where Riccardo enters the story! When we messaged him, Riccardo was only a few spots below us. He had an LSTM architecture up and running that converged on just the raw data. No feature engineering, no fancy stuff, just raw data and a lot of epochs. The trick was to have a good learning rate scheduler. We opted for a ReduceLROnPlateau with a patience of 30 epochs. This was in contrast with the fast annealing schedulers typically used in the public notebooks, which were susceptible to overfitting. This can be seen when we compare our validation loss curves to those of the public notebooks, as displayed below. Here the red curve (“stakes”) corresponds to the public notebooks and quickly overfits.

Image by Author
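For reference, setting up such a scheduler in PyTorch looks roughly like this; the model and the learning-rate values are placeholders, not the actual pipeline:

import torch
import torch.nn as nn

model = nn.LSTM(input_size=4, hidden_size=256, num_layers=4, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=30  # wait 30 epochs before lowering the LR
)

# Inside the training loop, the scheduler is stepped with the validation loss:
# scheduler.step(val_loss)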

This approach even scored better than the LSTM + 1D CNN Transformer Encoder. The blend of these pipelines gave us the fourth spot on the leaderboard. We decided to change our ambitions from a gold medal to winning money.

Image by Author

Again, all of us focused on improving the two pipelines we had up and running, making slight modifications to both or adding features to the transformer pipeline. Only extremely marginal improvements were made, and the gap with the competitors ahead of us was too big. Surely, there was something hidden in the data that they had found and we had not yet…

The best birthday gift ever

On the 26th of October, my birthday, I decided to learn some more about PID (Proportional - Integral - Derivative) controllers, as those were a core concept in the organizers’ paper as well. Moreover, our post-processing trick had come from Yves looking into these PID controllers.

Let’s poke this rabbit hole… Image by Author

I played around with the simple-pid package in Python, as this package already implements the PID formulas. I was able to reproduce our post-processing trick with this package as well. However, for most breaths I was not able to find a match. That was until I decided to play around with the code, and the integral term specifically. There are multiple ways to calculate the integral term of a PID controller, and the package only included the vanilla implementation. I decided to plug in some other alternatives and suddenly, I had a breakthrough… WHAT A BIRTHDAY GIFT!

Close, but not yet… Still a huge breakthrough! Image by Author
A few hours later after fixing some details… Image by Author

Here is the code snippet, based on the simple-pid package, that was able to perfectly model pressure to u_in for some of the breaths:

import numpy as np

def generate_u_in(pressure, time_step, kp, ki, kt, integral=0):
    # PID with a weight-decayed integral term: at every one of the first
    # 32 time steps of a breath, u_in = kp * error + ki * integral.
    dt = np.diff(time_step, prepend=[0])
    preds = []
    for j in range(32):
        error = kt - pressure[j]
        integral += (error - integral) * (dt[j] / (dt[j] + 0.5))
        preds.append(kp * error + ki * integral)
    return preds

I thought we had a nice new post-processing trick on our hands and sent this information to Yves. Yves made something much, much bigger out of it: he came up with code that was able to match the pressures perfectly.

Can you feel the excitement? Image by Author

The perfect match

We can write the “vanilla” equation of a PID controller (which corresponds to the generate_u_in snippet I provided):

u_in(t) = α · ε(t) + β · ∫ ε(τ) dτ + γ · dε(t)/dt

With epsilon equal to target - pressure, where target represents the ideal pressure that is set by the user. Now gamma, the weight of the derivative term, was always set to zero by the organizers, so we only had to focus on the first (proportional) and second (integral) term. Our previously discussed post-processing technique actually corresponds to this equation with beta set to 0. Let’s denote the integral term by I(t) for ease of notation. Moreover, the integral term was implemented with a weight decay as follows:

I(t) = I(t-1) + (ε(t) - I(t-1)) · Δt / (Δt + τ)

With delta_t the difference in time between timestep t and t-1, and tau a constant, which was equal to 0.5 for this data. We can also isolate the integral term from the PID equation, in order to calculate it for any point in time. This avoids having to propagate it across the entire signal:

I(t) = (u_in(t) - α · ε(t)) / β

With these equations, we are all set for our matching algorithm. The algorithm to fill in the pressure for a given breath at timestep t, for a given configuration of parameters (target, alpha and beta, referred to as kt, kp and ki respectively in the remainder of this post), goes as follows:

  1. Take two candidate pressure values P0 and P1 for the pressures at timesteps t-1 and t. There are 950 possible pressure values and thus 950*950 possible combinations.
  2. Calculate I0 = (u_in[t - 1] - kp * (kt - P0))/ki
  3. Calculate I1 = I0 + (kt - P1 - I0) * (dt / (dt + 0.5))
  4. If kp * (kt - P1) + ki * I1 == u_in[t] we have a match and we can fill in P1 for pressure[t]

In code:

P_RANGE = np.arange(
    MIN_PRESSURE,
    MAX_PRESSURE + DIFF_PRESSURE,
    DIFF_PRESSURE
)

# u_in, pressure, dt (the time difference at timestep t) and the candidate
# parameters kt, kp, ki are assumed to be given for the breath under consideration.
for P0 in P_RANGE:
    for P1 in P_RANGE:
        I0 = (u_in[t - 1] - kp * (kt - P0)) / ki
        I1 = I0 + (kt - P1 - I0) * (dt / (dt + 0.5))
        u_in_hat = kp * (kt - P1) + ki * I1
        if abs(u_in_hat - u_in[t]) < 1e-8:
            print(P0, P1, pressure[t-1:t+1])

However, this code is rather slow, because 950*950 combinations of (P0, P1) need to be tried for every possible parameter combination: kp and ki have 20 possible values and kt has 6, so in total there are 950*950*20*20*6 combinations. However, for a fixed P0 we notice that u_in_hat - u_in[t] changes linearly with P1 across the inner for-loop. As such, we can extrapolate it from just 2 measurements. If the intersection of this extrapolated line with the x-axis occurs at an integer (or close to an integer) index, our if-condition would have evaluated to True in an iteration of the inner for-loop.

Image by Author

This observation allows us to speed up the code by a factor of 950.

# MIN_PRESSURE2 is the second value of the pressure grid, i.e. MIN_PRESSURE + DIFF_PRESSURE.
for P0 in P_RANGE:
    I0 = (u_in[t - 1] - kp * (kt - P0)) / ki

    # Calculate 2 points for P1 so we can get the slope
    I11 = I0 + (kt - MIN_PRESSURE - I0) * (dt / (dt + 0.5))
    u_in_hat1 = kp * (kt - MIN_PRESSURE) + ki * I11

    I12 = I0 + (kt - MIN_PRESSURE2 - I0) * (dt / (dt + 0.5))
    u_in_hat2 = kp * (kt - MIN_PRESSURE2) + ki * I12

    slope = u_in_hat2 - u_in_hat1
    x_intersect = (u_in[t] - u_in_hat2) / slope
    if abs(np.round(x_intersect) - x_intersect) < 1e-8:
        print(P0, MIN_PRESSURE + (x_intersect + 1) * DIFF_PRESSURE)

Turn it up! Bring the noise!

With the code above, we were able to match a lot of pressures, but many of them still remained unmatched. In the paper of the organizers, it was explained that artificial noise was added to the u_in data. We set out on an endeavour to remove this noise. Let us first look at the output of the current version of the matching algorithm on a particular breath (train breath id 17):

Image by Author

As one can notice, the first value of u_in and some values in the middle were not matched. The first values of all breaths are noisy and not matchable, but the points in the middle should be matchable… Let us start from index 5, call our generate_u_in code on the provided pressures to generate “ideal” u_in values, and subtract these from the u_in we are provided:

import matplotlib.pyplot as plt

# PID parameters for this breath; I5 is the integral term at index 4,
# obtained by isolating it from the PID equation.
kp, ki, kt = 1., 3., 30
I5 = (u_in[4] - kp * (kt - pressure[4])) / ki
u_in_hat = generate_u_in(pressure[5:], timestep[4:], kp, ki, kt, integral=I5)
noise = u_in[5:32] - u_in_hat[:-5]

plt.plot(timestep[5:32], noise, '-o')
plt.title('noise')
plt.show()
Image by Author

The difference between the ideal u_in values and the provided u_in values forms a nice triangle, except at the turning point / peak of the triangle. The key insight is that the slope of this noise (np.diff(noise) / np.diff(timestep[5:32])) will be equal for consecutive points on this triangle. We can also rewrite our integral calculation to work backwards.
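The backwards rewrite follows directly from the weight-decay formula above by solving the update for the previous integral term. A minimal sketch (the function name is ours, not from the actual matching code):

def integral_backwards(I_t, error_t, dt, tau=0.5):
    # Invert I_t = I_prev + (error_t - I_prev) * w, with w = dt / (dt + tau),
    # to recover the integral term at the previous time step.
    w = dt / (dt + tau)
    return (I_t - error_t * w) / (1 - w)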

Armed with these two insights, we were able to match a lot more data:

Image by Author

Putting it together

Given our DL models and matching algorithm, our approach was rather simple:

  1. Generate predictions with our two DL models, trained with 10-fold cross-validation and multiple seeds. This gives us many models with slight variations. Take the (weighted) median of their predictions as an ensembling technique, as the median is typically better than the mean when MAE is the metric.
  2. Starting from these predictions, apply our matching algorithm. If we find a match, replace the DL prediction by the matched pressure; if there is no match, keep the DL prediction (a sketch of this blending step is shown below).
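A minimal sketch of that blending step; the names and shapes are illustrative, not our actual submission code:

import numpy as np

# preds: (n_models, n_steps) array of DL predictions for one breath,
# matched: (n_steps,) array from the PID matcher, NaN where no match was found.
def blend(preds, matched):
    ensemble = np.median(preds, axis=0)  # the median is a natural fit for MAE
    return np.where(np.isnan(matched), ensemble, matched)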

In total, we were able to match roughly 66% of the data, which gave us a HUGE boost on the leaderboard.

Closing time

This was my first victory on Kaggle. While I admit that the PID controller matching was a bit cheeky, I am still extremely proud of our result, as only 2 teams out of 2650 were able to find this exploit. Moreover, we were the only team that was able to match the noise within the u_in data as well. I would like to thank my teammates zidmie, kha, shujun and B, as without them this result would not have been possible!

In case you have any questions or something is unclear, feel free to leave some comments or contact me through some other medium (pun intended).

References

  1. Organizers Paper
  2. Kaggle Forum write-up
  3. PID Matching Code + Explanation (Kaggle)
  4. PID Matching Code + Explanation (Github)
  5. LSTM + CNN Transformer code (Kaggle)
  6. LSTM + CNN Transformer code (Github)
  7. LSTM on raw data code (Github)
