
I Simulated the Next 50 Years of Baseball

What scoring records will be broken?

Image provided by Stock Up

May 1st, 1920. The Brooklyn Robins faced off against the Boston Braves in what would become the longest game, by innings, in MLB history. After 26 innings, the game was called due to darkness and the teams settled for a 1–1 tie. How long could this game have gone? No runs were scored for 20 innings straight, so could it have reached 30 innings or more? Unfortunately, we will never know how long the longest MLB game could have continued… but we can simulate games to find the limits of modern-day baseball, and maybe gain some clarity on how many scoreless innings the Robins and Braves had left.


This project started as a search for a record-breaking game in terms of innings, but I realized I had all the tools I needed to go after some other absurd records. My additional goals were to beat some of the more outlandish scoring records. These included the most combined runs in a game (49, Phillies and Cubs in 1922) and the most combined home runs in a game (13, Diamondbacks and Phillies in 2019). Maybe I could even beat the biggest blowout (a 27-run margin, Rangers over the Orioles in 2007). Using batting statistics from the 2021 MLB season, I simulated every season through 2070 (121,500 games) and found some very interesting results.
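The game count follows from a full 162-game schedule for all 30 teams. Because each game involves two teams, the total for one season is halved:

```latex
\text{games per season} = \frac{30 \times 162}{2} = 2{,}430,
\qquad
\text{total} = 50 \times 2{,}430 = 121{,}500 \text{ games}
```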


To make this simulation work, I needed real data from the 2021 MLB season. When I built this project, the season had produced around 121,000 plate appearances, spread across 12 different batting outcomes. Combining those outcomes with every possible scenario on the bases produces a simulation that closely matches real life. The example below handles one scenario: the bases are empty and the batter hits a single or a double.

```matlab
if sum(bases) == 0                 % nobody on base
    if indicator == 1              % batter hits a single
        bases(3) = 1;              % bases(3) is first base: mark it occupied
        outs = outs + 0;           % no outs added
        if top_bottom == 0         % top of inning: credit the away team (A)
            runsA = runsA + 0;
            hitsA = hitsA + 1;
            rbiA = rbiA + 0;
        else                       % bottom of inning: credit the home team (B)
            runsB = runsB + 0;
            hitsB = hitsB + 1;
            rbiB = rbiB + 0;
        end
    elseif indicator == 2          % batter hits a double
        bases(2) = 1;              % bases(2) is second base: mark it occupied
        outs = outs + 0;           % no outs added
        if top_bottom == 0         % top of inning: credit the away team (A)
            runsA = runsA + 0;
            hitsA = hitsA + 1;
            rbiA = rbiA + 0;
        else                       % bottom of inning: credit the home team (B)
            runsB = runsB + 0;
            hitsB = hitsB + 1;
            rbiB = rbiB + 0;
        end
    % ... elseif branches for the remaining batting events
    end
% ... elseif branches for the remaining base scenarios
end
```

Similar copies of this code are added for each base scenario and batting event.
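The same kind of state update can also be written in a more data-driven way. The Python sketch below first draws a batting outcome with probability proportional to how often that outcome occurred. It then updates the bases. Several pieces are hypothetical simplifications and do not come from the MATLAB code above: the outcome names, the helper functions `sample_outcome` and `apply_outcome`, and the runner-advancement rule (on a hit, every runner moves up as many bases as the batter).

```python
import random

# Hypothetical convention for this sketch: index 0 is first base, 1 is second,
# 2 is third (the MATLAB code above orders its array differently).
# Bases a batter reaches on each hit type (simplified assumption).
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def sample_outcome(counts, rng=random):
    """Draw one batting outcome with probability proportional to its count."""
    outcomes = list(counts)
    weights = [counts[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def apply_outcome(bases, outcome):
    """Return (new_bases, runs_scored, outs_added) for one plate appearance."""
    bases = list(bases)  # copy so the caller's state is not mutated
    if outcome in HIT_BASES:
        n = HIT_BASES[outcome]
        runs = 0
        new_bases = [0, 0, 0]
        # Every runner moves up n bases; anyone pushed past third scores.
        for base, occupied in enumerate(bases):
            if occupied:
                if base + n >= 3:
                    runs += 1
                else:
                    new_bases[base + n] = 1
        # Place the batter; a home run scores the batter as well.
        if n >= 4:
            runs += 1
        else:
            new_bases[n - 1] = 1
        return new_bases, runs, 0
    if outcome == "walk":
        runs = 0
        if bases[0]:              # batter forces the runner on first...
            if bases[1]:          # ...who forces the runner on second...
                if bases[2]:      # ...and with the bases loaded, a run scores
                    runs = 1
                bases[2] = 1
            bases[1] = 1
        bases[0] = 1
        return bases, runs, 0
    # Any other outcome is treated as a plain out in this sketch (a fuller
    # model would handle hit-by-pitch, errors, double plays, and so on).
    return bases, 0, 1


if __name__ == "__main__":
    # Toy counts for illustration only, not the 2021 tallies.
    toy_counts = {"single": 2, "double": 1, "walk": 1, "out": 6}
    rng = random.Random(0)
    outcome = sample_outcome(toy_counts, rng)
    print(outcome, apply_outcome([0, 0, 0], outcome))
```

A single rule like this replaces one hand-written branch per base scenario and event. The trade-off is less control over situational plays, such as a runner taking an extra base on a single, which per-scenario branches can encode explicitly.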

Across the 50 simulated seasons, the batting average was 0.239, very close to the actual batting average of 0.241. Similarly, the simulated on-base percentage was 0.315, compared with 0.316 in reality (at least in the 2021 season). Since the simulation reproduces real-world rates this closely, we can start simulating games.
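Both rates follow standard definitions. Batting average is hits divided by at-bats. On-base percentage is times on base (hits, walks, hit-by-pitches) divided by at-bats plus walks, hit-by-pitches and sacrifice flies. The minimal sketch below assumes the simulation keeps tallies of each of those events; the function names are hypothetical.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


if __name__ == "__main__":
    # Made-up round numbers for illustration, not the simulation's totals.
    print(round(batting_average(24, 100), 3))
    print(round(on_base_percentage(24, 8, 1, 100, 1), 3))
```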


Let’s see if we can beat the century-old record of 26 innings. In the 983rd game of the 2048 MLB season, Team A beat Team B 5–3 in a grueling 28-inning game, which would be the longest game in MLB history. The teams stayed tied 3–3 for 19 innings. Team A finally broke the game open with a two-run inning in the top of the 28th, then shut out Team B in the bottom of the 28th.
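Extra-inning games like this one come from the stopping rule. From the 9th inning on, play continues until one team leads at the end of an inning. The bottom half of any such inning is skipped if the home team is already ahead. The sketch below captures that rule under simplifying assumptions. `play_game` and its `half_inning_runs` callback are hypothetical names. The sketch does not end a game mid-inning on a walk-off and has no darkness or tie rule. It also does not model the automatic runner on second base used in MLB extra innings since 2020.

```python
import itertools
import random


def play_game(half_inning_runs, regulation=9):
    """Play innings until a winner is decided; return (away, home, innings).

    half_inning_runs(team) returns the runs scored in one half-inning
    by "away" or "home".
    """
    away = home = 0
    for inning in itertools.count(1):
        away += half_inning_runs("away")
        # From the 9th on, skip the bottom half if the home team already leads.
        if inning >= regulation and home > away:
            return away, home, inning
        home += half_inning_runs("home")
        # From the 9th on, any lead after a full inning ends the game.
        if inning >= regulation and home != away:
            return away, home, inning


if __name__ == "__main__":
    rng = random.Random(2048)
    # Toy per-half-inning run distribution, for illustration only (not fitted to data).
    toy_runs = lambda team: rng.choices([0, 1, 2, 3], weights=[70, 15, 10, 5])[0]
    print(play_game(toy_runs))
```

Running `play_game` many times and keeping the largest `innings` value is one way to search for record-length games.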

In the 1257th game of the 2062 MLB season, Team A routed Team B 26–2. That 24-run margin would rank as the 3rd biggest blowout in MLB history. I could simulate a few more sets of 50 years to hunt for a record margin. Instead, I think it is better to simply appreciate the Rangers’ 30–3 victory over the Orioles in 2007, the biggest blowout in MLB history.

In the 1045th game of the 2050 MLB season, Team A beat Team B 18–9, with the two teams combining for 14 home runs. That beats the previous record of 13, set in 2019 when the Diamondbacks and the Phillies faced off.

In the 2302nd game of the 2070 MLB season, Team B beat Team A 19–14 for a combined 33 runs. That is an incredible offensive output from both teams, but it does not come close to the 49-run record from 1922. It looks like we will have to wait longer than 50 years to see two teams approach that mark.

For the last scoring record, I chose combined grand slams in one game. In the 1509th game of the 2067 MLB season, Team A beat Team B 15–9, with 4 grand slams hit between them. That beats the previous record of 3. Four different pairs of teams share that record, in games ranging from 1986 to 2015.
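Finding these record games amounts to scanning every simulated box score for the maximum of each statistic. The minimal sketch below makes two assumptions. First, each game's totals are stored in a record like the hypothetical `GameResult` below. Second, a grand slam is counted whenever a home run is hit with the bases loaded. The outcome label `"home_run"` and the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Totals for one simulated game (hypothetical fields)."""
    season: int
    game: int
    runs_a: int
    runs_b: int
    innings: int
    home_runs: int    # combined for both teams
    grand_slams: int  # combined for both teams


def is_grand_slam(bases, outcome):
    """A home run with all three bases occupied (bases are 0/1 flags)."""
    return outcome == "home_run" and all(bases)


def find_records(games):
    """Return the game that tops each record category."""
    return {
        "longest": max(games, key=lambda g: g.innings),
        "biggest_blowout": max(games, key=lambda g: abs(g.runs_a - g.runs_b)),
        "most_runs": max(games, key=lambda g: g.runs_a + g.runs_b),
        "most_home_runs": max(games, key=lambda g: g.home_runs),
        "most_grand_slams": max(games, key=lambda g: g.grand_slams),
    }
```

Because `max` returns the first game with the highest value, ties go to the earliest game in the list.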


There are many directions I could take this simulation next. The longest game with a no-hitter or perfect game? The most triples in a game? The most consecutive home runs? All of these are doable with the current program. I could also take a more analytical approach and try to find the single statistic that best predicts which team wins. Does the team with the higher batting average usually win? What percentage of teams win after leading through 4 innings? These are all interesting topics that could be explored in a later article.
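The last of those questions only needs inning-by-inning line scores. The sketch below assumes each simulated game is stored as a pair of per-inning run lists, which is a hypothetical format. It also leaves out games tied after 4 innings.

```python
def win_rate_when_leading(games, after_inning=4):
    """Share of games won by the team that led after `after_inning` innings.

    Each game is (away_line, home_line): lists of runs per inning. The home
    line may be one inning shorter if the bottom of the last inning was skipped.
    """
    leads = wins = 0
    for away_line, home_line in games:
        away_early = sum(away_line[:after_inning])
        home_early = sum(home_line[:after_inning])
        if away_early == home_early:
            continue  # no leader yet, so the game is not counted
        leads += 1
        away_won = sum(away_line) > sum(home_line)
        # The early leader won if "away led" and "away won" agree.
        if away_won == (away_early > home_early):
            wins += 1
    return wins / leads if leads else float("nan")
```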

Breaking some of the scoring records proved to be an interesting and rather fun task. The 28-inning thriller suggests that the 1920 game between the Robins and the Braves could plausibly have gone on for at least a few more innings. Team B’s 24-run loss, on the other hand, was no match for the 27-run margin the Rangers put on the Orioles in 2007. If anything, this project showed how rare these games are, and that they should be appreciated for that rarity.

