A Full-Stack Machine Learning Web App that Predicts Rebounding Probabilities
First of all, let me show what the app eventually looks like:

As the image above indicates, users will be able to get the probability of each player/side grabbing the rebound, which is useful to basketball staff when they plan their rebounding tactics. Some of the takeaways from the GIF above are:
- we need an interface on the frontend that lets users place the players on the court by dragging them;
- we need a kernel in the backend that predicts each player's individual probability of grabbing the rebound (and the summed probability for each team) based on their locations on the court;
- we need a server that hosts the website 24/7.
In the rest of the article, I’m going to describe my ideas in that order.
Frontend design
There are two main difficulties we need to tackle during the development:
- how can we enable users to place the players by dragging the elements on the panel?
- how should we transport the data from the frontend to the backend so that our model can use it for prediction?
For Difficulty #1, I fortunately found this useful link, which makes any element on the web page draggable. What I changed is that I confined the div elements representing the players within the panel.
For Difficulty #2, my workaround is to insert an invisible input element that stores the players' coordinates, so that the values are submitted to the backend along with the form.
Up to now, we can let users place the players wherever they want them to be, and the machine is also able to recognize their locations.
Machine learning model training
To make it clear, we need a model that can predict the odds of any individual player, or either team, on the court getting a rebound. In other words, the input is the coordinates of the players' locations, while the output is which player grabbed the rebound and which side he plays for.
Let’s take a look at my original data:


Note:
- The location represents where a player is standing when the shot is thrown, and it is represented by x-y coordinates.
- The X coordinate is measured in feet and represents the distance from the center of the court, length-wise. -47 represents the baseline of the offensive team’s end. 47 represents the baseline of the defending team’s end.
- The Y coordinate is measured in feet and represents the distance from the basket, width-wise. -25 represents the right side of the court, 25 represents the left side of the court (for someone facing the offensive basket).
- In the output, there are empty values, which mean that the corresponding shots or free throws were made. Since we only care about the scenarios where there is a rebound, we need to handle this later.
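Since the raw table itself is not shown here, below is a minimal sketch of loading it and peeking at the columns used later; the file name is a placeholder for wherever you keep the dataset:

import pandas as pd

# placeholder file name; point it at your copy of the rebounding dataset
train = pd.read_csv('training_data.csv')

# the columns used below: who grabbed the rebound, whether the shot was made,
# and the x/y location columns for the ten players on the court
print(train[['reb_player_id', 'f.oreb']].head())
print([col for col in train.columns if '_x_' in col or '_y_' in col])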
Data Cleaning
To begin with, let’s drop the rows we don’t need:
# remove the rows where the shots or free throws were made
train = train[train['f.oreb'].notna()]
Next, match the rebounding player's id with his position slot on his team and his side (offensive/defensive):
# target_columns is a list containing the ten player-id column names
target_columns = []
for event in ['off', 'def']:
    for i in range(1, 6):
        target_columns.append('playerid_' + event + '_player_' + str(i))
# mark which of the ten player-id columns matches the rebounder's id
reb_player_id_df = train[target_columns].eq(train['reb_player_id'], axis=0)
reb_player_position_df = reb_player_id_df.idxmax(axis=1).where(reb_player_id_df.any(axis=1)).dropna()
# encode all players on court
# 1~5 means an offensive player while 6~10 means a defensive one
position_code = {
    'playerid_off_player_1': 1,
    'playerid_off_player_2': 2,
    'playerid_off_player_3': 3,
    'playerid_off_player_4': 4,
    'playerid_off_player_5': 5,
    'playerid_def_player_1': 6,
    'playerid_def_player_2': 7,
    'playerid_def_player_3': 8,
    'playerid_def_player_4': 9,
    'playerid_def_player_5': 10
}
output = reb_player_position_df.apply(lambda x: position_code[x])
# reset the index
output = output.reset_index(drop=True)
Now, normalized data usually performs better in machine learning because it puts every feature on a comparable scale and keeps the optimization well behaved. Since we have a fixed range for both the x and y coordinates, we can apply a Min-Max normalizer:
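Note that location_columns is not defined in the snippet below; presumably it gathers the names of every coordinate column, something along these lines:

# assumption: the coordinate columns are exactly those whose names contain '_x_' or '_y_'
location_columns = [col for col in train.columns if '_x_' in col or '_y_' in col]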
# min-max scale the y coordinates from [-25, 25] into [0, 1]
train[[col for col in location_columns if '_y_' in col]] = (25 - train[[col for col in location_columns if '_y_' in col]]) / (25 - (-25))
# min-max scale the x coordinates from [-47, 47] into [0, 1]
train[[col for col in location_columns if '_x_' in col]] = (47 - train[[col for col in location_columns if '_x_' in col]]) / (47 - (-47))
Now the data is ready to go!
Model selection
I tried a suite of models that are expected to perform well at probability prediction, including Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Gaussian Naive Bayes, and Multinomial Naive Bayes.
# define models
models = [LogisticRegression(n_jobs=-1), LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis(), GaussianNB(), MultinomialNB()]
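For reference, these classifiers (and the helpers used in the cross-validation below) come from scikit-learn and NumPy:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import LabelEncoder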
Cross-validation:
names, values = [], []
# evaluate each model one by one
# and store their names and log loss values
for model in models:
    # get a name for the model
    name = type(model).__name__[:15]
    scores = evaluate_model(train, LabelEncoder().fit_transform(output), model)
    # output the results
    print('>%s %.3f (+/- %.3f)' % (name, np.mean(scores), np.std(scores)))
    names.append(name)
    values.append(scores)
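The evaluate_model helper is not shown in this article; here is a minimal sketch of what it could look like, assuming repeated stratified k-fold cross-validation scored with log loss and that the first argument holds the normalized location columns (the fold counts and random state are my own choices):

from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold

def evaluate_model(X, y, model):
    # stratified folds keep the balance of the ten position codes in every split
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
    # scikit-learn negates log loss so that greater is better; flip the sign back
    scores = cross_val_score(model, X, y, scoring='neg_log_loss', cv=cv, n_jobs=-1)
    return -scores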

The results show that Linear Discriminant Analysis outperforms all the other candidates, so I selected it as my kernel algorithm.
from joblib import dump

# save the model
dump(LDA, 'LDA.joblib')
Backend Design
With the model in hand, it is time to build a web app around it. At this stage, my main tools are Python and Flask. Flask is a lightweight web development framework written in Python that is commonly used in machine learning products. It is simple and flexible, and it lets developers decide what to implement and how to structure their apps.
Basically, we need a homepage that shows the panel, plus result pages that appear once the locations are assigned. The homepage is requested with "GET" when a session starts, while the result page uses "POST" because it will not show up until a user clicks the "Submit" button.
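Here is a minimal sketch of what app.py could look like. The /predict route, the hidden field name "locations", its comma-separated format, and the template variables are my assumptions, so adapt them to your own form and templates:

from flask import Flask, render_template, request
from joblib import load
import numpy as np

app = Flask(__name__)
model = load('LDA.joblib')  # the LDA model saved earlier

@app.route('/', methods=['GET'])
def index():
    # homepage with the draggable court panel
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    # hypothetical hidden field holding 20 comma-separated normalized
    # coordinates (x1, y1, ..., x10, y10) written by the frontend script
    coords = np.array(request.form['locations'].split(','), dtype=float).reshape(1, -1)
    probs = model.predict_proba(coords)[0]   # one probability per position code
    team_off = probs[:5].sum()               # summed probability for the offensive team
    return render_template('output.html', probs=probs, team_off=team_off)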
Wow! Now we have made it and our development is complete! Run the following commands in the working directory, and you can see the app at 127.0.0.1:5000 in your browser.
> set FLASK_APP=app.py (for Windows; for Linux, it should be "export FLASK_APP=app.py")
> flask run
Server Deployment
There are a lot of options for hosting a web app; my choice here is Heroku for its simplicity. I usually like to deploy my apps via GitHub, and this one is no exception. So the first step is to create a new repo for your program, including your main script (app.py), the static directory, and the templates directory. Here is what your folder should look like now:
├── README.md
├── app.py
├── LDA.joblib
├── templates
│   ├── index.html
│   ├── output.html
│   └── output2.html
└── static
    ├── bar.png
    ├── basketball.jpg
    └── court.png
Once you have gathered these pieces, two more files should also be included:
- requirements.txt, the list of required dependencies; you can collect them all quickly with:
pip freeze > requirements.txt
Note that after freezing, make sure to append gunicorn to requirements.txt.
Here is everything this project needs:
Flask==1.1.2
joblib
matplotlib
pandas
scikit-learn
gunicorn
- Procfile, which records the command that tells the Heroku server how to start the application. Because our app is built with Flask, the command should be:
web: gunicorn app:app --log-level debug
All set! You can upload this folder to your newly created repo now:
git init
git remote add origin your_github_repo_link
git add .
git commit -m "Initial Commit"
git push -u origin HEAD
Once everything is in place on GitHub, it is time to look at Heroku. Make sure you have already signed up for Heroku and still have available app slots.
If so, create a new application on your personal dashboard:

When it is done, Heroku usually redirects you to the app's homepage. Click on the "Deploy" tab and choose to deploy the app via GitHub:

And select the repo you want it to connect with:

Finally, there is only one more step to go: click "Deploy Branch".

Hooray! You made it, and now you can see what you have created at the link your app slot provides!
Conclusions
This is my first time building something with traditional frontend tools (i.e., HTML, JS, and CSS), so I consider it one of my attempts to push past my comfort zone, since I usually stick with more convenient tools like Python and Streamlit. My takeaway from this project: never let your tools confine you; instead, let them serve your needs!
If you need the whole source code of this project, here is the link: