Using K-Means Clustering Algorithm to Redefine NBA Positions and Explore Roster Construction

Published in Towards Data Science · Nov 7, 2019 · 7 min read

Project Description and Motivation

Conventional NBA positions do not accurately reflect a player’s style of play or the functional role they fill for their team. The league’s overall style of play has changed drastically across eras, and individual playing styles have changed with it. Today the league is fast paced, with more floor spacing. One example is Centers who shoot 3’s and stretch the floor for their teams. These multi-faceted Centers are still grouped with traditional Centers, and there is no methodology to distinguish between the two. The purpose of this project is to find a better approach to defining player roles based on the value players bring to their team.

Data Source

Statistics for every player from 2011 to 2018 were scraped from Basketball-Reference. 2011 was used as the start year because it reflects when positionless basketball started to take form (LeBron being the main facilitator in Miami and the start of the Golden State dynasty). Approximately 3000 observations were included in the final dataset, with 30 features describing each player. Features include box-score metrics such as Points, Rebounds, Blocks, and Steals, as well as advanced metrics such as USG%, PER, and Plus-Minus Score. All features were expressed per 100 possessions so that players’ statistics were comparable regardless of how many minutes or games they played. Players who did not play more than 400 minutes were excluded from the dataset because they do not have a significant impact on the game.
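As a rough illustration of the two preprocessing steps described above, here is a minimal sketch that converts raw season totals to per-100-possession rates and drops low-minute players. The field names (`minutes`, `possessions`, `points`, `blocks`) and the sample rows are hypothetical and are not taken from the project; Basketball-Reference also publishes per-100-possession tables directly, so the conversion may not have been needed at all.

```python
MIN_MINUTES = 400  # threshold stated in the article

def per_100(total, possessions):
    """Scale a raw season total to a per-100-possessions rate."""
    return 100.0 * total / possessions

def prepare(rows, stat_keys, min_minutes=MIN_MINUTES):
    """Keep players above the minutes threshold and normalize their stats."""
    prepared = []
    for row in rows:
        if row["minutes"] <= min_minutes:
            continue  # did not play more than the threshold, so excluded
        out = {"player": row["player"]}
        for key in stat_keys:
            out[key] = per_100(row[key], row["possessions"])
        prepared.append(out)
    return prepared

# Made-up rows purely for illustration (not real player data).
sample = [
    {"player": "A", "minutes": 2400, "possessions": 5000, "points": 1500, "blocks": 50},
    {"player": "B", "minutes": 300, "possessions": 600, "points": 200, "blocks": 5},
]
cleaned = prepare(sample, ["points", "blocks"])
print(cleaned)
```

Player B is dropped by the minutes filter, and player A’s totals become rates that can be compared across players with very different playing time.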

Initial Exploratory Data Analysis

To get a better understanding of my dataset, I developed some initial visuals.

Scatterplot displaying League Average for 3 Pt % over time

Graph displaying League Average for various features (Ortg, Drtg, 3P%) over time

League Average for various features over time

These two graphs focus on selected features and show an upward trend over time. The features include 3-pointers made and other metrics reflecting efficiency and overall offense. This league-wide change indicates that player roles and styles have also changed as the seasons progressed.

Graph illustrating Average Stats for Conventional Positions.

Average stats by conventional NBA positions

This graph cannot tell us about players who do not fit their assigned conventional roles. For example, what about the Power Forwards who are also facilitators? Or Guards who have multi-dimensional roles?

Pie chart showing even distribution of conventional positions

Clustering visualization of conventional positions

Clusters representing conventional NBA positions, which take all 30 features into consideration

Each dot represents a player. Players assigned to the same conventional position are scattered throughout the graph, indicating that they do not have similar playing styles. (Principal Component Analysis was used to reduce the dimensionality of the 30 features into 3 components.)
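To make the PCA step more concrete, here is a minimal, dependency-free sketch that projects a small feature matrix onto its top principal components and reports how much variance each one retains. It uses power iteration with deflation, which is an illustrative choice on my part and not necessarily how the project computed PCA; in practice a library such as scikit-learn’s `PCA` (which exposes `explained_variance_ratio_`) would be the usual route. The sample rows are made up.

```python
import random
from statistics import mean

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    mus = [mean(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, mus)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def top_eigenpair(matrix, iters=500, seed=0):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    rng = random.Random(seed)
    vec = [rng.gauss(0, 1) for _ in matrix]
    norm = sum(x * x for x in vec) ** 0.5
    vec = [x / norm for x in vec]
    for _ in range(iters):
        nxt = mat_vec(matrix, vec)
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break  # matrix maps vec to zero; no variance left
        vec = [x / norm for x in nxt]
    value = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return value, vec

def pca(rows, n_components):
    """Return projected rows and the variance ratio kept by each component."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_var)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, ratios

# Made-up rows with three strongly correlated features.
rows = [[1.0, 2.0, 1.0], [2.0, 4.1, 1.0], [3.0, 6.0, 2.0],
        [4.0, 8.2, 2.0], [5.0, 9.9, 3.0], [6.0, 12.0, 3.0]]
projected, ratios = pca(rows, 2)
print(ratios)
```

Summing the ratios gives the share of total variance retained by the kept components, which is the quantity behind a statement like "3 components retain 90% of the variation".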

After clustering…

Results After Clustering…

Data Science Methods Used

Principal Component Analysis (PCA) was used to reduce dimensionality for the visuals. With 30 features describing each player, it is impossible to plot the clusters directly. After applying PCA, 90% of the variation across all the features was retained in 3 components, hence a 3D model. PCA was used to visualize the conventional position clusters as well as the new roles found by the k-means clustering algorithm.
K-means clustering is an unsupervised algorithm, meaning no labels are given. The basic idea is that n centroids are initialized, and each observation (player) is assigned to the cluster of the centroid it is closest to. The centroids are then recomputed and the assignment repeats until it stabilizes.
Elbow method and silhouette score. To figure out how many clusters (new roles) are ideal, the silhouette score was used. The silhouette score measures how dense each cluster is and how well separated the clusters are from one another. The elbow method shows how much the silhouette score changes as the number of clusters increases. Once the rate of change slows down significantly, picking a higher number of ‘n’ clusters isn’t very useful. For this project I ended up with 9 clusters total.
Scaling features. Since k-means uses a distance metric to assign each observation to a cluster, it is very important to scale all of the features. For example, 40 points is not comparable to 2 blocks; they are effectively different units. I also assigned weights to various features to give certain styles more emphasis. For example, Assists and Turnovers can be said to represent ball handling, so they were assigned a different weight than the other features.
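The scaling, weighting, and clustering steps above can be sketched as follows. This is a minimal, dependency-free illustration, not the project’s actual code: the feature columns, the weight values, and the sample rows are all hypothetical, and z-score standardization is an assumed choice of scaler. In practice scikit-learn’s `StandardScaler` and `KMeans` would be the usual tools.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so points and blocks share a comparable scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def apply_weights(rows, weights):
    """Stretch selected features so they count more in the distance."""
    return [[v * w for v, w in zip(row, weights)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # start from k randomly chosen players
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Made-up per-100 stats: [points, assists, blocks].
players = [[30.0, 8.0, 0.5], [28.0, 7.0, 0.4], [32.0, 9.0, 0.6],
           [12.0, 1.0, 3.0], [10.0, 1.5, 2.5], [14.0, 2.0, 3.5]]
weights = [1.0, 1.5, 1.0]  # hypothetical: extra emphasis on assists

scaled = standardize(players)
labels, centroids = kmeans(apply_weights(scaled, weights), k=2)
print(labels)
```

Without the standardization step, the points column would dominate every distance simply because its raw values are larger, which is the problem the scaling paragraph describes.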

Results after K-Means Clustering

Pie Chart of New Roles

Clustering Visual with New Roles Assigned

Labels were assigned to the 9 new clusters by looking at the average stats for each cluster and at a list of the players within each cluster. Using domain knowledge and what the cluster revealed, a label was generated to describe that cluster.
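The "average stats for each cluster" part of that process can be sketched in a few lines. The function name, feature names, and sample values below are hypothetical; the labeling itself still relies on domain knowledge, as described above.

```python
from collections import defaultdict
from statistics import mean

def cluster_profiles(rows, labels, feature_names):
    """Average every feature within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {name: mean(col) for name, col in zip(feature_names, zip(*members))}
        for label, members in grouped.items()
    }

# Made-up per-100 stats and cluster assignments, for illustration only.
stats = [[30.0, 8.0], [28.0, 6.0], [10.0, 1.0], [12.0, 3.0]]
assigned = [0, 0, 1, 1]
profiles = cluster_profiles(stats, assigned, ["points", "assists"])
print(profiles)
```

A cluster whose profile is high in points and assists would then be a candidate for a scorer or facilitator label, subject to a look at which players actually landed in it.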

The clusters are much more condensed and sorted now compared to the clusters generated for the conventional positions. This indicates that the clusters are representative of playing style/role and provide more insight to teams and fans.
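One way to put a number on "more condensed" is the silhouette score mentioned in the methods, computed once with the conventional position labels and once with the new cluster labels over the same features. The article does not report such a comparison, so the sketch below only shows the mechanics on made-up points; scikit-learn’s `silhouette_score` would be the usual tool in practice.

```python
from math import dist
from statistics import mean

def silhouette_score(rows, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(dist(rows[i], rows[j]) for j in own)
        # b: mean distance to the nearest other cluster
        b = min(mean(dist(rows[i], rows[j]) for j in members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Made-up 2D points forming two tight groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
tight_labels = [0, 0, 1, 1]      # groups match the geometry
scattered_labels = [0, 1, 0, 1]  # each label is spread across both groups

tight = silhouette_score(points, tight_labels)
scattered = silhouette_score(points, scattered_labels)
print(tight, scattered)
```

A labeling that scatters similar players across groups, as the conventional positions appear to do in the plot, yields a low or negative score, while a labeling that follows the structure of the data scores close to 1.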

Let’s take a look at how labels were assigned to each cluster.

In Depth Look at Each Cluster and How New Labels Were Assigned for New Roles

It’s important to note that although only 5–7 players are listed under ‘Notable Players’, each cluster actually contains over 250 players.

Cluster 1: Perimeter Wing/Scorers

Notable Players:

Wilson Chandler
Jaylen Adams
Kevin Knox (Knicks Fan)
Courtney Lee
Stanley Johnson

Cluster 2: ‘Three & D’

Notable Players:

Trevor Ariza
Kent Bazemore
Will Barton
OG Anunoby
Taurean Prince
Robert Covington

Cluster 3: ‘Do it All’

Notable Players:

Kyle Anderson
Brandon Ingram
Danilo Gallinari
Kevin Love
Kelly Oubre
Zach LaVine

Cluster 4: Elite Wings

Notable Players:

Harrison Barnes
Jaylen Brown
DeMarre Carroll
Aaron Gordon
Danny Green
Eric Gordon

Cluster 5: Backup Bigs (Inside)

Notable Players:

Jordan Bell
Tyson Chandler
Nene Hilario
Kosta Koufos
Aron Baynes

Cluster 6: Elite Bigs (Inside)

Notable Players:

Ed Davis
Tristan Thompson
Mason Plumlee
Greg Monroe
Zaza Pachulia
Marcin Gortat

Cluster 7: Star Bigs (Inside)

Notable Players:

Steven Adams
LaMarcus Aldridge
Andre Drummond
Jusuf Nurkic
Derrick Favors

Cluster 8: All Stars

Notable Players:

Bradley Beal
Devin Booker
Jimmy Butler
Blake Griffin
Tobias Harris
Klay Thompson
Russell Westbrook

Cluster 9: Superstars

Notable Players:

Giannis Antetokounmpo
Stephen Curry
Anthony Davis
Kevin Durant
James Harden
LeBron James

I then wanted to develop more practical insight and created some questions that explore roster diversity.

Business Insight Questions

What is the difference between Elite teams and Average teams when it comes to player roles/styles?
Do winning teams have more or fewer players with a specific role/style? Does roster diversity play a part in winning?

To answer these questions, I compared the roster construction of teams that are considered ‘average’ (barely made the playoffs or were the 8th seed) over the last 4 years to teams that made the Finals over the last 4 years. Both pools had an equal number of teams from the Western and Eastern Conferences.
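The comparison described above boils down to computing the share of each role within each pool of rosters and then looking at the gaps. Here is a minimal sketch of that calculation; the role lists are invented for illustration and do not reflect the project’s actual rosters or results.

```python
from collections import Counter

def role_shares(roster_roles):
    """Fraction of a roster pool that falls into each role."""
    counts = Counter(roster_roles)
    total = sum(counts.values())
    return {role: count / total for role, count in counts.items()}

def share_gap(pool_a, pool_b):
    """Share in pool_a minus share in pool_b for every role in either pool."""
    a, b = role_shares(pool_a), role_shares(pool_b)
    return {role: a.get(role, 0.0) - b.get(role, 0.0) for role in set(a) | set(b)}

# Invented role assignments, one entry per player in each pool.
finalists = ["Superstars", "All Stars", "Three & D", "Three & D", "Backup Bigs"]
average = ["Star Bigs", "Star Bigs", "All Stars", "Three & D", "Perimeter Wing/Scorers"]

finalist_shares = role_shares(finalists)
gap = share_gap(finalists, average)
print(finalist_shares)
print(gap)
```

The shares are what a pie chart of each pool displays, and a positive gap marks a role that is more common among the finalist pool.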

Roster pool of teams that made NBA finals over the last 4 years.

Answer:

Overall, NBA Finalist teams have more star power, and their inside players have a reserved/defined role. In comparison, the ‘Average’ NBA teams have less star power and rely on star inside bigs as their focal points. Teams that want to win should build rosters that more closely reflect the pie chart for NBA Finalist teams.

Hope you guys enjoyed my project! Here is a link to my GitHub account, which further explains the steps I took for this project. Lastly, if you want to discuss my project or just talk basketball, you can find me on LinkedIn.

Written by Haider Hussain