Python is great for data visualization. Matplotlib is fast and robust but lacks aesthetic appeal. Seaborn, a library built on top of Matplotlib, greatly improves the aesthetics and provides sophisticated plots. However, when it comes to scatter plots, these Python libraries do not offer a straightforward option for displaying the labels of data points. Other data visualization tools such as Tableau and Power BI provide this feature with just a few clicks or by hovering the pointer over the data points.
In this article, I will explain how to add text labels to scatter plots made in seaborn or any other library built on the Matplotlib framework.
The Data

The dataset is the English Premier League table. We are interested in three columns:
i. Team : Team Name
ii. G : Goals Scored
iii. GA : Goals Conceded
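For readers who want to follow along without the original file, here is a minimal sketch of a DataFrame with the same three columns. The numbers are made-up placeholders, not the real league table, and every team abbreviation other than TOT and LIV is an assumption.

```python
import pandas as pd

# Placeholder values for illustration only; NOT the actual league table.
# Only the column names (Team, G, GA) come from the article.
df = pd.DataFrame(
    {
        "Team": ["TOT", "LIV", "MCI", "CHE", "ARS", "MUN"],
        "G": [60, 80, 90, 65, 55, 62],    # goals scored (made up)
        "GA": [45, 30, 33, 50, 48, 40],   # goals conceded (made up)
    }
)

print(df)
```

Any real table with these three columns can be swapped in, for example via pd.read_csv.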
Scatter Plot : Goals Scored vs Goals Conceded
A simple scatter plot can be drawn with Goals Scored on the x-axis and Goals Conceded on the y-axis as follows.
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to already hold the league table described above
plt.figure(figsize=(8,5))
sns.scatterplot(data=df,x='G',y='GA')
plt.title("Goals Scored vs Conceded- Top 6 Teams") #title
plt.xlabel("Goals Scored") #x label
plt.ylabel("Goals Conceded") #y label
plt.show()

Label Specific Items
Scatter plots often contain a large number of data points, and we may only be interested in how a few specific items fare against the rest. Labelling every data point can make the plot cluttered and difficult to read. For example, if we are examining a socio-economic statistic of the USA, it makes no sense to display the labels of all countries in the scatter plot. It would be more useful to label only the USA and a few selected competitors, so that we can see how these countries perform relative to each other and to the rest of the world. Coming to our dataset, I am a Tottenham Hotspur (TOT) fan and am interested only in the performance of TOT against the other teams. I can add the label using plt.text().
Syntax:
plt.text(x=x coordinate, y=y coordinate, s=string to be displayed)
Here x and y are the Goals Scored and Goals Conceded by TOT respectively. The string to be displayed is "TOT". x, y and s can be passed positionally, so the keyword names can be omitted as long as this order is followed.
plt.text(df.G[df.Team=='TOT'],df.GA[df.Team=='TOT'],"TOT", color='red')
Additional arguments such as color, size and alpha (transparency) can be used to change the text format. They can also be grouped within fontdict to make your code easier to read and understand.
plt.text(df.G[df.Team=='LIV'],df.GA[df.Team=='LIV'],"LIV",
         fontdict=dict(color='black', alpha=0.5, size=16))
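One detail worth noting: an expression like df.G[df.Team=='TOT'] yields a one-element Series rather than a plain number, and depending on the pandas and Matplotlib versions installed this may produce warnings. A minimal sketch of pulling out scalars first is shown below. It uses plain Matplotlib instead of seaborn to keep dependencies small, and the data values are made-up placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # optional: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data for illustration only; not the real league table.
df = pd.DataFrame(
    {
        "Team": ["TOT", "LIV", "MCI", "CHE", "ARS", "MUN"],
        "G": [60, 80, 90, 65, 55, 62],
        "GA": [45, 30, 33, 50, 48, 40],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(df["G"], df["GA"])

# Select the single row for the team, then convert its coordinates to floats.
row = df.loc[df["Team"] == "TOT"].iloc[0]
label = ax.text(float(row["G"]), float(row["GA"]), "TOT", color="red")
```

The same approach works after sns.scatterplot, since seaborn draws onto a regular Matplotlib axes.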

Adding Background Box
The bbox parameter can be used to draw a background box that highlights the text.
sns.scatterplot(data=df,x='G',y='GA')
plt.text(x=df.G[df.Team=='TOT']+0.3,
         y=df.GA[df.Team=='TOT']+0.3,
         s="TOT",
         fontdict=dict(color='red',size=10),
         bbox=dict(facecolor='yellow',alpha=0.5))
Note that an offset of 0.3 is added to the x and y coordinates so that the text and its background box do not overlap the data point. This is optional but can improve the aesthetics of the chart.
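The 0.3 offset is measured in data units, so its visual size changes with the axis range. As an alternative sketch (a different technique from plt.text, not the method used in this article), Matplotlib's annotate can place the label a fixed number of points away from the data point. The data values below are made-up placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # optional: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data for illustration only; not the real league table.
df = pd.DataFrame(
    {
        "Team": ["TOT", "LIV", "MCI", "CHE", "ARS", "MUN"],
        "G": [60, 80, 90, 65, 55, 62],
        "GA": [45, 30, 33, 50, 48, 40],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(df["G"], df["GA"])

row = df.loc[df["Team"] == "TOT"].iloc[0]
ann = ax.annotate(
    "TOT",
    xy=(float(row["G"]), float(row["GA"])),  # the data point being labelled
    xytext=(5, 5),                           # shift the label 5 points right and up
    textcoords="offset points",              # interpret xytext relative to xy, in points
    color="red",
    fontsize=10,
    bbox=dict(facecolor="yellow", alpha=0.5),
)
```

Because the offset is in points, the gap between marker and label stays the same when the axis limits or figure size change.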

Labelling All Points
Some situations demand labelling all the data points in the scatter plot, especially when there are only a few of them. This can be done with a simple for loop that goes through the dataset and takes the x-coordinate, y-coordinate and label string from each row.
sns.scatterplot(data=df,x='G',y='GA')
for i in range(df.shape[0]):
    plt.text(x=df.G[i]+0.3,y=df.GA[i]+0.3,s=df.Team[i],
             fontdict=dict(color='red',size=10),
             bbox=dict(facecolor='yellow',alpha=0.5))
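The loop above looks rows up with df.G[i], which relies on the DataFrame having the default 0..n-1 index and can fail after filtering or re-indexing. A sketch of a variation that iterates over the column values directly is shown below. The helper name label_points and its parameters are hypothetical, introduced only for illustration, and the data values are made-up placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # optional: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def label_points(ax, data, x, y, label, offset=0.3):
    """Hypothetical helper: add one text label per row of `data`.

    Iterates over the column values themselves, so it does not depend
    on the DataFrame index being 0..n-1.
    """
    texts = []
    for xi, yi, name in zip(data[x], data[y], data[label]):
        texts.append(
            ax.text(
                xi + offset,
                yi + offset,
                str(name),
                fontdict=dict(color="red", size=10),
                bbox=dict(facecolor="yellow", alpha=0.5),
            )
        )
    return texts


# Placeholder data for illustration only; not the real league table.
df = pd.DataFrame(
    {
        "Team": ["TOT", "LIV", "MCI", "CHE", "ARS", "MUN"],
        "G": [60, 80, 90, 65, 55, 62],
        "GA": [45, 30, 33, 50, 48, 40],
    }
)

# A filtered subset has a non-contiguous index (1, 2, 3, 5 here).
subset = df[df["G"] > 60]

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(subset["G"], subset["GA"])
texts = label_points(ax, subset, "G", "GA", "Team")
```

Returning the created Text objects makes it possible to restyle or remove individual labels later.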

Final Touch
We have finished constructing a labelled scatter plot. However, we can observe that a few text boxes jut out of the figure area. It would be aesthetically more pleasing if the text were kept within the plot's canvas. This can be done by changing the position, size etc. of the text. I generally achieve this by enlarging the plot area with the xlim() and ylim() functions in Matplotlib. In the code below you can see how I have applied a padding of 1 unit around the data while setting the x and y limits.
plt.figure(figsize=(8,5))
sns.scatterplot(data=df,x='G',y='GA')
for i in range(df.shape[0]):
    plt.text(x=df.G[i]+0.3,y=df.GA[i]+0.3,s=df.Team[i],
             fontdict=dict(color='red',size=10),
             bbox=dict(facecolor='yellow',alpha=0.5))
plt.xlim(df.G.min()-1,df.G.max()+1) #set x limit
plt.ylim(df.GA.min()-1,df.GA.max()+1) #set y limit
plt.title("Goals Scored vs Conceded- Top 6 Teams") #title
plt.xlabel("Goals Scored") #x label
plt.ylabel("Goals Conceded") #y label
plt.show()
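As one possible alternative to computing the limits by hand, Matplotlib's Axes.margins adds padding as a fraction of the data range. The sketch below is only an illustration under assumptions: the 10% margin is an arbitrary choice that may need tuning for long labels, and the data values are made-up placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # optional: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data for illustration only; not the real league table.
df = pd.DataFrame(
    {
        "Team": ["TOT", "LIV", "MCI", "CHE", "ARS", "MUN"],
        "G": [60, 80, 90, 65, 55, 62],
        "GA": [45, 30, 33, 50, 48, 40],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(df["G"], df["GA"])

# Same labelling loop as before, iterating over the column values.
for xi, yi, name in zip(df["G"], df["GA"], df["Team"]):
    ax.text(xi + 0.3, yi + 0.3, name,
            fontdict=dict(color="red", size=10),
            bbox=dict(facecolor="yellow", alpha=0.5))

# Pad both axes by 10% of the data range instead of a fixed number of units.
ax.margins(x=0.1, y=0.1)

ax.set_title("Goals Scored vs Conceded- Top 6 Teams")
ax.set_xlabel("Goals Scored")
ax.set_ylabel("Goals Conceded")
```

A fractional margin scales with the data, whereas the fixed padding of 1 unit used above gives more direct control over the exact limits.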

If you know of any better methods for keeping the elements of a plot within the canvas area, please let me know in the comments.
Resources:
You can check out the notebook for this article on GitHub.
