
Kite diagrams are a classic tool in ecology and biology and form part of the A-Level Biology syllabus in the UK. Despite this, few standard visualisation packages offer a way to create them, and most still seem to be drawn by hand. This short post explains how to automate the process with Python 3 and the matplotlib library.
So what are Kite diagrams? Kite diagrams provide a graphical summary of observations made along a transect. A transect is a line placed across part of a habitat, or across an entire habitat, often marked out manually with string or rope. The quantity of each species is counted at regular intervals along the transect. The distribution of species can be affected by living factors such as predators and by environmental factors such as heat, light and moisture levels; the latter are referred to as abiotic (non-living) factors. Data can also be collected using a quadrat, a square frame (e.g. 1 m²) that is moved along the transect so that each species inside the square can be counted at each point.
Kite diagrams are a way of seeing the change in abundance of the various species along a transect.
This allows researchers to see the relative abundance of certain species in different places in a habitat such as a seashore. There may be many types of grass, plants and insects for example distributed over the shore at different points.

These diagrams are often produced by hand, and there seems to be little support for them in standard visualisation packages. We found one example in Excel (Luke, 2019) and one produced using R (Hood, 2014), but nothing using Python. As Python is increasingly used to analyse data, we thought we would have a go at implementing a simple Kite diagram with it. This was done in an interactive Jupyter notebook with Python 3. We present the process here for those in other fields who may find automating such diagrams useful. We have deliberately kept the implementation basic for beginners.
For the example, we will use Python’s ‘pandas’ library to hold the dataset, which was entered into Excel and saved as a Comma Separated Values (CSV) file. We will also use the numpy library to extract columns from the dataset and apply operations to them. As is the convention in Python, we refer to these libraries by shorthand names (pd for pandas and np for numpy).
import pandas as pd
import numpy as np
Importing the data
Next, we load the dataset into a pandas dataframe object using the pandas read_csv() function, supplying the path to the CSV file. We store this data in a variable called kite_data that can then be viewed in the notebook (or other Python environment).
kite_data = pd.read_csv("./biology/kitedata.csv")
kite_data

The first column of the data should represent the distances. This will be used for the horizontal axis. The remaining columns represent the frequency of species or sometimes the percentage cover of certain plants that will be plotted at intervals on the y-axis.
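As a rough illustration of the layout described above, the sketch below builds a small dataframe from CSV text held in memory. The species names and counts are invented for the example and are not the data used in this post.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first column holds the distances along the
# transect, every other column holds the counts for one species.
csv_text = """Distance,Limpets,Barnacles,Periwinkles
0,0,2,0
2,1,4,0
4,3,6,2
6,2,3,4
"""

example_data = pd.read_csv(io.StringIO(csv_text))

distances = example_data.iloc[:, 0]       # first column -> x-axis
species = list(example_data.columns[1:])  # remaining column names -> one kite each
print(species)
```

A file saved from Excel in this shape can be loaded the same way by passing its path to read_csv() instead of the in-memory text.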
Create the Kite plot function
The next stage is to create a function to generate the Kite diagram. This builds on the matplotlib library, which is widely used to generate a large variety of visualisations. You can see some examples in the gallery here https://matplotlib.org/3.1.1/gallery/index.html. We will import the pyplot module of the library and refer to it as plt. We will also need the Polygon class to draw the kite shapes on the diagram.
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
After importing the libraries we can create the function. The first column of the dataframe should be the distance or another measure of position along the area (e.g. quadrat number). Here we use the iloc feature, which stands for integer location. This is a way of referencing columns by number (from 0 to the number of columns minus 1) rather than by the column’s name. We extract and store the first column of distances for use on the x-axis later. We also create an empty list to store the start points, which are used to position the individual Kite shapes on the y-axis. We then get the column names and store them in the y_values variable for plotting the species names on the y-axis later. Finally, we take the number of columns in the dataframe, minus the distance column, and store it in a variable called num_cols so we know how many species we need to add to the diagram.
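def kite_diagram(df, axis_labs):
    """Function to draw a kite diagram."""
    plt.axes()
    start_points = []  # baseline (y position) of each species
    v1 = np.array(df.iloc[:, 0])  # distances as a 1-D array for the x-axis
    y_values = df.columns
    y_values = np.delete(y_values, 0)  # species names, without the distance column
    num_cols = len(df.columns) - 1  # number of species columns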
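The way a column is selected with iloc affects the shape of the resulting numpy array. The sketch below, using a made-up two-column dataframe, compares selecting with a list of positions against selecting with a single position. The one-dimensional form yields plain numbers when indexed, which is convenient when building [x, y] coordinate pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column dataframe, for illustration only.
df = pd.DataFrame({"Distance": [0, 2, 4], "Limpets": [1, 3, 2]})

as_frame = np.array(df.iloc[:, [0]])   # list of positions -> DataFrame -> 2-D array
as_series = np.array(df.iloc[:, 0])    # single position -> Series -> 1-D array

print(as_frame.shape)   # (3, 1)
print(as_series.shape)  # (3,)
```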
Now we need to get the maximum value from the dataset (not including the distances column) so that we can position the kite shapes on the diagram with adequate vertical spacing. We want to space the different Kite plots by the maximum distance so that they don’t overlap each other as this would make them unreadable. To do this we get all the columns apart from the first (distance) column, then we use the max() function to determine the maximum value in the columns.
    df_cols = df.iloc[:, 1:len(df.columns)]
    max_val = max(df_cols.max())
As Python uses an indexing system that starts at 0, we select 1 to the number of columns (length of columns) to exclude the first (distance) column which is stored at location zero. Next, we store the maximum value in the data in the variable called max_val. Each column apart from the first should represent a different species, so we need to loop over each of these columns and make a Kite shape for each species. As Python indexes from 0, we will start at 1 to skip the distance column.
    for j in range(1, num_cols + 1):
        p1 = []
        p2 = []
The p1 and p2 lists will store the coordinate points for the polygons (Kite shapes) for each species. There are 2 lists because the plot essentially shows a mirror image of the same shape above and below the base line as seen in the figure below. This is achieved by halving each value and projecting a pair of points with one value above and the other below the horizontal base line by this halved value such that the distance between each pair represents the original total value. For example, a value of 8 will be 4 units above the baseline and 4 units below the baseline.

To code this, we need half of each data value, which we get by dividing each value by 2. This is easy with numpy: we take the column’s values, turn them into a numpy array, and divide the whole array by 2.
        v2 = np.array(df.iloc[:, j]) / 2
It should be noted that arithmetic like this cannot be applied to a whole standard Python list in one go. The figure below illustrates this, showing the error raised when we try to divide a list by 2.

If however we use a numpy array instead, the operation will be applied to all the values in the list:
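The difference can be reproduced with a short sketch. The counts are invented for the example.

```python
import numpy as np

counts = [8, 4, 6]  # hypothetical counts for one species

try:
    halved_list = counts / 2          # a plain list does not support division
except TypeError as err:
    halved_list = None
    print("list error:", err)

halved_array = np.array(counts) / 2   # numpy divides every element
print(halved_array)                   # [4. 2. 3.]
```

Without numpy, the same result needs an explicit loop or a list comprehension such as [c / 2 for c in counts].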

Next we work out whether this is the first Kite we are adding to the diagram. If so, we position its baseline at half the maximum value found in the dataset, so that there is enough space to draw the shape above and below the baseline. For every subsequent Kite, we add the maximum value in the dataset to the previous starting point to space the baselines evenly.
        if j == 1:
            start_point = max_val / 2
        else:
            start_point = start_point + max_val
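To see where the baselines end up, the spacing rule can be pulled out into a small standalone helper. The function name baseline_positions and the numbers used are hypothetical and only serve this illustration.

```python
def baseline_positions(max_val, num_species):
    """Return the y position of the baseline for each species.

    The first baseline sits at half the largest value; each later one is
    a further max_val above the previous baseline.
    """
    positions = []
    for k in range(num_species):
        if k == 0:
            start_point = max_val / 2
        else:
            start_point = start_point + max_val
        positions.append(start_point)
    return positions


# Hypothetical example: largest count is 8 and there are three species.
print(baseline_positions(8, 3))  # [4.0, 12.0, 20.0]
```

Each kite reaches at most max_val / 2 above and below its baseline, so in this example the three kites occupy the bands 0 to 8, 8 to 16 and 16 to 24 and can touch but never overlap.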
We also store the start point of each species’ baseline in a list for labelling the plot later. Additionally, we make the first point of both polygons (above and below the line) sit on the baseline at a distance of zero, so there is no gap where the shape starts. We do the same at the other end of the shape to prevent unwanted gaps.
        start_points.append(start_point)
        p1.append([0, start_point])
        p2.append([0, start_point])
Generate the points for the Kite shapes
For all the subsequent points we will loop through all the values and add or subtract the half values we computed and stored in the v2 variable for values both above and below the line. The pattern should be the same above and below so we store the above line points with the horizontal distance (v1) in a variable called p1 (polygon 1) and the same for below the line in a variable called p2 (polygon 2). Finally after going through all the values we add an additional pair of values to both polygons to bring the lines back down to the starting point. Again, this avoids any gaps in the pattern at the end of the shape.
        for i in range(0, len(v1)):
            p1.append([v1[i], start_point + v2[i]])
            p2.append([v1[i], start_point - v2[i]])
        p1.append([v1[i], start_point])
        p2.append([v1[i], start_point])
What we end up with is a list of points with sets of coordinates. The first of each pair is the position on the x-axis (horizontal position) that goes from 0 to 20 in our example. The second number of the pair in p1 is the position above the baseline on the diagram (the vertical y-axis) whereas in p2 this is the position below the baseline:
p1 = [[0, 0], [2, 0], [4, 0], [6, 0], [8, 1.5], [10, 2], [12, 4], [14, 4], [16, 3.5], [18, 2.5], [20, 2], [20, 0]]
p2 = [[0, 0], [2, 0], [4, 0], [6, 0], [8, -1.5], [10, -2], [12, -4], [14, -4], [16, -3.5], [18, -2.5], [20, -2], [20, 0]]
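The point-building steps can also be tried in isolation. The sketch below uses a hypothetical helper, kite_points, with plain Python lists. The distances and counts are assumed inputs, worked backwards from the listing above with the baseline set to 0; they are not taken from the original data file.

```python
def kite_points(distances, values, start_point=0):
    """Build the upper (p1) and lower (p2) outlines for one species."""
    p1 = [[0, start_point]]
    p2 = [[0, start_point]]
    for x, value in zip(distances, values):
        half = value / 2
        p1.append([x, start_point + half])
        p2.append([x, start_point - half])
    # Close both outlines on the baseline at the last distance.
    p1.append([distances[-1], start_point])
    p2.append([distances[-1], start_point])
    return p1, p2


# Assumed inputs that would give the listing above with a baseline of 0.
distances = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
values = [0, 0, 0, 3, 4, 8, 8, 7, 5, 4]
p1, p2 = kite_points(distances, values)
print(p1[4], p2[4])  # [8, 1.5] [8, -1.5]
```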
Add the shapes to the plot
We can now use these points to create and add the polygons to the plot with the Polygon() function.
        c = np.random.rand(3,)
        l1 = Polygon(p1, closed=None, fill=True, edgecolor=c, alpha=0.4, color=c)
        l2 = Polygon(p2, closed=None, fill=True, edgecolor=c, alpha=0.4, color=c)
We assign a random colour to each Kite shape and store it in a variable called c. We then create 2 polygons with matplotlib’s Polygon() function, storing them in the variables l1 and l2. The first argument is the list of data points (p1 or p2). After that we set some optional parameters: we want the shape to be filled, so we set fill to True, and we set a colour for the edge, in this case the same colour as we use for the whole shape. The alpha value can be adjusted to add some transparency, which helps if any shapes overlap or simply makes the colours less intense. Finally, we add a fill colour.
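Because the colours are random, they change every time the diagram is drawn. If repeatable colours are preferred, one option is a seeded random generator. The following is only a sketch: the seed value and the number of species are arbitrary assumptions.

```python
import numpy as np

# Assumption: a fixed seed (42 here) is acceptable for the figure.
rng = np.random.default_rng(42)

num_species = 3  # hypothetical number of species columns
colours = [rng.random(3) for _ in range(num_species)]  # one RGB triple each

for colour in colours:
    print(colour)
```

Each triple holds red, green and blue values between 0 and 1, which matplotlib accepts as a colour.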
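The polygons can now be added to the plot with the add_patch() function, which is the matplotlib method for adding patches such as polygons (add_line() is intended for line objects). The gca() function Gets the Current Axes of a plot, or creates one if none exists.
        plt.gca().add_patch(l1)
        plt.gca().add_patch(l2)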
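As a minimal standalone sketch, the example below draws a single kite from two hand-written outlines. It uses the Axes.add_patch() method, which is matplotlib's method for attaching patch objects such as Polygon to a plot. The outlines, the colour and the off-screen Agg backend are assumptions made for the illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: draw off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Hypothetical outlines for one species on a baseline at y = 2.
upper = [[0, 2], [2, 3], [4, 4], [6, 2.5], [6, 2]]
lower = [[0, 2], [2, 1], [4, 0], [6, 1.5], [6, 2]]

fig, ax = plt.subplots()
for outline in (upper, lower):
    ax.add_patch(Polygon(outline, closed=True, facecolor="tab:green", alpha=0.4))

ax.autoscale_view()  # rescale the view so both patches are visible
print(len(ax.patches))  # 2
```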
Finishing touches
Finally, after looping through all columns and adding species Kite shapes, we can add additional features to the entire plot after the main loop.
    plt.yticks(start_points, y_values)
    plt.xlabel(axis_labs[0])
    plt.ylabel(axis_labs[1])
    plt.axis('scaled')
    plt.show();
Recall that we stored the column names (species) in a variable called y_values. Using the yticks() function, we place these names at the start_points locations we stored, which lines up the species names with the baselines. The next 2 lines add labels for the x-axis and y-axis, which we pass into the function in a list. The ‘scaled’ option of axis() gives x and y equal scaling by changing the dimensions of the plot container rather than the data limits. The other option that can be used is ‘equal’, which achieves equal increments in x and y by changing the axis limits instead. Finally, the show() function will render the plot.
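The labelling step can be tried on an empty plot. In this sketch the baseline positions and species names are invented, and the off-screen Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: off-screen backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical baselines and species names.
start_points = [4.0, 12.0, 20.0]
species = ["Limpets", "Barnacles", "Periwinkles"]

fig, ax = plt.subplots()
ax.set_ylim(0, 24)
plt.yticks(start_points, species)  # one tick per baseline, labelled by species
plt.xlabel("Distance")
plt.ylabel("Species")

print(list(ax.get_yticks()))
```

The y-axis then shows species names at the baselines instead of numeric values.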
Lastly, to draw the plot we need to call the function providing the dataset and x/y axis labels in a list.
kite_diagram(kite_data, ['Distance', 'Species']);
This output can be seen side by side with the hand drawn version:

The code for the full function can be seen below:
def kite_diagram(df, axis_labs):
    """Function to draw a kite diagram."""
    plt.axes()
    start_points = []  # baseline (y position) of each species
    v1 = np.array(df.iloc[:, 0])  # distances as a 1-D array for the x-axis
    y_values = df.columns
    y_values = np.delete(y_values, 0)  # species names, without the distance column
    num_cols = len(df.columns) - 1  # number of species columns
    df_cols = df.iloc[:, 1:len(df.columns)]
    max_val = max(df_cols.max())  # largest value, used for vertical spacing
    for j in range(1, num_cols + 1):
        p1 = []  # points above the baseline
        p2 = []  # points below the baseline
        v2 = np.array(df.iloc[:, j]) / 2  # half of each value for this species
        if j == 1:
            start_point = max_val / 2
        else:
            start_point = start_point + max_val
        start_points.append(start_point)
        # Start both outlines on the baseline to avoid a gap.
        p1.append([0, start_point])
        p2.append([0, start_point])
        for i in range(0, len(v1)):
            p1.append([v1[i], start_point + v2[i]])
            p2.append([v1[i], start_point - v2[i]])
        # Bring both outlines back to the baseline at the last distance.
        p1.append([v1[i], start_point])
        p2.append([v1[i], start_point])
        c = np.random.rand(3,)  # random RGB colour for this species
        l1 = Polygon(p1, closed=None, fill=True, edgecolor=c, alpha=0.4, color=c)
        l2 = Polygon(p2, closed=None, fill=True, edgecolor=c, alpha=0.4, color=c)
        plt.gca().add_patch(l1)
        plt.gca().add_patch(l2)
    plt.yticks(start_points, y_values)
    plt.xlabel(axis_labs[0])
    plt.ylabel(axis_labs[1])
    plt.axis('scaled')
    plt.show();
Python has a wide variety of visualisations available through libraries such as matplotlib, seaborn and ggplot. These libraries can also be easily extended to add further types of visualisation, as seen here. This provides a rich basis for visualisation across multiple scientific fields. Modern languages with support for data science, such as R and Python, encourage the development of such visualisations by providing the tools to make these plots relatively simple to implement.
References
[1] Luke, K (2019) Best Excel Tutorial: Kite Chart [online]. Accessed: 07–07–2020 https://best-excel-tutorial.com/56-charts/267-kite-chart
[2] Hood, D (2014) RPubs: Kite Graphs in R [online]. Accessed: 07–07–2020 https://rpubs.com/thoughtfulbloke/kitegraph
With thanks to Victoria Golas who contributed to the writing of this post and provided the hand drawn version of the diagram.