
Pythonic Counting: An Overview

Counting elements is critical! Learn how to do it eloquently!

Photo by Crissy Jarvis on Unsplash

In Python, as in any other programming language, counting the number of elements with a given property is a frequent task when solving problems or manipulating a dataset. In a data science project, processing data in loops often requires knowing how many items have been processed, which is to say counting them. For this reason, it’s important to know a simple way to do it. Python offers several, and in this article we’ll go over a basic approach as well as more eloquent ways to handle counting operations.

Photo by Isabella and Zsa Fischer on Unsplash

Counting Within Loops:

At the core of any counting operation is a variable that holds the count, usually an integer. Within a loop, that counter variable is increased by one each time certain criteria are met. The code block below gives a brief example:

# Option #1
items = ['foo', 'bar', 'foo', 'foo', 'bar', 'foobar']
counter = {}

for item in items:
    # Start each new key at 0 before incrementing it
    if item not in counter:
        counter[item] = 0
    counter[item] += 1

In[1]: counter
Out[1]: {'foo': 3, 'bar': 2, 'foobar': 1}

In the example above, rather than using a plain integer as the counter, a dictionary stores each counted item as a key with its count as the value. Having the counts in a dictionary is a nice bonus. Below are two more examples that do the same thing a little more concisely.

# Option #2
items = ['foo', 'bar', 'foo', 'foo', 'bar', 'foobar']
counter = {}

for item in items:
    # dict.get returns 0 when the key has not been seen yet
    counter[item] = counter.get(item, 0) + 1

In[2]: counter
Out[2]: {'foo': 3, 'bar': 2, 'foobar': 1}

# Option #3
from collections import defaultdict

items = ['foo', 'bar', 'foo', 'foo', 'bar', 'foobar']
counter = defaultdict(int)

for item in items:
    # Missing keys are created with int() == 0 on first access
    counter[item] += 1

In[3]: counter
Out[3]: defaultdict(int, {'foo': 3, 'bar': 2, 'foobar': 1})

Options 2 and 3 both provide cleaner alternatives for counting within a loop. The second option uses the dict.get() method with a default of 0: if the key has not been seen yet, get() returns 0 and the ‘+ 1’ then stores a count of 1; otherwise the existing value is increased by 1. In the third option, defaultdict is imported from the collections module and initialized with int as its default factory. The loop simply increments the value for each item (the key in the dict) as it is encountered in the list. Since it’s a defaultdict, any new item (or key) starts with the value that int() returns, which is 0. So here we have three ways to count within a loop, with the latter two simplifying the code a bit.
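For comparison, the plain integer counter described at the start of this section could look like the minimal sketch below. The condition used here (items starting with 'foo') is an assumption chosen for illustration, not part of the original examples.

```python
# Same sample data as the options above
items = ['foo', 'bar', 'foo', 'foo', 'bar', 'foobar']

# Explicit loop: bump an integer whenever the condition holds
foo_count = 0
for item in items:
    if item.startswith('foo'):  # illustrative condition (assumption)
        foo_count += 1

# Equivalent one-liner: add 1 for every item that meets the condition
foo_count_sum = sum(1 for item in items if item.startswith('foo'))

print(foo_count, foo_count_sum)  # 4 4
```

A single integer is enough when only one property matters; the dictionary-based options are the better fit when every distinct item needs its own count.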

Photo by Alex Chumak on Unsplash

Python’s Counter Subclass

Fortunately, since counting is so common in projects, Python’s collections module provides Counter, a subclass of dict. What makes it special is that, given an iterable of hashable objects, it automatically builds a dictionary of the elements and their counts. The example below shows that Counter simplifies things even more.

from collections import Counter

items = ['foo', 'bar', 'foo', 'foo', 'bar', 'foobar']
count = Counter(items)

In[4]: count
Out[4]: Counter({'foo': 3, 'bar': 2, 'foobar': 1})

There is no need to write a loop over the list, since Counter does the iteration internally. Interestingly, counts are not restricted to positive integers: you can assign an item a value of 0 or even a negative value, which may suit your convention or process (outgoing inventory, for example).
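A minimal sketch of zero and negative counts, using a made-up inventory (the item names and quantities are assumptions for illustration). Counter.subtract lowers counts in place and, unlike the - operator, keeps results that drop to zero or below.

```python
from collections import Counter

# Hypothetical stock levels and outgoing shipments
stock = Counter(widgets=5, gadgets=2, sprockets=3)
shipped = Counter(widgets=5, gadgets=4)

# subtract works in place and allows zero and negative results
stock.subtract(shipped)
print(stock['widgets'])  # 0
print(stock['gadgets'])  # -2

# Looking up a missing key reports 0 instead of raising KeyError
print(stock['gizmos'])  # 0

# Unary plus returns a new Counter holding only the positive counts
in_stock = +stock
print(in_stock)  # Counter({'sprockets': 3})
```

Whether a negative count means a backorder or a data error depends on your own convention, so it is worth deciding that up front.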

A very nice feature of Counter is the update method. Once you have counted something, if the state changes and you want to bring the counts up to date, update adds the counts from a new dictionary or another counted object to the existing ones. The example below shows how this method can be applied.

from collections import Counter

initial_counts = {'foo': 4, 'bar': 3, 'foobar': 1}
count = Counter(initial_counts)

new_counts = {'foo': 3, 'bar': 1, 'foobar': 3}
# update adds the new counts to the existing ones instead of replacing them
count.update(new_counts)

In[5]: count
Out[5]: Counter({'foo': 7, 'bar': 4, 'foobar': 4})

Notice that the final value of count is the sum of the original counts and the new counts, courtesy of the update method.
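The update method also accepts a plain iterable of items, in which case each occurrence adds one to that item’s count. The sketch below starts from the totals shown above and contrasts update with ordinary dict assignment, which overwrites rather than adds. The extra item 'baz' is an assumption added for illustration.

```python
from collections import Counter

count = Counter({'foo': 7, 'bar': 4, 'foobar': 4})

# Passing an iterable counts each occurrence as +1
count.update(['foo', 'foo', 'baz'])
print(count['foo'], count['baz'])  # 9 1

# Plain assignment replaces the stored count instead of adding to it
count['bar'] = 1
print(count['bar'])  # 1
```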

Photo by Lachlan Donald on Unsplash

Useful Counter Features:

Two more things you can get from a Counter are the count of an individual item and a sorted list of the top ‘n’ items. To get the count of an item, put the item in square brackets after the variable holding the Counter, as you would with a dict key. The most_common method returns the items ordered by count, and its argument limits the output to the top ‘n’. Examples of both can be seen below.

# Count of Named Element
In[6]: count['foo']
Out[6]: 7

# Example of Top 'n' items
In[7]: count.most_common(2)
Out[7]: [('foo', 7), ('bar', 4)]

Note that most_common returns the items in descending order of count and limits the result to the number given as the argument.
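A few related lookups, sketched here with the same totals as above: calling most_common with no argument returns every item, a reverse slice gives the least common items, and summing the values gives the overall total.

```python
from collections import Counter

count = Counter({'foo': 7, 'bar': 4, 'foobar': 4})

# With no argument, most_common returns every item, highest count first
ranked = count.most_common()
print(ranked)  # [('foo', 7), ('bar', 4), ('foobar', 4)]

# A reverse slice gives the n least common items (here n = 1)
least_common = count.most_common()[:-2:-1]
print(least_common)  # [('foobar', 4)]

# Total number of counted items across all keys
total = sum(count.values())
print(total)  # 15
```

Items with equal counts keep the order in which they were first encountered, which is why 'bar' is listed before 'foobar'.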

Photo by Campaign Creators on Unsplash

Histograms and Plots:

Once you understand how to build custom counts, you can take the data to the next level by visualizing it. In the simple example below, we take the same data we’ve been using and draw a simple bar chart. I’ve modified the example a little to make the syntax clearer and to avoid strings in the arguments, but the results are the same. The code below shows how I take the keys and values of the Counter and assign them to variables that I can then plot.

from collections import Counter
import matplotlib.pyplot as plt

count = Counter(foo=4, bar=3, foobar=1)

new_list = Counter(foo=3, bar=1, foobar=3)
count.update(new_list)

# Convert the dict views to lists so plt.bar receives plain sequences
x = list(count.keys())
y = list(count.values())

plt.bar(x, y)
plt.show()
Simple Histogram of the counting example

Note that this is only a starting point: the plot can be made much more elaborate, and many other types of visualization are possible.
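As one possible next step, the sketch below orders the bars from most to least common and adds axis labels and a title. The helper names sorted_bar_data and plot_counts are hypothetical, introduced only for this illustration, and matplotlib is imported inside the plotting helper so that the data preparation can run on its own.

```python
from collections import Counter


def sorted_bar_data(count):
    """Return (labels, heights) ordered from most to least common."""
    pairs = count.most_common()
    labels = [item for item, _ in pairs]
    heights = [n for _, n in pairs]
    return labels, heights


def plot_counts(count, title="Item counts"):
    """Draw a labeled bar chart of a Counter (hypothetical helper)."""
    # Imported here so sorted_bar_data works even without matplotlib installed
    import matplotlib.pyplot as plt

    labels, heights = sorted_bar_data(count)
    plt.bar(labels, heights)
    plt.xlabel("Item")
    plt.ylabel("Count")
    plt.title(title)
    plt.show()


count = Counter(foo=7, bar=4, foobar=4)
labels, heights = sorted_bar_data(count)
print(labels, heights)  # ['foo', 'bar', 'foobar'] [7, 4, 4]
```

Calling plot_counts(count) would then draw the chart with the tallest bar first.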

Wrap Up:

I just wanted to show that counting is a conceptually simple task we often need in order to inspect and visualize a dataset, that Python offers multiple ways to skin that cat, and that core Python has a lot of cool built-in features for it.

