Bite-Sized Python Recipes — Vol. 2

A collection of small useful functions in Python

Ehsan Khodabandeh
Towards Data Science


Disclaimer: This is a collection of some useful bite-sized functions I’ve found around the web, mainly on Stack Overflow or Python’s documentation page. Some may look trivial, but one way or another, I’ve used them all in my projects and I think they are worth sharing. You can find all of them (with some additional comments) in this notebook which I try to keep up to date. If you are interested, you can check my first blog on bite-sized functions here!

Unless necessary, I intend not to over-explain the functions. So, let’s begin!

Return the First N Items of an Iterable

import itertools as it

def first_n(iterable, n):
    """ If n > len(iterable) then all the elements are returned. """
    return list(it.islice(iterable, n))

Example:

>>> d1 = {3: 4, 6: 2, 0: 9, 9: 0, 1: 4}
>>> first_n(d1.items(), 3)
[(3, 4), (6, 2), (0, 9)]
>>> first_n(d1, 10)
[3, 6, 0, 9, 1]
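Because `islice` works on any iterable, `first_n` also handles generators, which don't support slicing with `[:n]`. A quick sketch:

```python
import itertools as it

def first_n(iterable, n):
    """ If n > len(iterable) then all the elements are returned. """
    return list(it.islice(iterable, n))

# Generators can't be sliced directly, but first_n handles them fine,
# and only the first n values are ever produced.
squares = (x * x for x in range(1_000_000))
print(first_n(squares, 4))  # [0, 1, 4, 9]
```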

Check If All the Elements of an Iterable are the Same

import itertools as it

def all_equal(iterable):
    """ Returns True if all the elements of iterable are equal to each other. """
    g = it.groupby(iterable)
    return next(g, True) and not next(g, False)

Example:

>>> all_equal([1, 2, 3])
False
>>> all_equal(((1, 0), (True, 0)))
True
>>> all_equal([{1, 2}, {2, 1}])
True
>>> all_equal([{1:0, 3:4}, {True:False, 3:4}])
True
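The trick works because `it.groupby` collapses consecutive equal elements into a single group: if every element is equal there is exactly one group, so the second `next` call falls back to its default. A quick look at the groups it produces:

```python
import itertools as it

# groupby yields one (key, group) pair per run of consecutive equal elements.
keys = [k for k, _ in it.groupby([1, 1, 2, 2, 2, 1])]
print(keys)  # [1, 2, 1] -- three runs, so the elements are not all equal

keys_equal = [k for k, _ in it.groupby([5, 5, 5])]
print(keys_equal)  # [5] -- a single run means all the elements are equal
```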

When you have a sequence, the following alternative is usually even faster. (Make sure you test it for yourself if you’re working with very large sequences.)

def all_equal_seq(sequence):
    """ Only works on sequences. Returns True if the sequence is empty
    or all the elements are equal to each other. """
    return not sequence or sequence.count(sequence[0]) == len(sequence)

Example: You have a list of trucks and you can check whether they are in the warehouse or en route. As the day progresses, the status of each truck changes.

import random

random.seed(500)

# Just creating an arbitrary class and attributes
class Truck:
    def __init__(self, id):
        self.id = id
        self.status = random.choice(('loading-unloading', 'en route'))

    def __repr__(self):
        return f'P{str(self.id)}'

trucks = [Truck(i) for i in range(50)]

In the morning you checked and saw that the first truck is en route. You heard that three others have also left the warehouse. Let's verify this:

>>> all_equal_seq([t.status for t in trucks[:4]])
True

Sum an Iterable with None

When you have numpy arrays or pandas Series or DataFrames, the options are obvious: numpy.nansum or pandas.DataFrame/Series.sum. But what if you don't want to, or can't, use those?

def sum_with_none(iterable):
    assert not any(isinstance(v, str) for v in iterable), 'string is not allowed!'
    return sum(filter(None, iterable))

This works because filter treats None as the identity function; i.e., all falsy elements of iterable are removed.
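A quick demo of that behavior:

```python
# filter(None, ...) drops every falsy element: None, 0, False, '', etc.
mixed = [None, 1, 0, 2.5, False, 3]
print(list(filter(None, mixed)))  # [1, 2.5, 3]
```

Note that 0 and False are dropped as well, which is harmless here because they contribute nothing to a sum.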

Example:

>>> seq1 = [None, 1, 2, 3, 4, 0, True, False, None]
>>> sum(seq1)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
>>> sum_with_none(seq1) # Remember True == 1
11

Check if N or Fewer Items are Truthy

def max_n_true(iterable, n):
    """ Returns True if at most `n` values are truthy. """
    return sum(map(bool, iterable)) <= n

Example:

>>> seq1 = [None, 1, 2, 3, 4, 0, True, False, None]
>>> seq2 = [None, 1, 2, 3, 4, 0, True, False, 'hi']
>>> max_n_true(seq1, 5)
True
>>> max_n_true(seq2, 5) # It's now 6
False

Check If Exactly One Element in an Iterable is True

def single_true(iterable):
    """ Returns True if only one element of iterable is truthy. """
    i = iter(iterable)
    return any(i) and not any(i)

The first part of the function ensures that the iterator has any truthy value. Then, it checks from that point in the iterator onward to make sure there is no other truthy value.
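The trick relies on both `any` calls sharing the same iterator: the first call stops at the first truthy element, and the second scans only the remainder. A small sketch of that behavior:

```python
def single_true(iterable):
    """ Returns True if only one element of iterable is truthy. """
    i = iter(iterable)
    return any(i) and not any(i)

# The first any() consumes up to the first truthy value; the second any()
# resumes from there, so it succeeds only if another truthy value remains.
print(single_true([0, 0, 5, 0]))  # True  -- exactly one truthy value
print(single_true([0, 3, 0, 7]))  # False -- a second truthy value remains
print(single_true([0, 0, 0]))     # False -- no truthy value at all
```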

Example: Putting a couple of the above functions to use!

# Just creating an arbitrary class and attributes
class SampleGenerator:
    def __init__(self, id, method1=None, method2=None, method3=None,
                 condition1=False, condition2=False, condition3=False):
        """
        Assumptions:
        1) One and only one method can be active at a time.
        2) Conditions are optional, but if passed, at most one can be True.
        """
        # assumption 1
        assert single_true([method1, method2, method3]), \
            "Exactly one method should be used"
        # assumption 2
        assert max_n_true([condition1, condition2, condition3], 1), \
            "Maximum one condition can be active"
        self.id = id
The first sample below (sample1) is valid, but the others each violate at least one assumption, causing an AssertionError (omitted here to avoid clutter). Run them in the notebook to see the errors for yourself.

>>> sample1 = SampleGenerator(1, method1='active')  # Correct
>>> sample2 = SampleGenerator(2, condition2=True) # no method is active
>>> sample3 = SampleGenerator(3, method2='active', method3='not-active') # more than one method has truthy value
>>> sample4 = SampleGenerator(4, method3='do something', condition1=True, condition3=True) # multiple conditions are active
>>> sample5 = SampleGenerator(5) # nothing is passed

Skip Redundant Headers When Writing to CSV

Suppose you need to run a series of simulations. At the end of each run (which may even take several hours), you record some basic statistics and want to create or update a single results.csv file that you use to track outcomes. If so, you probably want to skip writing headers to the file after the first time.

First, let’s create some data to play with:

import pandas as pd
import random

# An arbitrary function
def gen_random_data():
    demands = [random.randint(100, 900) for _ in range(5)]
    costs = [random.randint(100, 500) for _ in range(5)]
    inventories = [random.randint(100, 1200) for _ in range(5)]
    data = {'demand': demands,
            'cost': costs,
            'inventory': inventories}
    return pd.DataFrame(data)

# Let's create a few dataframes
df_list = [gen_random_data() for _ in range(3)]

Now, let’s assume that we need to write each dataframe in df_list to orders.csv as soon as it is created.

import os

filename = 'orders.csv'
for df in df_list:
    df.to_csv(filename, index=False, mode='a',
              header=(not os.path.exists(filename)))
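To convince yourself that the header really is written only once, here is a self-contained check that appends two small dataframes to a temporary file using the same pattern and reads the result back:

```python
import os
import tempfile

import pandas as pd

# Append two frames with the header-skipping pattern, then inspect the file.
df_list = [pd.DataFrame({'demand': [1, 2]}), pd.DataFrame({'demand': [3, 4]})]

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'orders.csv')
    for df in df_list:
        df.to_csv(filename, index=False, mode='a',
                  header=(not os.path.exists(filename)))
    with open(filename) as f:
        lines = f.read().splitlines()

print(lines)  # ['demand', '1', '2', '3', '4'] -- one header, four data rows
```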

If you don’t need to loop over similar dataframes one at a time, the alternative below is a concise way to write them to file:

pd.concat(df_list).to_csv('orders2.csv', index=False)

Convert a CSV File to Python Objects

Assume you need to create a collection of Python objects whose attributes come from the columns of a CSV file, with each row of the file becoming a new instance of the class. However, let’s say that you don’t know ahead of time what the CSV columns are, and thus you can’t initialize the class with the desired attributes.

Below, you can see two ways to achieve that:

class MyClass1(object):
    def __init__(self, *args, **kwargs):
        for arg in args:
            setattr(self, arg, arg)
        for k, v in kwargs.items():
            setattr(self, k, v)

class MyClass2:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

In MyClass1 we can pass both args and kwargs, while in MyClass2 we take advantage of the special __dict__ attribute.

Example: Let’s convert our orders.csv file from the example above to objects using both implementations.

import csv

filename = 'orders.csv'
class1_list = []
class2_list = []
with open(filename) as f:
    reader = csv.DictReader(f)
    for row in reader:
        class1_list.append(MyClass1(**row))
        class2_list.append(MyClass2(**row))

# Let's check the attributes of the first row of class1_list
>>> print(f'first row = {vars(class1_list[0])}')
first row = {'demand': '821', 'cost': '385', 'inventory': '1197'}
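One caveat visible in the output above: csv.DictReader returns every value as a string. If you need numeric attributes, one option is to convert values before building the objects. A sketch, using an in-memory file so it runs without orders.csv (the int conversion assumes all columns are integers, as in our generated data):

```python
import csv
import io

class MyClass2:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

# csv.DictReader yields strings, so convert each value before assignment.
data = io.StringIO('demand,cost\n821,385\n640,410\n')
objects = [MyClass2(**{k: int(v) for k, v in row.items()})
           for row in csv.DictReader(data)]
print(vars(objects[0]))  # {'demand': 821, 'cost': 385}
```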

That’s it for now. If you also have some bite-sized functions that you use regularly, let me know. I’ll try to keep the notebook up-to-date on GitHub and yours can end up there too!

I can be reached on Twitter and LinkedIn.


Operations Research Scientist. I write about optimization, logistics, and occasionally Python!