Python *args and **kwargs – Data Science Edition

Make your code more elegant with arguments and keyword arguments.

If you’ve ever stumbled upon *args and **kwargs while looking at a function definition and wondered what the heck those are, you’re in the right place. Today we’ll go through both and show their use cases in your average machine learning task.

Photo by Aaron Burden on Unsplash

Both *args and **kwargs let a function accept a variable number of arguments: *args handles positional arguments (hence args) and **kwargs handles keyword arguments (hence kwargs). We’ll explore this in much more detail in a minute.

The ideal reader is someone familiar with the basics of the Python programming language and interested in data science. The second part is optional because the first two-thirds of the article revolve only around the programming language itself.

The article is structured as follows:

  1. *args
  2. **kwargs
  3. Practical use in a Machine Learning task
  4. Conclusion

With that being said, let’s start with the first concept – *args.


*args

Let’s say you want to declare a function for summing numbers. There’s one problem with this function by default – it only accepts a fixed number of arguments. Sure, you can get around this by using only a single argument of type list – and that’s a viable alternative. Let’s explore it for a bit.

Down below we have your regular function for summing numbers, expecting a single argument of type list:

def sum_numbers(numbers):
    the_sum = 0
    for number in numbers:
        the_sum += number
    return the_sum

We can use it to find the sum:

numbers = [1, 2, 3, 4, 5]
sum_numbers(numbers)
>>> 15

But what if you don’t want to use a list? *args to the rescue:

def sum_numbers(*args):
   the_sum = 0
   for number in args:
       the_sum += number
   return the_sum
sum_numbers(1, 2, 3)
>>> 6
sum_numbers(1, 2, 3, 4, 5)
>>> 15
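To make it clearer what happens inside such a function, here is a minimal sketch. The function name scale_and_sum and its factor parameter are made up for illustration; the point is only that *args collects the extra positional arguments into a tuple and can sit next to regular parameters.

```python
def scale_and_sum(factor, *args):
    # Inside the function, args is an ordinary tuple holding every
    # positional argument passed after `factor`.
    print(type(args).__name__, args)
    return factor * sum(args)


total = scale_and_sum(2, 1, 2, 3)  # factor=2, args=(1, 2, 3)
empty_total = scale_and_sum(2)     # args is the empty tuple ()

print(total)        # 12
print(empty_total)  # 0
```

Note that calling the function with no extra arguments is perfectly valid; args is simply empty in that case.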

Yes, I hear you – this isn’t the best use case since we can use lists as a replacement. But we have a couple more examples under our belt – the first being unpacking.

List unpacking

The idea of unpacking is to, well, unpack any iterable object. The single asterisk * is used to unpack any iterable, and the double asterisk ** is used only for dictionaries. You’ll quickly get the gist of it.

Let’s say we have the following list:

num_arr = [1, 2, 3, 4, 5]

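Unpacking it is straightforward: put an asterisk in front of the list when passing it to a function, and every element arrives as a separate positional argument. It’s the calling-side counterpart of the *args in our nifty sum_numbers() function, here applied to print():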

print(*num_arr)
>>> 1 2 3 4 5
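The same trick works with the sum_numbers() function from the previous section, and with iterables other than lists. The sketch below redefines the function so it runs on its own; the tuple and range inputs are just illustrative values.

```python
def sum_numbers(*args):
    the_sum = 0
    for number in args:
        the_sum += number
    return the_sum


num_arr = [1, 2, 3, 4, 5]

# Equivalent to calling sum_numbers(1, 2, 3, 4, 5)
list_total = sum_numbers(*num_arr)

# Any iterable can be unpacked, not only lists
tuple_total = sum_numbers(*(10, 20))
range_total = sum_numbers(*range(4))  # 0 + 1 + 2 + 3

print(list_total, tuple_total, range_total)  # 15 30 6
```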

In a minute or so we’ll talk about dictionary unpacking – for now, let’s wrap this section with list concatenation.

List concatenation

Another use of the single asterisk is list concatenation. Let’s say we have two lists:

nums1 = [1, 2, 3]
nums2 = [4, 5, 6]

How would we concatenate them into a single list? If your answer is somewhere along the lines of iterating through both and storing values to the third list, then you’re not wrong (per se), but there’s an easier and more elegant option. Take a look at the following code:

nums = [*nums1, *nums2]
nums
>>> [1, 2, 3, 4, 5, 6]
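For two plain lists the result is the same as using the + operator. Where unpacking arguably has an edge is when the pieces are not all lists, as in this small sketch (the mixed values are made up for illustration):

```python
nums1 = [1, 2, 3]
nums2 = [4, 5, 6]

# For two lists, unpacking and + build the same new list
unpacked = [*nums1, *nums2]
added = nums1 + nums2
print(unpacked == added)  # True

# Unpacking also mixes single values, tuples, and ranges in one literal,
# whereas nums1 + (4, 5) would raise a TypeError
mixed = [0, *nums1, *(4, 5), *range(6, 8)]
print(mixed)  # [0, 1, 2, 3, 4, 5, 6, 7]

# The original lists are left untouched
print(nums1, nums2)  # [1, 2, 3] [4, 5, 6]
```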

And that’s how easy it is. Let’s move along to **kwargs, something a bit more applicable to your everyday data science tasks.


**kwargs

As mentioned earlier, the double asterisk (the ** in **kwargs) is used to unpack dictionaries. Without much ado, let’s see how one would use it.

For this simple example, let’s say you want to multiply 3 numbers, but those come from some external source and are stored as key-value pairs. The keys are always identical (obviously), but the values change.

Down below is your super-sophisticated backend code:

def multiply(a, b, c):
    return a * b * c

And here’s the data you got:

d = {'a': 3, 'b': 2, 'c': 5}

Now, one could take a naive approach and do something like this:

multiply(a=d['a'], b=d['b'], c=d['c'])
>>> 30

But as always, there’s a simpler solution. Let’s take a moment to appreciate the beauty of the code below:

multiply(**d)
>>> 30
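Two details are worth a quick sketch. First, in a function definition **kwargs does the opposite job: it collects keyword arguments into a regular dictionary. Second, when unpacking, the dictionary keys have to match the parameter names of the function being called. The wrapper multiply_logged and the key 'x' below are hypothetical and exist only for this illustration.

```python
def multiply(a, b, c):
    return a * b * c


def multiply_logged(**kwargs):
    # Here kwargs is a regular dict built from the keyword arguments
    print(f"Called with: {kwargs}")
    # ...and it can be unpacked again when forwarding the call
    return multiply(**kwargs)


product = multiply_logged(a=3, b=2, c=5)
print(product)  # 30

# A key that is not a parameter of multiply() raises a TypeError
try:
    multiply(**{'a': 3, 'b': 2, 'x': 5})
except TypeError as error:
    error_message = str(error)
    print(error_message)
```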

That was simple, clean, and elegant – as Python was intended to be. I’m sure some ideas are already popping in your head about potential use cases for data science tasks, but let’s quickly go over dictionary concatenation before jumping into the good stuff.

Dictionary concatenation

The idea is basically the same as with lists. Down below we have two dictionaries:

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}

To concatenate them into a single dictionary, all we have to do is the following:

d = {**d1, **d2}
d
>>> {'a': 1, 'b': 2, 'c': 3, 'd': 4}

The only gotcha you should be aware of is key duplication. If the same key appears in two or more dictionaries, the value from the latter (the rightmost dictionary) is used after concatenation.
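Here is a minimal sketch of that behavior, with a duplicated key 'b' added on purpose. The order of the dictionaries decides which value survives:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 3}

# 'b' exists in both dictionaries: the rightmost one wins
merged = {**d1, **d2}
print(merged)  # {'a': 1, 'b': 20, 'c': 3}

# Swapping the order swaps the winner
merged_reversed = {**d2, **d1}
print(merged_reversed['b'])  # 2
```

This makes the pattern handy for overriding defaults: put the default values first and the overrides last.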

Now comes the part you’ve been waiting for – a concrete example in an everyday machine learning task.


Practical use in a machine learning task

As promised, we’ll use kwargs to imitate the process of model training. Let’s imagine the following scenario: you’ve gathered and prepared data, and now would like to do some regression task, let’s say with the Random Forests algorithm.

Down below are your imports and initialization:

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()

As you don’t know the optimal values for the hyperparameters, you decide to do a Grid Search. We won’t do that here, but after performing it (for example, with scikit-learn’s GridSearchCV) you’d have the optimal parameters inside the best_params_ attribute. The following dictionary imitates it:

best_params = {
    'bootstrap': True,
    'criterion': 'squared_error',  # called 'mse' in scikit-learn versions before 1.0
    'max_depth': 100,
    'min_samples_leaf': 2,
    'n_estimators': 400
}

The naive approach here would be to rewrite parameter names and values in another Random Forest initialization.

We’re smarter than this. We know about **kwargs and how they can be used to unpack dictionaries, so we can do the following:

rf = RandomForestRegressor(**best_params)

Do you see how elegant that was?

Not only that, but you’re also reducing the chance of possible errors in other projects, in case you only copy-paste the code and forget to change the values.
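In practice you often want to combine the tuned hyperparameters with a few fixed settings. The sketch below shows one way to do that with two unpackings in a single call. To keep it runnable without any dependencies it uses ToyForest, a hypothetical stand-in class that only stores its arguments; it is not part of scikit-learn, and the fixed settings are example values.

```python
class ToyForest:
    """Hypothetical stand-in for an estimator; it only stores its settings."""

    def __init__(self, n_estimators=100, max_depth=None,
                 min_samples_leaf=1, random_state=None, n_jobs=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.n_jobs = n_jobs


# Imitates the result of a hyperparameter search
best_params = {'max_depth': 100, 'min_samples_leaf': 2, 'n_estimators': 400}

# Settings you want to keep fixed regardless of the search (example values)
fixed_settings = {'random_state': 42, 'n_jobs': -1}

# Several dictionaries can be unpacked in the same call (Python 3.5+)
model = ToyForest(**best_params, **fixed_settings)
print(vars(model))
```

If the two dictionaries could share a key, the call above would raise a TypeError, so in that case it is safer to merge them first with {**fixed_settings, **best_params} and unpack the result.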

And here’s where the story ends – at least for today. We’ve covered a lot, so let’s do a quick recap.


Conclusion

As a data scientist, being aware of what your programming language of choice can offer is a decent bonus. The examples covered by this article were by no means groundbreaking, but we’re sure they’ll help you to reduce the amount of code, or to make the code look more elegant, or both.

Thanks for taking a couple of minutes of your day and reading until the end. As always, feel free to leave your thoughts in the comment section below.


Join Medium with my referral link – Dario Radečić

