The world’s leading publication for data science, AI, and ML professionals.

Unexpected Size of Python Objects in Memory

How much space does a Python object take in memory?


Photo by bady abbas on Unsplash

The Story

I was working on a large dictionary in Python for a Data Science project. The Resource Monitor (a Windows utility that displays information about hardware usage) showed an enormous amount of memory usage in a short amount of time. I knew that my draft code was not optimal, but the rate of memory utilization did not match the growth rate of my dictionary’s length. It seemed that the dictionary’s length did not have a linear relationship with the dictionary object’s size in memory. I decided to check the size of my dictionary in memory. I was sure there had to be a Python function that gives me the answer, right? Of course, I used Google to find that magic function. After an hour of research, I joined the group of Python programmers who have realized that there is no straightforward answer to this question. Why? Read this article.


Basics

If you are not familiar with how Python manages memory, I recommend reading the following article of mine first. In layman’s terms, it explains how Python allocates memory to objects.

Python, Memory, and Objects

As I explained in that article, Python objects are stored in the Heap Memory (i.e., dynamically allocated memory). You can get the identity of an object with the built-in id() function; in CPython, this identity is the object’s address on the Heap memory (aka the Heap). You can read more details in the above article.
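A short sketch of id() in action (the variable names are only for illustration): two names bound to the same object report the same identity, while an equal but separate object reports a different one. Reading the value as a memory address is a CPython detail; the language itself only guarantees that id() is unique among objects alive at the same time.

```python
# Two names bound to one object share the same identity (id).
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c is an equal, but separate, list object

print(id(a) == id(b))  # True: same object
print(id(a) == id(c))  # False: different objects, even though a == c
print(hex(id(a)))      # in CPython, the object's address on the heap
```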

But in that article, we did not discuss the size of objects on the Heap. What is the size of an object in memory? Here, I give you three answers to this question: the first is simple but wrong, the second is a little more complex and more accurate, and the third is as correct as we can get.

Why give you a simple but wrong answer? If you see the correct answer first, you might not understand why it has to be a little complicated, or what the reasons behind it are. After all, we read articles to understand the reasons behind code and solutions; otherwise, Stack Overflow is full of correct and verified solutions for almost everything.

The Simple and Wrong Answer

In Python, the most basic function for measuring the size of an object in memory is sys.getsizeof(). Let’s take a look at this function using a few examples.

>>> import sys
>>> sys.getsizeof(1)
28
>>> sys.getsizeof('Python')
55
>>> sys.getsizeof([1, 2, 3])
88

These three examples show the size of an integer, a string, and a list in bytes. (These numbers correspond to a 64-bit CPython build; other Python versions and platforms can report different values.) At first glance, everything seems fine, and you may wonder why this article needed to be written, right? Give me a few minutes, and I might convince you (as I was convinced the first time I saw the following examples). Let’s see another example.

>>> sys.getsizeof('')
49
>>> sys.getsizeof('P')
50
>>> sys.getsizeof('Py')
51
>>> sys.getsizeof('Pyt')
52

First, I have an empty string. It takes 49 bytes! Then I have a string with only one character, and its size is 50 bytes. As I add more characters, each character seems to add one byte to the size of my string object. How do we explain this observation? Actually, it is easy. In Python, a string, like almost everything else, is an object, not just a collection of characters. In addition to its value (i.e., the collection of characters), an object (in this case, a string object) has other attributes and related components. When we create an object, Python stores all this information in memory. Therefore, we have an overhead even for an empty string.
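The one-byte-per-character pattern holds for ASCII text. In CPython 3.3 and later (PEP 393), a string stores all of its characters at the width required by its widest character: 1, 2, or 4 bytes each. The sketch below uses a hypothetical helper, bytes_per_char(), that compares two lengths of the same string so the version-dependent fixed overhead cancels out.

```python
import sys

def bytes_per_char(ch: str) -> int:
    """Extra bytes sys.getsizeof() reports for one more copy of ch."""
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)

print(bytes_per_char("a"))   # ASCII: 1 byte per character
print(bytes_per_char("Ω"))   # U+03A9: 2 bytes per character
print(bytes_per_char("😀"))  # U+1F600: 4 bytes per character
```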

Let’s check the same thing for a list.

>>> sys.getsizeof([])
64
>>> sys.getsizeof([1])
72
>>> sys.getsizeof([1, 2])
80

We see the same story here. On this build, a list object has 64 bytes of overhead, and each additional item grows its size by 8 bytes. Okay, it was strange at first, but it makes sense now. Let’s see another example (I promise you, this one is more interesting).

>>> sys.getsizeof([1, 2])
80
>>> sys.getsizeof([3, 4, 5, 1])
96
>>> sys.getsizeof([1, 2, [3, 4, 5, 1]])
88

First, I have a list [1, 2], which takes 80 bytes of memory. I have another list [3, 4, 5, 1], which takes 96 bytes. So far, everything makes sense: a list object has 64 bytes of overhead plus 8 bytes for each item. Now, I nest the second list inside the first one. The resulting list is [1, 2, [3, 4, 5, 1]]. When I get the size of this new list object, it is 88 bytes. What?! The size of the new nested list (88 bytes) is even less than the size of my second list (96 bytes). How is that possible?

Let’s play it back. First, I had a list of two items (i.e., integers). It took 80 bytes of memory, as we expected. When I added a new item, which was a list, it added only 8 bytes to my list. It seems that, no matter what, an additional item takes 8 bytes. That suggests a list object does not store the items themselves but references to them (i.e., their memory addresses). THAT’S TRUE. When you create a list object, the list object by itself takes 64 bytes of memory, and each item adds 8 bytes to the size of the list because the list holds a reference to another object (a reference is 8 bytes on a 64-bit build). In the previous example, the list [1, 2, [3, 4, 5, 1]] is stored in memory like [reference to obj1, reference to obj2, reference to obj3], while obj1, obj2, and obj3 are stored somewhere else in memory. Therefore, to get the actual size of our list object, we need to add the size of each member object (which we call items) to the size of the list itself.
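A small sketch that makes this reference behavior visible, assuming CPython: the nested item is the very same object as the list we created, and growing that inner list does not change what sys.getsizeof() reports for the outer list.

```python
import sys

inner = [3, 4, 5, 1]
outer = [1, 2, inner]         # outer stores a reference to inner, not a copy

print(outer[2] is inner)      # True: the very same list object
before = sys.getsizeof(outer)

inner.extend(range(10_000))   # make the nested list much bigger
after = sys.getsizeof(outer)

print(before == after)        # True: outer only holds references, so its size is unchanged
```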

A More Complex and a More Accurate Answer

As we learned in the previous section, sys.getsizeof() only gives us the size of the object itself and its attributes in memory. It does not include the size of referenced objects (e.g., items in a list) and their attributes. To get the actual size of an object, we must iterate through all components of the object (e.g., items in a list object) and add their sizes together. The following figure is an example.

Image by the author.
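The summing step can be sketched for nested lists as follows. The helper naive_total_size() is hypothetical and only descends into lists; like the calculation in the figure, it follows every reference without checking whether two references point to the same object, so the integer 1 is counted twice.

```python
import sys

def naive_total_size(obj) -> int:
    """Size of obj plus the sizes of its items, recursing into nested lists.

    Every reference is followed and counted, even if several references
    point to the same object.
    """
    total = sys.getsizeof(obj)
    if isinstance(obj, list):
        for item in obj:
            total += naive_total_size(item)
    return total

print(naive_total_size([1, 2, [3, 4, 5, 1]]))
```

On the build used for the figure this sum is 352 bytes; other Python versions can print a different total.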

As the above figure shows, the size of an object such as [1, 2, [3, 4, 5, 1]] comes out to 352 bytes. However, there is a mistake in this calculation. If you look at the list of objects in the figure, you see the same memory address on rows 2 and 8 (highlighted with *). The integer object 1 in the main list and the 1 in the nested list are stored at the same memory address. As I explained in a previous article, CPython creates the integers in the range [-5, 256] only once and points all references to the same memory address (as a memory optimization). Therefore, it is better to identify duplicates by their memory addresses (via id()) and count their size only once. For our example, we must remove duplicates before summing their sizes. The following figure shows the correct answer, which is 324 bytes.

Image by the author.
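Here is a sketch of the duplicate-aware version, again limited to nested lists (dedup_total_size() is a hypothetical helper). It records the id() of every object it has already counted and skips repeats; this is safe here because all the objects are alive at the same time, which is when id() values are guaranteed to be unique. The last lines show the small-integer cache, which is a CPython implementation detail.

```python
import sys

def dedup_total_size(obj, seen=None) -> int:
    """Like a naive sum, but counts each distinct object (by id()) only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted through another reference
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, list):
        for item in obj:
            total += dedup_total_size(item, seen)
    return total

data = [1, 2, [3, 4, 5, 1]]
print(dedup_total_size(data))

# CPython detail: integers from -5 to 256 are cached, so two separately
# created 256s are the same object, but two separately created 257s are not.
print(int("256") is int("256"))  # True in CPython
print(int("257") is int("257"))  # False in CPython
```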

As Accurate as We Can

The previous solution is more accurate than our initial calculation, but unfortunately, it still has some caveats. For instances of classes, other elements that you might not think of (e.g., obj.__dict__ or obj.__slots__) may also be stored in memory. Tracking these elements manually is hard and sometimes impossible. A better way to find all the elements attached to your object is to use a function from the Python Garbage Collector interface called gc.get_referents().
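To see one of these hidden elements, consider an instance of a small, hypothetical Point class. sys.getsizeof() reports the same size for the instance no matter how large its attribute values are, because those values (and the instance’s __dict__, when one exists) are separate objects.

```python
import sys

class Point:
    """Hypothetical example class with ordinary instance attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, [0])
size_small = sys.getsizeof(p)

p.y = list(range(100_000))       # replace y with a much larger list
size_large = sys.getsizeof(p)

print(size_small == size_large)  # True: getsizeof(p) ignores attribute values
print(sys.getsizeof(vars(p)))    # vars(p) returns p.__dict__, a separate object
```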

If you are not familiar with the Python Garbage Collector, I recommend reading this article. The Garbage Collector keeps track of objects and their associated elements on the Heap and helps free them when the program no longer needs them.

Here, we can take advantage of the Garbage Collector interface to find all elements linked to the object whose size we want to measure. The idea is to iterate through all objects and elements attached to the original object and add their sizes to the total size of the object.
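A minimal sketch of this approach, combining gc.get_referents(), sys.getsizeof(), and id()-based de-duplication. The names total_size() and EXCLUDED_TYPES are hypothetical, and this is not necessarily identical to the author’s original code. Classes, modules, and functions are excluded because they are shared by many objects and would pull in large parts of the interpreter.

```python
import gc
import sys
from types import FunctionType, ModuleType

# Shared objects we refuse to walk into (classes, modules, functions).
EXCLUDED_TYPES = (type, ModuleType, FunctionType)

def total_size(obj) -> int:
    """Sum sys.getsizeof() over obj and everything reachable via gc.get_referents().

    Each object is counted once (tracked by id()); objects whose type is in
    EXCLUDED_TYPES are skipped.
    """
    if isinstance(obj, EXCLUDED_TYPES):
        raise TypeError(f"total_size() does not accept {type(obj).__name__} objects")
    seen_ids = set()
    size = 0
    to_visit = [obj]
    while to_visit:
        next_batch = []
        for o in to_visit:
            if isinstance(o, EXCLUDED_TYPES) or id(o) in seen_ids:
                continue
            seen_ids.add(id(o))
            size += sys.getsizeof(o)
            next_batch.append(o)
        # gc.get_referents() returns the objects each argument refers to directly.
        to_visit = gc.get_referents(*next_batch)
    return size

print(total_size([1, 2, [3, 4, 5, 1]]))
```

Keep in mind that gc.get_referents() only returns what an object’s type reports to the garbage collector, so some referenced objects can be missed; for example, CPython may not report the string keys of a dictionary. This is one reason such a function works for many, but not all, objects.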

Additional Resources:

I also found a good solution in the following article. Although it works for a limited set of objects, the solution looks solid.

Measure the Real Size of Any Python Object | Shippo

Summary

Measuring the size of Python objects in memory is not an easy task. There is no built-in, straightforward way to find the actual size of an object. In this article, we learned why measuring an object’s actual size is not easy, and I provided a solution that works for many (not all) objects in Python.

Follow me on Twitter for the latest stories: https://twitter.com/TamimiNas

