False Friends
When learning a foreign language, you will encounter words that are spelled and pronounced similarly to words you know from your mother tongue. But sometimes these seemingly familiar words have a completely different meaning in the other language. E.g. the German verb "bekommen" seems to be equivalent to the English "to become", but actually means "to get" or "to receive". This sometimes leads to rather funny sentences when German-speaking people learn English, like "she became a cat on her 11th birthday". Of course the intention here is not to express that the girl mutated into a cat on her birthday, but that she got a cat as a birthday present. Such seemingly related words are called "false friends".
The same situation may occur when you switch from a programming language you are familiar with to a new one, e.g. when you switch from Python to Julia. Both languages have some conceptual differences, but there are also several aspects which are quite similar and seem to be interchangeable. But not all of them actually are!
Python lists and Julia arrays
Recently I encountered a nice example of this situation when reading the article Prime time for Julia and Python by René. He presented a Python function (originally from here) which computes prime numbers, translated it 1:1 to Julia, did some benchmarking and was a bit disappointed to notice that the performance of the Julia implementation was below expectations (Python ca. 7.5 s, Julia ca. 4.5 s; I repeated the benchmarks in my environment and got about 9.5 s and 7.5 s respectively).
Let’s have a look at that code to see what happened and what can be done to improve the situation. (Note: for the further analysis it is not important to fully understand this code. We will only take a deeper look at some of its aspects, especially the data structures used.)
First the Python code:
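The original listing is embedded as an image in the article; as a stand-in, here is a sketch of the kind of dictionary-based incremental sieve being discussed (the function name `dict_primes` and the exact details are my assumption, not the original author's; line numbers mentioned later in the text refer to the original listing):

```python
def dict_primes(n):
    """Return the first n primes, using a dict D that maps upcoming
    composite numbers to lists of their prime factors."""
    D = {}        # maps a composite number to the primes that "hit" it
    primes = []
    q = 2         # candidate number
    while len(primes) < n:
        if q not in D:
            # q is prime; its first relevant multiple is q*q
            primes.append(q)
            D[q * q] = [q]
        else:
            # q is composite: move each prime p on to its next multiple
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]    # this slot is no longer needed
        q += 1
    return primes

print(dict_primes(5))  # → [2, 3, 5, 7, 11]
```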
And here the translation to Julia:
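That listing, too, is only included as an image; a sketch of what such a 1:1 translation might look like (identifiers and exact line layout are my assumption; the lines 8, 13, 14 and 15 discussed below refer to the original listing):

```julia
function dict_primes(n)
    D = Dict{Int, Vector{Int}}()
    primes = Int[]
    q = 2
    while length(primes) < n
        if !haskey(D, q)
            push!(primes, q)
            D[q*q] = [q]                # assign a one-element array
        else
            for p in D[q]
                get!(D, p + q, Int[])   # ensure an array exists at slot p+q
                push!(D[p+q], p)        # append p to it
            end
            delete!(D, q)               # corrected position, after the for-loop
        end
        q += 1
    end
    return primes
end
```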
Note: There is a minor error in the Julia version. The `delete!` statement in line 15 must be placed after the for-loop to be identical to the Python code. The corrected version has been used to create the benchmark numbers in this article.
Data structures (ab)used
The main data structures used in these functions are
- in Python: a dictionary `D` of lists
- in Julia: a dictionary `D` of arrays (or rather of `Vector`s, which is a synonym for one-dimensional arrays).
![The dictionary D of arrays as used in the code above [image by author]](https://towardsdatascience.com/wp-content/uploads/2022/05/1dW__6Fr7BYKw_gx218iR8Q-scaled.jpeg)
Analyzing the (Julia) code, we see that this dictionary `D` is initially empty and that its slots get incrementally populated, either by assigning a one-element array (line 8), assigning an empty array (line 13), or by adding single numbers to these arrays (if they already exist) using `push!` (lines 9 and 14). As the example in the image above shows, most of these arrays contain only one element during the execution of the function. Over time some of the arrays aren’t needed any more and get deleted (line 15).
So basically, the following "translation" step has been made: in the Julia version, arrays have been used as a "replacement" for the Python lists. Both data structures are quite similar in several aspects and, as the example shows, can even be used more or less in the same way.
BUT: Python lists have been designed as dynamic data structures which can grow and shrink over their lifetime in little increments. You can do that too with Julia arrays, but you shouldn’t! Julia arrays are (as their name already tells us) not lists but arrays, which are traditionally rather static structures. So they are designed with a different purpose in mind.
Using a Julia array in the way the example above does is one of the most inefficient ways to use it. Julia’s `@time` macro shows that a call of this function with n = 1,000,000 allocates about 4 GB of memory in more than 65 million separate allocations. That can’t be efficient!
7.532386 seconds (65.79 M allocations: 3.933 GiB, 19.15% gc time)
Improvements
Let’s go into some detail and see what happens, e.g., in lines 13 and 14, in order to demonstrate how much these tiny steps, which are applied to create and grow the data structures, influence performance:
- In line 13 the function `get!` is called only because of its side effect: the developer wants to assign an empty array to the dictionary `D` at slot `p+q` if the slot is empty, or do nothing if there is already an array. The actual purpose of `get!` is to read the value at a specific slot position.
- This is done so that in line 14 the value `p` can be appended to the array. I.e. line 13 guarantees that in any case an array exists at that position.
We can replace these two lines by the following code, which expresses these intentions more clearly:
```julia
if haskey(D, p+q)
    push!(D[p+q], p)   # append `p` to the array, if there is one
else
    D[p+q] = [p]       # assign a new array `[p]`, if there is none
end
```
So, in case the slot at `p+q` is empty (the `else` part), a new array containing `p` is assigned in one step instead of two as in the original code.
This tiny change alone, which replaces two little steps by one (slightly larger) step, reduces run time by about 43% and memory allocation by 50%!
4.242410 seconds (26.15 M allocations: 2.047 GiB, 13.62% gc time)
Apart from that, using functions only for their side effects (as in line 13) is bad practice anyway, as it conceals the real intentions. This makes the code harder to read and to understand. Moreover, how should the compiler get an idea of what is really meant (and produce optimal code) if that isn’t clearly expressed?
Move algorithms, not code
But even that’s not the way you should improve the code. The decisive point is that the Python code shouldn’t be translated line by line (which results in some Python-ish Julia code); instead, the algorithm as a whole should be considered and implemented in a Julian way.
Algorithms, not code, should be moved from one language to another.
Sieve of Eratosthenes
So let’s do this: The algorithm used above is the famous Sieve of Eratosthenes. And yes, in case you wondered, the code above is a rather special implementation of the "Sieve".
The main idea of the Sieve of Eratosthenes for computing the prime numbers in the range from 1 to n is as follows:
- List all numbers from 1 to n.
- Initially we assume that all these numbers are prime numbers. We can omit 1, because 2 is by definition the smallest prime.
- Discard all numbers which are multiples from prime numbers which have already been identified:
- Discard all multiples of 2
- Discard all multiples of 3
- Omit 4, because it has already been identified as a multiple of 2
- Discard all multiples of 5 … etc.
- We can stop this process when we reach `sqrt(n)`, because at that point we have already checked all multiples of numbers larger than that threshold. E.g. for n = 100 (where this threshold is 10) we don’t have to check for multiples of 11, because we’ve already checked 2 x 11 when checking multiples of 2, and we’ve checked 3 x 11 when checking multiples of 3, etc.
- When checking for multiples of a number k, we don’t have to start at 1 x k. It’s sufficient to start at k x k, because all lower multiples have already been checked. E.g. when checking for multiples of 5 we can start at 25, because 2 x 5, 3 x 5 and 4 x 5 have already been checked in earlier passes.
Note: The parameter n in this description has a meaning different from the one used in the code above. Here it denotes the upper bound of the range in which prime numbers are searched. In the implementation above, n is the number of primes which will be identified; so using n = 1,000,000 produces one million prime numbers. In the following section we will implement a function `fastPrimes` that adheres to the "classic" version of the algorithm presented here, where n is the upper bound. In order to get the same result with `fastPrimes` as with the code above, we have to call it with n = 15,485,863, because this is the largest of the first one million primes.
Implementation in Julia
The basis of the implementation of the "Sieve" algorithm is an adequate data structure. A structure that directly reflects the concept of a list of numbers together with the information "is a prime number?" is an array of Boolean values. If we also want the implementation to be memory efficient, we choose a `BitArray`. This is a variant of an array of Booleans which stores each Boolean value using just one bit of memory.
This consideration and the description of the algorithm above leads directly to the following code:
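The listing is an image in the original article; the following sketch reconstructs it so that the line numbers referenced in the explanation line up (the exact identifiers are my assumption):

```julia
function fastPrimes(n)              # line 1
    isprime = trues(n)              # line 2: BitArray, all elements true
    isprime[1] = false              # line 3: 1 is not a prime
    for i in 2:isqrt(n)             # line 4: check factors up to sqrt(n)
        isprime[i] || continue      # line 5: skip known non-primes
        for j in i*i:i:n            # line 6: multiples of i, from i*i on
            isprime[j] = false      # line 7: mark as non-prime
        end
    end
    return findall(isprime)         # line 10: indices still true = primes
end
```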
- In line 2 we create a `BitArray` with n elements (all set to true) and we omit 1 (line 3).
- The for-loop starting in line 4 checks all multiples of numbers from 2 to `sqrt(n)`.
- Numbers which are already marked as not being a prime number are omitted (line 5).
- The for-loop starting in line 6 goes through all multiples of i, starting at i x i, and marks them as not being a prime number.
- That’s it! The statement in line 10 is just used to convert the `BitArray`, with all prime numbers marked with true, to an array of prime numbers.
The runtime of this function is around 76 ms and it uses only about 11 MB of memory, as the results of the `BenchmarkTools` package show. That’s almost 100 times faster, using less than 0.3% of the memory, compared to the original implementation.
![Benchmark results for fastPrimes [image by author]](https://towardsdatascience.com/wp-content/uploads/2022/05/1pEVqycr4IM5va7zSwbsSkA.png)
Implementation in Python
This approach leads of course also to better numbers in Python. Here is the code:
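The Python listing is likewise an image in the original article; a sketch of the corresponding implementation (function and variable names are mine):

```python
def fast_primes(n):
    """Return all primes up to and including n (Sieve of Eratosthenes)."""
    # Index 0 is unused, so the list gets n+1 elements (indices 0..n).
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:                       # skip known non-primes
            for j in range(i * i, n + 1, i):  # multiples of i, from i*i on
                is_prime[j] = False
    return [i for i, p in enumerate(is_prime) if p]

print(fast_primes(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```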
As array indexing in Python starts at 0, we create an array of size n+1 here and ignore the element at index 0.
This code runs in about 2.5 s with the same parametrization as in the Julia example. So it’s almost 4 times faster than the original code. Unfortunately, the Python `timeit` module used to do the benchmarks doesn’t produce information about the allocated memory.
Results
The following diagram summarizes the performance of the different variants presented in this article and shows again which outstanding results can be achieved when using Julia in the right way.
![Performance of different variants [image by author]](https://towardsdatascience.com/wp-content/uploads/2022/05/1kUtoYxxE5n2NIp0cjL8Z3g.png)
The following environment has been used for the benchmarks: Apple M1, 8 GB RAM, Julia 1.7.1, Python 3.9.7.
For the Julia variants we can also visualize performance in relation to memory allocated. Significant improvements could be made on both measures.

Sidenotes
The main advantages of Julia over Python that are typically mentioned are performance and fancy concepts like multiple dispatch. But I think there is more to it: it’s also about practical things that are important in everyday use, like simplicity, consistency and a package manager that just works.
When I prepared this article, I stumbled again over the issues which made me look for an alternative to Python a few years ago. That’s why I would like to share them with you.
Julia
Timing a function can be done in Julia using the `@time` macro, which is included in the base library, as follows:
@time fastPrimes(100)
You get a bit more information with the verbose version `@timev`, using the same pattern:
@timev fastPrimes(100)
If you want "real" benchmarking, with several iterations done automatically, informative statistics etc., you can use the `BenchmarkTools` package. As it is a third-party package, it has to be installed first. So I switched to package mode, typed `add BenchmarkTools`, and the package manager downloaded, installed and pre-compiled the package in less than a minute. Julia has one package manager, which has been designed and developed in concert with the language.
After entering `using BenchmarkTools`, the functions of that package can be used. And despite being a completely different package, developed by different people, it has the same usage pattern as the macros above. You call it with:
@benchmark fastPrimes(100)
Python
In Python you can utilize the `timeit` method from the module of the same name. It is also included in the standard library. To time the `fastPrimes` function I had to use the following expression:
timeit.Timer('fastPrimes(100)', setup='from __main__ import fastPrimes; gc.enable()').timeit(number = 1)
The `from __main__ import` was necessary so that `timeit` can "see" the `fastPrimes` function. `gc.enable()` switches the garbage collector on, because `timeit` switches it off by default, which is neither a real-life situation nor comparable to the Julia setting.
And no, this is neither easy to understand nor easy to use. It took me quite some time to figure out how to use `timeit` in the proper way!
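As a side note (not from the original article): since Python 3.5, `timeit` accepts a `globals` parameter, which makes the `from __main__ import` gymnastics unnecessary; the function being timed here is just a tiny stand-in:

```python
import gc
import timeit

def work():
    """Tiny stand-in for the fastPrimes call being benchmarked."""
    return [i for i in range(2, 100)
            if all(i % d for d in range(2, int(i ** 0.5) + 1))]

# `globals=globals()` lets timeit "see" `work` directly;
# gc.enable() in `setup` switches the garbage collector back on.
elapsed = timeit.timeit('work()', setup='gc.enable()',
                        globals=globals(), number=1)
print(f"{elapsed:.6f} s")
```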
As `timeit` only measures the runtime but doesn’t give information about memory allocation (as the Julia tools do), I was looking for a better alternative. Two other functions for Python benchmarking mentioned in different places on the web didn’t meet these requirements. But a third one, the `benchmarkit` module, seemed to. So I tried to install it. Python has two package managers (pip and Conda); both have been developed independently of the language.
As I have an Anaconda environment on my computer, I first tried `conda` for the installation. But in the Conda universe the `benchmarkit` module was unknown. So I switched to `pip`, which started to load megabytes of software onto my computer. After several minutes it notified me that it would now install the `pandas` module, which annoyed me, because that module was already installed. Time passed … then dozens of error messages flooded my screen. Thereafter it reported that a `pandas` installation had been detected on my computer, which improved my mood. After about 15 minutes (on an M1 iMac with a 100 MBit/s internet connection!) everything seemed to be ready. So I typed `import benchmarkit` … resulting in a message telling me that this module was not installed. I.e. after fiddling around for so long, I had nothing but a perhaps messier Python installation!
I think I don’t have to mention that benchmarking with `benchmarkit` would follow a completely different pattern than using `timeit`.