
Python vs. Julia: It’s also about Consistency

The main advantages of Julia over Python are surely its speed and concepts like multiple dispatch. But there is more: In everyday use…

Photo by Claudio Schwarz on Unsplash

How easy (or difficult) it is to learn and use a programming language depends to a significant degree on how consistently its constructs can be used and applied. A high degree of consistency also helps to avoid mistakes, which makes it a crucial aspect of building quality software. In this article I will show how well Julia and Python perform in this area.

Using ranges and indexing as examples, I will first illustrate how the two languages differ in the usage of these concepts and then explain the rationale behind the differences.

Ranges

In programming, a range of numbers, specified by a lower bound and an upper bound, is an often-used concept. Let’s have a look at how it is applied in both languages.

A range of integers

If we need, say, a range of integers from 4 to 10, this is expressed as follows:

Python: range(4,11)                    Julia: 4:10

In Julia the lower bound and the upper bound are inclusive when specifying a range, whereas in Python the lower bound is inclusive and the upper bound exclusive. So we have to give an upper bound of 11, if we want numbers up to 10 in Python.
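
The exclusive upper bound on the Python side can be checked directly. A minimal sketch (the variable name numbers is chosen only for this illustration):

```python
# range(4, 11) starts at 4 and stops before 11.
numbers = list(range(4, 11))

print(numbers)        # [4, 5, 6, 7, 8, 9, 10]
print(len(numbers))   # 7
```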

A range of elements from an array

If we have an array a and want to extract the 4th up to the 10th element, we have to write:

Python: a[3:10]                       Julia: a[4:10]

In Python indexing of arrays (lists) begins at zero, so the 4th element has an index of 3. And again the upper bound is exclusive. So to specify that we want the elements up to the 10th element (which has index 9), we have to write 10 as the upper bound.

As we can see here, in Julia the exact same expression as above is used. And it is not only the exact same expression, but the same (range) object. It can be used ‘standalone’ as above or given as an argument to an array for indexing.
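
A small Python check of the slice, assuming a list a that holds the numbers 1 to 12 so that every value equals its own 1-based position:

```python
# Hypothetical sample data: the value at each position equals its 1-based position.
a = list(range(1, 13))

# Index 3 is the 4th element; the slice stops before index 10.
fourth_to_tenth = a[3:10]

print(fourth_to_tenth)   # [4, 5, 6, 7, 8, 9, 10]
```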

A range to draw random numbers from

The following expressions are used to generate random numbers in the range from 4 to 10:

Python: randint(4,10)                Julia: rand(4:10)

In Python the upper bound is in this case inclusive, thus deviating from the common concept. In Julia we pass the exact same range object, introduced above, as an argument to the function rand.
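
The two Python conventions can be put side by side. This sketch uses random.randint from the standard library; the sample size of 1000 is an arbitrary choice. It checks that every drawn value lies within 4 to 10, both bounds included, while range(4, 10) does not contain 10:

```python
import random

# randint includes both bounds, unlike range(4, 10), which stops at 9.
samples = [random.randint(4, 10) for _ in range(1000)]

print(min(samples) >= 4, max(samples) <= 10)   # True True
print(10 in range(4, 10))                      # False
```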

Introducing a step size

If we don’t want each element within a range, but e.g. only every second element, this can be specified in both languages using a step size (in this example a size of 2 instead of 1). So to get every second number from 4 to 10 we write:

Python: range(4,11,2)               Julia: 4:2:10

In Julia this extended version is also just a range object. So it can be used in every place where a range object may be specified, in particular in the cases above: to specify a subrange of an array or the range from which random numbers are drawn.

                                    Julia: a[4:2:10]
                                           rand(4:2:10)

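Python has no direct counterpart: a range object cannot be used as a list index. Slicing with a step has its own notation (a[3:10:2]), and a random number from a stepped range is drawn with yet another function, randrange(4,11,2).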
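
The Python spellings of these three uses, sketched with standard-library calls only (the list a is hypothetical sample data). The last lines show that a range object is rejected as a list index:

```python
import random

a = list(range(1, 13))                 # hypothetical data: values 1..12

every_second = list(range(4, 11, 2))   # the range itself
sliced = a[3:10:2]                     # slice notation, zero-based and exclusive
drawn = random.randrange(4, 11, 2)     # separate function for a stepped draw

print(every_second)   # [4, 6, 8, 10]
print(sliced)         # [4, 6, 8, 10]

# A range object is not accepted as a list index.
try:
    a[range(4, 11, 2)]
except TypeError as err:
    print("TypeError:", err)
```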

Other sorts of ranges

Of course there are not only ranges of integers. The next example shows a range of three hourly timestamps on April 8th of 2022 (from 12:00 to 14:00):

Python: pandas.date_range(start='2022-04-08 12:00',
                          periods=3, freq='H')
Julia:  DateTime(2022,4,8,12):Hour(1):DateTime(2022,4,8,14)

As can be seen, in Julia the same basic syntax (lower:step:upper) is used as in the case of integer ranges.

In Julia this can be applied to other data types too, like characters. Here we specify a range of characters from ‘d’ to ‘k’:

                                  Julia: 'd':'k'

Python has no built-in counterpart for this.
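
One possible way to assemble such a character range by hand in Python goes through the code points with ord and chr (the helper name char_range is made up for this sketch):

```python
def char_range(first, last):
    """Return the characters from first to last, both inclusive."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


letters = char_range("d", "k")
print("".join(letters))   # defghijk
```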

Indexing

The second example used in this article to demonstrate how consistently (or inconsistently) a concept is applied is the indexing of 2-dimensional array-like data structures.

Accessing an element within a matrix

To access the element in the 2nd row and 4th column of a matrix m (a 2-dimensional array) we write:

Python: m[1][3]                   Julia: m[2,4]

In Python a list of lists is used (because there are no n-dimensional arrays in the base library). Therefore indexing such a structure is a two-step process (and needs twice the brackets). And again, as indexing starts at zero in Python, the 2nd row has index 1 and the 4th column has index 3.
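
A short Python sketch of the two-step lookup, using a hypothetical 3x5 matrix whose entries encode their own 1-based row and column (row * 10 + column):

```python
# Hypothetical 3x5 matrix as a list of lists; entry = row * 10 + column (1-based).
m = [[row * 10 + col for col in range(1, 6)] for row in range(1, 4)]

second_row = m[1]          # first step: pick the row (a plain list)
element = second_row[3]    # second step: pick the column within that row

print(element)             # 24
print(m[1][3] == element)  # True
```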

Accessing an element within a DataFrame

To get the element in the 2nd row and 4th column of a DataFrame df (which is a 2-dimensional data structure similar to a database table) we have to write:

Python: df.iloc[1,3]              Julia: df[2,4]

In Julia this is exactly the same notation as above.

In contrast to the list of lists used to represent a matrix in Python, a DataFrame is a true 2-dimensional structure. Therefore the indices can be written within one pair of brackets.

Accessing a range of elements within a DataFrame

Each column in a DataFrame has a name. These names can also be used to reference a column. So if we want the elements from rows 4 to 10 in the columns named "A" and "B" we write:

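Python: df.loc[3:9,["A","B"]]     Julia: df[4:10,["A","B"]]

Note that loc selects by label and, unlike ordinary Python slices, includes both bounds. With the default integer index (labels starting at 0), the labels 3 to 9 therefore cover rows 4 to 10.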

Assign a value to a matrix element

In order to assign the value 5 to the matrix element in the 2nd row and 4th column of a matrix m, we have to write:

Python: m[1][3] = 5               Julia: m[2,4] = 5

Assign a value to an element within a DataFrame

If we want to assign the value 5 to the element in the 2nd row and the 4th column of a DataFrame df the following expressions are used:

Python: df.iat[1,3] = 5          Julia: df[2,4] = 5

To assign that value to the element in the 2nd row in column "A" we write:

Python: df.at[1,"A"] = 5         Julia: df[2,"A"] = 5

Summary

For indexing a matrix in Python (which is a list of lists) a notation consisting of two pairs of brackets is used, whereas for DataFrames one pair of brackets in combination with a method-call must be used. There is a different method for each sort of indexing (iloc, loc, iat, at).

In Julia there is one common notation for all of these cases: The two indices are always placed between a pair of brackets. That’s it.

But why?

I think it’s obvious that the higher degree of consistency in Julia makes the language easier to learn, easier to read and easier to use. So the question arises: If the advantages are so obvious, why don’t they do it in Python (or some other programming language with a lower degree of consistency)?

The short answer is: Because they can’t!

The longer answer

Now to the longer answer: If we look behind the scenes, then e.g. accessing an element within a matrix in Julia (like m[2,4]) translates to a call of the function getindex as follows:

getindex(m, 2, 4)

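The bracket notation is just syntactic sugar (enabled by operator overloading) for this function call. This holds for all the indexing examples above:

  • m[2,4] translates to getindex(m, 2, 4),
  • df[2,4] translates to getindex(df, 2, 4) and
  • df[4:10,["A","B"]] translates to getindex(df, 4:10, ["A","B"]).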

Each of these function calls is done with arguments of different types:

Different argument types for getindex [image by author]

Depending on the data types used for the arguments, an appropriate implementation of getindex gets called. And perhaps you have guessed it already: This is the famous concept of multiple dispatch at work!

So in the end, multiple dispatch facilitates the consistent use of one function (getindex) in many similar variations. In addition, it gets wrapped in nice clothes (the bracket notation), but that’s just a wrapper for better readability. The core is based on multiple dispatch. Thus, programming languages that lack this concept cannot offer this degree of consistency.
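
For contrast, here is a sketch of how the bracket notation is wired in Python. The expression obj[key] calls the __getitem__ method of the container's class, so the implementation is selected by the container alone; a class that wants to support several kinds of indices has to inspect the key itself. The class Matrix below is hypothetical and written only for this illustration. It uses 1-based indices to mirror the Julia examples:

```python
class Matrix:
    """Hypothetical 2-D container with 1-based indexing (illustration only)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        # m[2, 4] arrives here as the single tuple (2, 4).
        row, col = key
        if isinstance(row, int) and isinstance(col, int):
            return self.rows[row - 1][col - 1]
        if isinstance(row, range) and isinstance(col, int):
            return [self.rows[r - 1][col - 1] for r in row]
        raise TypeError(
            f"unsupported index types: {type(row).__name__}, {type(col).__name__}"
        )


# Entries encode their own 1-based position: row * 10 + column.
m = Matrix([[r * 10 + c for c in range(1, 6)] for r in range(1, 4)])

print(m[2, 4])              # 24
print(m[range(1, 4), 4])    # [14, 24, 34]
```

All branching on the index types lives inside one method here, whereas Julia selects among separate getindex methods based on the types of all arguments.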

At the moment the base library of Julia alone has 220 variations of getindex. So it really allows a wide application of this concept.

Assign a value

The examples where a value is assigned to an element of a matrix-like structure work in an analogous way. Instead of getindex they use setindex!. So the mechanism is the same here.
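
Python has a matching hook for assignment: obj[key] = value calls the __setitem__ method of the container's class. A minimal sketch with a hypothetical class Grid (again 1-based, and not part of any library):

```python
class Grid:
    """Hypothetical 2-D container with 1-based indexing (illustration only)."""

    def __init__(self, n_rows, n_cols):
        self.cells = [[0] * n_cols for _ in range(n_rows)]

    def __getitem__(self, key):
        row, col = key
        return self.cells[row - 1][col - 1]

    def __setitem__(self, key, value):
        # g[2, 4] = 5 arrives here as key=(2, 4) and value=5.
        row, col = key
        self.cells[row - 1][col - 1] = value


g = Grid(3, 5)
g[2, 4] = 5

print(g[2, 4])         # 5
print(g.cells[1][3])   # 5
```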

Behind the scenes of ranges

Now to the examples, where ranges have been used: All the expressions above, which specify a range, represent objects of type UnitRange (having default step size 1) or StepRange (using other step sizes). Both types are subtypes of the abstract type AbstractRange as the following type hierarchy shows:

Type hierarchy for ranges [image by author]

So for example

  • 4:10 is an object of type UnitRange{Int64} (the Int64 in curly braces tells us that the lower and upper bound of the range are of this integer type),
  • 4:2:10 an object of type StepRange{Int64, Int64} (here the bounds as well as the steps are of type Int64) and
  • DateTime(2022,4,8,12):Hour(1):DateTime(2022,4,8,14) is of type StepRange{DateTime, Hour}.

And then multiple dispatch comes into play again:

  • Accessing a range of elements of an array with a[4:10] translates, as we’ve learned above, to getindex(a, 4:10). So there is an implementation of getindex which accepts a subtype of AbstractRange as an index argument (apart from the "classic" indexing using an integer number and quite a few other variations).
  • Drawing a random number from a range of numbers using rand(4:10) means that there is an implementation of rand (among other versions) which accepts a subtype of AbstractRange as its argument.

So in the end it also boils down to the application of multiple dispatch (which inherently relies on the Julia type system, which can be arbitrarily extended by user-defined types).
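
On the Python side the corresponding pieces are separate built-in types: the colon notation inside brackets builds a slice object, while range is a type of its own. A small sketch (the list a is hypothetical sample data):

```python
a = list(range(1, 13))      # hypothetical data: values 1..12

# a[3:10] is shorthand for indexing with a slice object.
print(a[3:10] == a[slice(3, 10)])           # True

# range and slice are unrelated types.
r = range(4, 11)
s = slice(3, 10)
print(type(r).__name__, type(s).__name__)   # range slice
print(isinstance(r, slice))                 # False

# A range is a sequence, so it can be indexed and measured directly.
print(r[0], r[-1], len(r))                  # 4 10 7
```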

And more syntactic sugar

For those interested: The notation for ranges using colons like 4:10 or 4:2:10 is also just a syntactically nicer version of a normal function call (again based on operator overloading). The colon is an ordinary operator, so 4:10 is the call (:)(4, 10) and 4:2:10 is the call (:)(4, 2, 10).

The same range objects can also be created with the function range: 4:10 is equivalent to range(4, 10) and 4:2:10 is equivalent to range(4, 10, step = 2) (the step size is an optional keyword argument; that’s why the keyword step is necessary in this example).

Conclusion

Using examples like indexing and ranges we could see that things can be expressed in very consistent ways in Julia, which makes the language easy to learn, easy to read and in the end also easy to use.

This consistency has been achieved with a few powerful concepts like multiple dispatch, an extensible type system and operator overloading, which have been carefully crafted in order to work seamlessly together.

