How easy (or difficult) it is to learn and use a programming language depends in a significant way on how consistently the different constructs of that language can be used and applied. A high degree of consistency also helps to avoid mistakes, which makes it a crucial aspect of building quality software. In this article I will show how well Julia and Python perform in this area.
Using ranges and indexing as examples, I will first illustrate how the two languages differ in their usage of these concepts and then explain the rationale behind the differences.
Ranges
In programming, a range of numbers, specified by a lower bound and an upper bound, is an often-used concept. Let’s have a look at how it is applied in both languages.
A range of integers
If we need, let’s say a range of integers from 4 to 10, this is expressed as follows:
Python: range(4,11) Julia: 4:10
In Julia the lower bound and the upper bound are inclusive when specifying a range, whereas in Python the lower bound is inclusive and the upper bound exclusive. So we have to give an upper bound of 11, if we want numbers up to 10 in Python.
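A quick check of the half-open convention on the Python side (the variable name numbers is only illustrative):

```python
# Python's range is half-open: the start is included, the stop is not.
numbers = list(range(4, 11))

print(numbers)       # [4, 5, 6, 7, 8, 9, 10]
print(len(numbers))  # 7, which equals stop - start
```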
A range of elements from an array
If we have an array a and want to extract the 4th up to the 10th element, we have to write:
Python: a[3:10] Julia: a[4:10]
In Python indexing of arrays (lists) begins at zero, so the 4th element has an index of 3. And again the upper bound is exclusive. So to select the elements up to the 10th element (which has index 9), we have to write 10 as the upper bound.
As we can see here, in Julia the exact same expression as above is used. And it is not only the exact same expression, but the same (range) object. It can be used ‘standalone’ as above or given as an argument to an array for indexing.
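On the Python side, the corresponding experiment might look as follows. This is a small sketch with made-up data; it assumes a plain list and shows that the slice can be stored as an object, but as a slice rather than as a range:

```python
# Illustrative data: the k-th element of a is 10*k, so positions are easy to read off.
a = [10 * k for k in range(1, 13)]

fourth_to_tenth = a[3:10]        # indices 3..9, i.e. the 4th to the 10th element
print(fourth_to_tenth)           # [40, 50, 60, 70, 80, 90, 100]

# The slice can be stored as an object, but it is a slice, not a range.
s = slice(3, 10)
print(a[s] == fourth_to_tenth)   # True

# A range object is not accepted as a list index.
try:
    a[range(3, 10)]
    range_as_index_works = True
except TypeError:
    range_as_index_works = False
print(range_as_index_works)      # False
```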
A range to draw random numbers from
The following expressions are used to generate random numbers in the range from 4 to 10:
Python: randint(4,10) Julia: rand(4:10)
In Python the upper bound is inclusive in this case, deviating from the half-open convention of range and slicing. In Julia we pass the exact same range object, introduced above, as an argument to the function rand.
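The difference between the two conventions inside Python's own random module can be seen in a short sketch. The seed and the number of draws are arbitrary choices made only to keep the run repeatable:

```python
import random

random.seed(0)  # arbitrary seed, only for repeatability

# randint includes both bounds ...
draws_inclusive = [random.randint(4, 10) for _ in range(1000)]
# ... whereas randrange follows the half-open convention of range.
draws_half_open = [random.randrange(4, 10) for _ in range(1000)]

print(min(draws_inclusive), max(draws_inclusive))
print(min(draws_half_open), max(draws_half_open))
```

With randint the value 10 can be drawn; with randrange the largest possible value is 9.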
Introducing a step size
If we don’t want each element within a range, but e.g. only every second element, this can be specified in both languages using a step size (in this example a size of 2 instead of 1). So to get every second number from 4 to 10 we write:
Python: range(4,11,2) Julia: 4:2:10
In Julia this extended version is also just a range object. So it can be used in every place where a range object may be specified, in particular in the cases above: to select a subrange of an array or to specify the range from which random numbers are drawn.
Julia: a[4:2:10]
rand(4:2:10)
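Python has no single object that covers all of these cases: slicing takes the step in its own syntax (a[3:10:2]), a range object cannot be used as a list index, and randint accepts no step (random.randrange(4,11,2) does, again with an exclusive upper bound).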
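For comparison, here is a sketch of the closest Python counterparts to the two Julia expressions. Each case has its own spelling; the list a is made-up data:

```python
import random

a = list(range(1, 13))   # illustrative data: the k-th element of a is k

# Every second element from the 4th to the 10th: slice syntax with a step.
every_second = a[3:10:2]
print(every_second)      # [4, 6, 8, 10]

# A random number from 4, 6, 8, 10: randrange takes a step, upper bound exclusive.
draw_a = random.randrange(4, 11, 2)

# A range is a sequence, so random.choice accepts it as well.
draw_b = random.choice(range(4, 11, 2))
print(draw_a, draw_b)
```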
Other sorts of ranges
Of course there are not only ranges of integers. The next example shows a range of three hourly timestamps on April 8th of 2022 (from 12:00 to 14:00):
Python: pandas.date_range(start='2022-04-08 12:00',
periods=3, freq='H')
Julia: DateTime(2022,4,8,12):Hour(1):DateTime(2022,4,8,14)
As can be seen, in Julia the same basic syntax (lower:step:upper) is used as in the case of integer ranges.
In Julia this can be applied to other data types too, like characters. Here we specify a range of characters from ‘d’ to ‘k’:
Julia: 'd':'k'
Python has no built-in character range; the characters have to be generated from their code points.
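A common workaround in Python goes through the code points with ord and chr. This is a sketch, not a built-in range type, and the usual +1 is needed for the exclusive upper bound:

```python
# Build the characters 'd' to 'k' from their code points.
letters = [chr(code) for code in range(ord("d"), ord("k") + 1)]

print(letters)  # ['d', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
```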
Indexing
The second example used in this article to demonstrate how consistently (or inconsistently) a concept is applied is the indexing of 2-dimensional array-like data structures.
Accessing an element within a matrix
To access the element in the 2nd row and 4th column of a matrix m (a 2-dimensional array) we write:
Python: m[1][3] Julia: m[2,4]
In Python a list of lists is used (because there are no n-dimensional arrays in the standard library). Therefore indexing such a structure is a two-step process (and needs two pairs of brackets). And again, as indexing starts at zero in Python, the 2nd row has index 1 and the 4th column has index 3.
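The two-step nature of this access can be made visible with a small sketch. The matrix contents are made up so that the entry in row r and column c is 10*r + c:

```python
# Illustrative 3x5 matrix as a list of lists: entry (r, c) holds 10*r + c.
m = [[10 * row + col for col in range(1, 6)] for row in range(1, 4)]

second_row = m[1]        # step 1: select the 2nd row (a list)
element = second_row[3]  # step 2: select the 4th entry of that row
print(element, m[1][3])  # 24 24

# A single pair of brackets with two indices is rejected by a plain list.
try:
    m[1, 3]
    tuple_index_works = True
except TypeError:
    tuple_index_works = False
print(tuple_index_works)  # False
```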
Accessing an element within a DataFrame
To get the element in the 2nd row and 4th column of a DataFrame df (which is a 2-dimensional data structure similar to a database table) we have to write:
Python: df.iloc[1,3] Julia: df[2,4]
In Julia this is exactly the same notation as above.
In contrast to the list of lists used to represent a matrix in Python, a DataFrame is a true 2-dimensional structure. Therefore the indices can be written within one pair of brackets.
Accessing a range of elements within a DataFrame
Each column in a DataFrame has a name. These names can also be used to reference a column. So if we want the elements from rows 4 to 10 in the columns named "A" and "B" we write:
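Python: df.loc[3:9,["A","B"]] Julia: df[4:10,["A","B"]]
Note that loc selects by label and, unlike ordinary Python slices, includes both endpoints; with the default integer index the labels 3 to 9 are therefore rows 4 to 10.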
Assign a value to a matrix element
In order to assign the value 5 to the matrix element in the 2nd row and 4th column of a matrix m, we have to write:
Python: m[1][3] = 5 Julia: m[2,4] = 5
Assign a value to an element within a DataFrame
If we want to assign the value 5 to the element in the 2nd row and the 4th column of a DataFrame df, the following expressions are used:
Python: df.iat[1,3] = 5 Julia: df[2,4] = 5
To assign that value to the element in the 2nd row in column "A" we write:
Python: df.at[1,"A"] = 5 Julia: df[2,"A"] = 5
Summary
For indexing a matrix in Python (which is a list of lists) a notation consisting of two pairs of brackets is used, whereas for DataFrames one pair of brackets in combination with an indexer attribute must be used. There is a different indexer for each sort of indexing (iloc, loc, iat, at).
In Julia there is one common notation for all of these cases: The two indices are always placed between a pair of brackets. That’s it.
But why?
I think it’s obvious that the higher degree of consistency in Julia makes the language easier to learn, easier to read and easier to use. So the question arises: If the advantages are so obvious, why don’t they do it in Python (or some other programming language with a lower degree of consistency)?
The short answer is: Because they can’t!
The longer answer
Now to the longer answer: If we look behind the scenes, then e.g. accessing an element within a matrix in Julia (like m[2,4]) translates to a call of the function getindex as follows:
getindex(m, 2, 4)
The notation with the brackets is just syntactic sugar (enabled by operator overloading) for this function call. This holds for all the examples above.
Each of these function calls is done with arguments of different types:
![Different argument types for getindex [image by author]](https://towardsdatascience.com/wp-content/uploads/2022/05/1BKXQlrvZrCOaQ-gU9qRoaQ.jpeg)
Depending on the data types used for the arguments, an appropriate implementation of getindex gets called. And perhaps you have guessed it already: This is the famous concept of multiple dispatch at work!
So multiple dispatch facilitates in the end the consistent use of one function (getindex) that can be applied in many similar variations. In addition, it gets wrapped in nice clothes (the notation using brackets); but that’s just a wrapper for better readability. The core is based on multiple dispatch. Thus, programming languages which don’t have this concept cannot offer this degree of consistency.
At the moment the base library of Julia alone has 220 variations of getindex. So it really allows a wide application of this concept.
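For contrast, here is a rough Python sketch. Brackets are sugar there as well, for the method __getitem__, but Python selects that method by the type of the indexed object alone. The distinction between the kinds of index then has to be written by hand inside that one method. The class Grid and its behaviour are hypothetical and serve only as an illustration:

```python
class Grid:
    """Hypothetical 2-D container backed by a list of lists (illustration only)."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # g[i, j] arrives here as the tuple (i, j); g[i] arrives as a bare int.
        if isinstance(key, tuple):
            i, j = key
            return self.rows[i][j]       # j may be an int or a slice
        if isinstance(key, (int, slice)):
            return self.rows[key]        # a whole row, or several rows
        raise TypeError(f"unsupported index type: {type(key).__name__}")


g = Grid([[1, 2, 3, 4], [5, 6, 7, 8]])

print(g[1, 3])                       # 8
print(g[1, 1:3])                     # [6, 7]
print(g[0])                          # [1, 2, 3, 4]
print(Grid.__getitem__(g, (1, 3)))   # 8, the call behind g[1, 3]
```

Every new kind of index means another branch in the same method, whereas in Julia it means another method of getindex.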
Assign a value
The examples where a value is assigned to an element of a matrix-like structure work in an analogous way. Instead of getindex they use setindex!. So the mechanism is the same here.
Behind the scenes of ranges
Now to the examples where ranges have been used: All the expressions above which specify a range represent objects of type UnitRange (having default step size 1) or StepRange (using other step sizes). Both types are subtypes of the abstract type AbstractRange, as the following type hierarchy shows:
![Type hierarchy for ranges [image by author]](https://towardsdatascience.com/wp-content/uploads/2022/05/1cGRx9L2Ic-oRGXPeugBMUA.jpeg)
So for example
- 4:10 is an object of type UnitRange{Int64} (the Int64 in curly braces tells us that the lower and upper bound of the range are of this integer type),
- 4:2:10 is an object of type StepRange{Int64, Int64} (here the bounds as well as the steps are of type Int64) and
- DateTime(2022,4,8,12):Hour(1):DateTime(2022,4,8,14) is of type StepRange{DateTime, Hour}.
And then again multiple dispatch comes into play:
- Accessing a range of elements of an array with a[4:10] translates, as we’ve learned above, to getindex(a, 4:10). So there is an implementation of getindex which accepts a subtype of AbstractRange as an index argument (apart from the "classic" indexing using an integer number and quite a few other variations).
- Drawing a random number from a range of numbers using rand(4:10) means that there is an implementation of rand (among other versions) which accepts a subtype of AbstractRange as its argument.
So in the end it also boils down to the application of multiple dispatch (which inherently relies on the Julia type system, that can be arbitrarily extended by user-defined types).
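The idea of selecting an implementation by argument type can be imitated in Python with functools.singledispatch. Note the restriction: this is single dispatch on the first argument only, not multiple dispatch. The function name draw and its registered variants are hypothetical and used only for illustration:

```python
import random
from functools import singledispatch


@singledispatch
def draw(source):
    """Hypothetical rand-like function; fallback for unsupported argument types."""
    raise TypeError(f"cannot draw from {type(source).__name__}")


@draw.register
def _(source: range):
    # Variant chosen when the argument is a range, e.g. range(4, 11, 2).
    return random.choice(source)


@draw.register
def _(source: str):
    # Variant chosen when the argument is a string: draw one character.
    return random.choice(source)


number = draw(range(4, 11, 2))   # one of 4, 6, 8, 10
letter = draw("defghijk")        # one of the characters 'd' to 'k'
print(number, letter)
```

Further variants are added by registering more functions, without touching the existing ones. This is the part of the mechanism that Julia provides for every function and for all arguments at once.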
And more syntactic sugar
For those interested: The notation for ranges using colons like 4:10 or 4:2:10 is also just a syntactically nicer version of a normal function call (again based on operator overloading). Strictly speaking the colon is itself a function, so 4:10 is the call (:)(4, 10); the function range constructs the same objects.
So 4:10 is equivalent to range(4, 10) and 4:2:10 is equivalent to range(4, 10, step = 2) (the step size is an optional keyword argument; that’s why the keyword step is necessary in this example).
Conclusion
Using examples like indexing and ranges we could see that things can be expressed in very consistent ways in Julia, which makes the language easy to learn, easy to read and in the end also easy to use.
This consistency has been achieved with a few powerful concepts like multiple dispatch, an extensible type system and operator overloading, which have been carefully crafted in order to work seamlessly together.