Introduction to Scala

Eric Stokes
Towards Data Science
7 min read · Aug 8, 2019

What is Scala? Scala is a high-level language that combines functional and object-oriented programming and runs on the Java Virtual Machine (JVM). So why would you use Scala instead of Python? Apache Spark is one of the most widely used engines for big data, and Spark itself is written in Scala, so learning Scala is a worthwhile investment for any Data Scientist who works with Spark.

Scala is a powerful language that can handle many of the same tasks as Python, such as building machine learning models. We are going to take an introductory dive into Scala, get familiar with the basic syntax, and learn how to use loops, maps and filters. I will be using the community edition of Databricks (databricks.com) throughout this guide.

Variables

In Scala, we declare variables with val or var and we will discuss the difference between the two. It is worth noting that the syntax convention for Scala is camelCase, whereas Python uses snake_case when declaring variables. Let’s create our first variable in Scala:

val counterVal = 0
counterVal: Int = 0

Now let’s declare a variable using var :

var counterVar = 0
counterVar: Int = 0

They look the same and produce the same output when we run the cells, but they behave differently. We can see this when we try to reassign each variable:

counterVar = 1
counterVar: Int = 1

Here is what happens when we try to give a new value to counterVal:

notebook:1: error: reassignment to val
counterVal = 1
           ^

Any variable that is declared using val is immutable, therefore we cannot alter its value. Using val is great when we don’t want a variable to be altered, whether intentionally or accidentally. For instance, we may want to use a name as our val and store it so that no one can alter a person’s name.

val firstName = "John"
val lastName = "Doe"
firstName: String = John
lastName: String = Doe

Notice that Scala shows the type of our variable (String) when it is created.
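
As a minimal sketch of the difference between val and var, and of how Scala infers types, the cell below can be pasted into a notebook or the Scala REPL. The names maxRetries, attempts and greeting are hypothetical and not part of the original notebook.

```scala
// Hypothetical names, chosen only for illustration.
val maxRetries = 3            // immutable: type inferred as Int
var attempts = 0              // mutable: can be reassigned later

attempts = attempts + 1       // allowed because attempts is a var
// maxRetries = 4             // would not compile: "reassignment to val"

// A type can also be written explicitly instead of being inferred.
val greeting: String = "Hello"
```

A common habit is to start with val everywhere and switch to var only when a value really has to change.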

Strings

In Scala we can work with strings in much the same way as in Python. A common use of strings is interpolation, which means inserting the value of a variable or expression into a string. A string interpolation looks like this:

s"Hello, $firstName $lastName"
res0: String = Hello, John Doe

Just as Python uses the f prefix for f-strings, Scala uses the s prefix. The double quotes matter: single quotes denote a single character (Char) in Scala, so wrapping a string in single quotes is an error. Scala uses $ to mark the variable being passed into the interpolation. Note that the space between the two names above is part of the string literal itself. If we instead concatenate the variables inside ${}, no space is added:

s"Hello, ${firstName + lastName}"
res1: String = Hello, JohnDoe

There is no separation between the first and last name in the example above. To get the space we have to be explicit: ${firstName + " " + lastName}. I prefer using $ for each variable without {}, but either form works for interpolation.
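
The two interpolation forms can be compared side by side in a short sketch. The values city and visits are hypothetical and are used only to keep the example self-contained.

```scala
// Hypothetical values, not taken from the original notebook.
val city = "Lisbon"
val visits = 3

// $name inserts a variable; ${...} inserts the result of any expression.
val simple = s"City: $city"
val computed = s"$city has ${city.length} letters"
val arithmetic = s"Next visit will be number ${visits + 1}"

// Single quotes create a Char, not a String.
val initial: Char = 'L'
```

The braces are only needed when the inserted value is an expression rather than a plain variable name.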

String Indexing

Indexing and slicing are among the most commonly used techniques in Data Science. In Scala, a single element is accessed by putting its index in parentheses, and a range is extracted with the .slice() method, where the first number is inclusive and the last number is exclusive. Let’s work through some examples.

First I created a new variable called “fullName”.

val fullName = firstName + " " + lastName

In Scala, we access a single position by putting its index in parentheses after the variable.

Running fullName(0) returns J, the first character of fullName. To extract a range of indices, we call .slice() and pass in the start and end of the range.

Running fullName.slice(3, 6) returns n D from fullName. Index 3 (n) is included, the space sits at index 4, and the slice stops after D at index 5 because the end index 6 is exclusive. This is similar to other programming languages. It takes time to become familiar with this convention, and there will still be times when you set the range incorrectly. An important note here is that you cannot use negative indices. For anyone familiar with Python, [-1] returns the last element, whereas fullName(-1) throws an error in Scala. Asking for a single index beyond the end of the string also throws an error, while a slice that runs past the end simply stops at the last character. To get the length of a string use .length.
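
The indexing rules can be checked with a small sketch. The variable word is a hypothetical stand-in that holds the same text as fullName, and scala.util.Try is used here only to observe the out-of-range errors without stopping the cell.

```scala
import scala.util.Try

val word = "John Doe"                 // same text as fullName

val first: Char = word(0)             // 'J' (a Char, not a String)
val middle = word.slice(3, 6)         // "n D": start inclusive, end exclusive
val clamped = word.slice(5, 100)      // "Doe": a slice past the end is cut short

// Alternatives to Python-style negative indexing.
val lastChar = word.last              // 'e'
val lastThree = word.takeRight(3)     // "Doe"

// A single index outside the string throws, so wrap it in Try to inspect it.
val negativeFails = Try(word(-1)).isFailure            // true
val tooLargeFails = Try(word(word.length)).isFailure   // true
```

Methods such as .last and .takeRight() cover most of the cases where a Python user would reach for a negative index.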

Arrays

Arrays are roughly Scala’s counterpart to Python lists. Their methods resemble Python’s, with a few differences that make Scala arrays distinct. Note that val only prevents the variable from being reassigned; the elements of an array can still be changed, so var is needed only if you plan to point the variable at a different array. Since an Array can have its values altered but cannot change size, we will use ArrayBuffer to demonstrate.

import scala.collection.mutable.ArrayBuffer
var myArr = ArrayBuffer(2, 3, 4, 5, 6)
myArr: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(2, 3, 4, 5, 6)

Notice that Scala infers that our array holds only integers (ArrayBuffer[Int]). Now that we have an array, let’s work through some examples of things we can do with it. Like strings, we can index and slice an array to see the values at any given position.

myArr(3)
res0: Int = 5
myArr.slice(3, 5)
res1: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(5, 6)
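
To make the contrast between a fixed-size Array and a growable ArrayBuffer concrete, here is a minimal sketch. The names fixed, growable and frozen are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical names for illustration.
val fixed = Array(2, 3, 4)        // size is fixed at creation
fixed(0) = 20                     // elements can still be updated, even on a val

val growable = ArrayBuffer(2, 3, 4)
growable += 5                     // appends, so the length grows to 4
growable(1) = 30                  // elements can be updated by index too

val frozen = growable.toArray     // copy into a fixed-size Array when done
```

Both collections are declared with val here: the reference stays fixed while the contents change.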

To add an element to an array use += to append a value:

myArr += 10

myArr
res3: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(2, 3, 4, 5, 6, 10)

You can see that 10 was added onto the array as the last element. We can also remove an item in a similar way:

myArr -= 10
myArr
res7: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(2, 3, 4, 5, 6)

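To remove multiple values at once, pass them in parentheses. Note that these are values to remove, not indices (in Scala 2.13 and later this multi-argument form is deprecated in favor of myArr --= Seq(2, 4)):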

myArr -= (2, 4)
myArr
res8: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(3, 5, 6)

Removing elements by index is done with the .remove(x) method, where x is the index to remove. You can also pass a second argument to remove several consecutive elements: in .remove(0, 3) the second number is a count, not an end index, so it removes the three elements at indices 0, 1 and 2.
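
The different ways of removing elements can be lined up in one sketch. The buffer nums is a hypothetical example, and --= with a Seq is used for multi-value removal because it works the same way across Scala 2.12, 2.13 and 3.

```scala
import scala.collection.mutable.ArrayBuffer

val nums = ArrayBuffer(2, 3, 4, 5, 6, 10)   // hypothetical data

nums -= 10                    // removes the value 10: 2, 3, 4, 5, 6
nums --= Seq(2, 4)            // removes the values 2 and 4: 3, 5, 6

val removed = nums.remove(0)  // removes index 0 and returns its element (3)

nums ++= Seq(7, 8, 9)         // append several values: 5, 6, 7, 8, 9
nums.remove(1, 3)             // from index 1, remove 3 elements: 5, 9
```

Keep in mind that -= and --= work on values while .remove() works on positions, which is easy to mix up when the buffer holds integers.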

Mapping & Filtering

More often than not, we want to filter the elements of a list or map them to new values. Before we look at either, let’s see how loops in Scala iterate through an array.

for (n <- myArr) {
  println(n)
}
3
5
6

The code above will run a for loop that iterates through each element in our Array and prints out each element inside the Array. The use of <- tells Scala that we want to iterate through myArr and for each n (element) in myArr print n . Indentation is not necessary as using {} will denote the beginning and end of the code block.
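
A few common variations on the for loop are sketched below, using a hypothetical sequence called scores: accumulating a value, looping over positions, and collecting results with a guard and yield.

```scala
val scores = Seq(3, 5, 6)            // hypothetical data

// Accumulate a running total inside the loop body.
var total = 0
for (s <- scores) {
  total += s
}

// Loop over positions instead of elements.
for (i <- scores.indices) {
  println(s"index $i holds ${scores(i)}")
}

// A guard (if ...) skips elements; yield collects results into a new collection.
val doubledLarge = for (s <- scores if s > 3) yield s * 2
```

The last form, for with yield, produces a new collection in the same spirit as the mapping and filtering methods.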

Mapping iterates through each item in our list and transforms it into a new element. This is great when we want to apply the same change to every value at once. Note that .map() returns a new collection and leaves the original unchanged.

Here we will multiply each element in myArr by five using the .map() method:

myArr.map(n => n * 5)
res22: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(15, 25, 30)
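
As a small sketch of how .map() behaves, the cell below uses a hypothetical buffer called prices. It shows that the result is a new collection, that the underscore is a shorthand for the lambda parameter, and that the element type may change.

```scala
import scala.collection.mutable.ArrayBuffer

val prices = ArrayBuffer(3, 5, 6)          // hypothetical data

val scaled = prices.map(n => n * 5)        // ArrayBuffer(15, 25, 30)
val shorthand = prices.map(_ * 5)          // same result with placeholder syntax
val labels = prices.map(n => s"item-$n")   // element type changes from Int to String

// prices itself still holds 3, 5, 6 after all three calls.
```

To keep a mapped result, assign it to a new val as done here; calling .map() on its own does not modify the buffer.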

Filtering returns the subset of the original data or array that meets the conditions we specify. There will be many times when we want to use filter to grab data or find certain elements within an array or dataset. Here I take myArr and filter it so that only the numbers divisible by two are returned.

myArr.filter(n => n % 2 == 0)
res26: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(6)

The above code iterates through myArr and returns the numbers that are divisible by 2 (the even numbers); here only 6 qualifies. We can also combine mapping and filtering to multiply our list and then check for even numbers. First I will append some arbitrary numbers onto myArr to make it more interesting!

Here we are appending multiple elements to our Array:

myArr += (10, 3, 7, 5, 12, 20)
res30: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(3, 5, 6, 10, 3, 7, 5, 12, 20)

Now we will combine mapping and filtering to multiply each element of myArr by five and keep only the even results:

myArr.map(n => n * 5).filter(n => n % 2 == 0)
res31: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(30, 50, 60, 100)
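
When chaining, the order of .map() and .filter() can change the result. The sketch below uses a hypothetical sequence called values with the same numbers as myArr. Multiplying by five keeps even numbers even, so both orders agree, but adding one flips parity and the two orders differ.

```scala
val values = Seq(3, 5, 6, 10, 3, 7, 5, 12, 20)   // same numbers as myArr

// Multiplying by 5 preserves parity, so the order does not matter here.
val timesFiveThenEven = values.map(_ * 5).filter(_ % 2 == 0)   // 30, 50, 60, 100
val evenThenTimesFive = values.filter(_ % 2 == 0).map(_ * 5)   // 30, 50, 60, 100

// Adding 1 flips parity, so the order now changes the result.
val plusOneThenEven = values.map(_ + 1).filter(_ % 2 == 0)     // 4, 6, 4, 8, 6
val evenThenPlusOne = values.filter(_ % 2 == 0).map(_ + 1)     // 7, 11, 13, 21
```

Filtering first also means the mapping function runs on fewer elements, which can matter on large datasets.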

We can also do the same thing for odd numbers by changing n % 2 == 0 to n % 2 != 0.
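
A short sketch of selecting odd numbers, using a hypothetical sequence called mixed that includes negative values. The test n % 2 != 0 is used rather than n % 2 == 1 because the remainder of a negative odd number is -1 in Scala.

```scala
val mixed = Seq(-3, -2, 0, 1, 4, 7)            // hypothetical data

val odds = mixed.filter(n => n % 2 != 0)       // -3, 1, 7
val missesNegatives = mixed.filter(n => n % 2 == 1)   // 1, 7 (drops -3)

// partition splits a collection into matches and non-matches in one pass.
val (evens, notEvens) = mixed.partition(n => n % 2 == 0)
```

Here evens holds -2, 0 and 4, while notEvens holds the same elements as odds.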

Mapping and filtering are essential to the Data Science workflow; they come up almost every time we work with a dataset.

Scala is a great tool to have in our arsenal as Data Scientists. We can use it for working with data and building machine learning models. This introduction to Scala only covers the very basics. Now it is up to you to dig deeper into the language.
