Understanding Swift for TensorFlow

Published in Towards Data Science · 8 min read · Apr 28, 2018

Swift for TensorFlow was introduced by Chris Lattner at the TensorFlow Dev Summit 2018. On April 27, 2018 the Google team made its first public release on GitHub. Swift for TensorFlow is still in its infancy, and it is probably too early for developers and researchers to use it in their projects. If you are still interested in trying it out, install a Swift for TensorFlow snapshot from the official Swift website.

In this article I’ll focus on explaining the following topics:

Accessing Python APIs and PyValue (Python’s dynamic system for Swift)
Automatic differentiation system in Swift
Perform computation on Tensors in Swift for TensorFlow
Train a neural network

Swift for TensorFlow is very likely to evolve swiftly with the open-source community, as Swift and TensorFlow each did independently. So it may be worth the effort to understand something about it now.

Note: One major thing to note about Swift for TensorFlow is that it is a define-by-run framework. Although Swift for TensorFlow creates graphs in the backend (like TensorFlow), you don't have to create sessions to execute them. This approach is similar to eager execution in TensorFlow.

1. Major Features

Reverse-mode automatic differentiation (forward mode not implemented yet)
Define-by-run design (no sessions required)
Swift extended with machine-learning-specific features
Access to Python APIs in a Pythonic way
A PyValue type that provides Python's dynamic type system behavior

2. Python Interoperability

With Swift for TensorFlow we can use Python APIs in a very Pythonic way. To access Python APIs, import Python into the program, as in the following code snippet.

import Python
let np = Python.import("numpy")  // akin to `import numpy as np`
let pickle = Python.import("pickle")
let gzip = Python.import("gzip")

Swift for TensorFlow also has a new type called PyValue, which exhibits the complete dynamic type system behavior of Python in Swift without affecting the behavior of other Swift types.

var x: PyValue = 3.14159
print(x * 2)  // Prints "6.28318"
x = "string"
print("now a " + x)  // Prints "now a string"

For more information, please refer to the official Python Interoperability documentation.
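
To build some intuition for how one Swift type can hold values of different kinds, here is a minimal sketch in plain Swift. It is not the real PyValue and does not touch Python at all. The `DynamicValue` enum and its two operators are hypothetical names made up for this illustration.

```swift
// Hypothetical stand-in for a dynamically typed value; this is NOT the real PyValue.
enum DynamicValue: ExpressibleByFloatLiteral, ExpressibleByStringLiteral, CustomStringConvertible {
    case number(Double)
    case text(String)

    init(floatLiteral value: Double) { self = .number(value) }
    init(stringLiteral value: String) { self = .text(value) }

    var description: String {
        switch self {
        case .number(let n): return String(n)
        case .text(let s): return s
        }
    }
}

// Multiplication is only defined for the numeric case; anything else yields nil.
func * (lhs: DynamicValue, rhs: Double) -> DynamicValue? {
    if case .number(let n) = lhs { return .number(n * rhs) }
    return nil
}

// Concatenation is only defined when the dynamic value holds text.
func + (lhs: String, rhs: DynamicValue) -> String? {
    if case .text(let s) = rhs { return lhs + s }
    return nil
}

var x: DynamicValue = 3.14159
let doubled = x * 2        // holds a number
x = "string"               // the same variable now holds text
let joined = "now a " + x  // holds the concatenated text
```

In this sketch an operation on the wrong kind of value returns nil, so the mismatch only surfaces at run time. That is the trade-off a dynamic type makes, and the rest of the program keeps Swift's usual static checks.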

3. Automatic Differentiation

Swift for TensorFlow has built-in support for computing the gradients of functions with respect to their parameters. This functionality is built directly into Swift's compiler so that it can be optimized. It supports two differential operators: #gradient(of:withRespectTo:) and #valueAndGradient(of:). It didn't work for me 😩 (it's too early), but the documentation gives the following syntax. This code snippet is taken from the official documentation.

@differentiable(reverse, adjoint: dTanh)
func tanh(_ x: Float) -> Float {
  // ... some super low-level assembly tanh implementation ...
}
func dTanh(x: Float, y: Float, seed: Float) -> Float {
  return (1.0 - (y * y)) * seed
}
// Get the gradient function of tanh.
let dtanh_dx = #gradient(of: tanh)
dtanh_dx(2)
// Get the gradient function of foo with respect to the first parameter.
let dfoo_dx = #gradient(of: foo, withRespectTo: .0)
dfoo_dx(3, 4)

Currently only reverse-mode automatic differentiation is supported; forward mode is under discussion.
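
Since the #gradient syntax may not run on your snapshot, the adjoint formula itself can still be checked in plain Swift. The sketch below copies dTanh from the snippet above and compares it with a central finite difference. The helper `numericalDerivative` is a hypothetical name introduced only for this check, and the step size is an arbitrary choice.

```swift
import Foundation

// Adjoint of tanh as in the snippet above: takes the input x, the primal
// result y = tanh(x), and the incoming seed (the upstream gradient).
func dTanh(x: Float, y: Float, seed: Float) -> Float {
    return (1.0 - (y * y)) * seed
}

// Central finite difference, used here only as an independent reference.
func numericalDerivative(of f: (Float) -> Float, at x: Float, step: Float = 1e-3) -> Float {
    return (f(x + step) - f(x - step)) / (2 * step)
}

let x: Float = 0.5
let y = tanh(x)
let analytic = dTanh(x: x, y: y, seed: 1)
let numeric = numericalDerivative(of: { tanh($0) }, at: x)
print(analytic, numeric)  // the two values should agree to a few decimal places
```

With a seed of 1 the adjoint is simply the derivative of tanh at x. In reverse-mode differentiation the seed carries the gradient flowing in from whatever comes after tanh in the computation.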

4. Working with Tensors

As a simple example we will create a Tensor instance on which we apply some operations using TensorFlow.

import TensorFlow
var x = Tensor([[1, 2], [3, 4]])
for _ in 1...5 {
  x += x
}
print(x)  // Prints "[[32.0, 64.0], [96.0, 128.0]]"

The code above uses the += operator, which adds the Tensor to itself in a loop. This is possible because Swift's Advanced Operators feature lets a type such as Tensor provide its own implementations of existing operators (operator overloading).
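
As a rough illustration of operator overloading without the TensorFlow module, the sketch below defines + and += for a small matrix type. `Matrix2x2` is a hypothetical type invented for this example; it is not how Tensor is implemented.

```swift
// Hypothetical 2x2 matrix type used only to illustrate operator overloading.
struct Matrix2x2: Equatable {
    var elements: [[Float]]

    // Element-wise addition.
    static func + (lhs: Matrix2x2, rhs: Matrix2x2) -> Matrix2x2 {
        var result = lhs
        for row in 0..<2 {
            for column in 0..<2 {
                result.elements[row][column] += rhs.elements[row][column]
            }
        }
        return result
    }

    // Compound assignment expressed in terms of +.
    static func += (lhs: inout Matrix2x2, rhs: Matrix2x2) {
        lhs = lhs + rhs
    }
}

var m = Matrix2x2(elements: [[1, 2], [3, 4]])
for _ in 1...5 {
    m += m  // doubles every element on each pass
}
print(m.elements)
```

Five doublings multiply every element by 32, which matches the Tensor result shown above.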

5. Train a Simple Feed-Forward Neural Network

Moving on to neural networks, the true reason for the popularity of machine learning in this era. In this section we will teach a 3-layer fully connected feed-forward neural network to predict the digit in images from the MNIST dataset. To begin using TensorFlow in Swift, first import TensorFlow in your Swift file.

Note: The training code can be found here if someone’s impatient.

Our neural network will have 3 layers:

Input layer: It presents the input data (pixel values in our case) to the neural network. Each image contributes 784 values.
Hidden layer: It computes an affine transformation of the input data using weights and biases, and then applies a sigmoid activation function to the result. The hidden layer in our example has 30 units (neurons).
Output layer: The data from the hidden layer again goes through an affine transformation followed by a sigmoid function, forming the output layer. This is where the prediction happens. This layer has 10 units, each representing the probability that the image shows a specific digit. Note that the labels use one-hot (1-of-k) coding, in which a single value in a one-dimensional tensor is 1 and all other values are 0. For example, [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] represents the digit 2.
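
The one-hot coding described for the output layer can be sketched in plain Swift as follows. `oneHot` and `predictedDigit` are hypothetical helper names, not part of the TensorFlow library.

```swift
// Hypothetical helper: encodes a class index as a one-hot vector of length `depth`.
func oneHot(_ index: Int, depth: Int) -> [Float] {
    precondition(index >= 0 && index < depth, "index must lie in 0..<depth")
    var vector = [Float](repeating: 0, count: depth)
    vector[index] = 1
    return vector
}

// Decodes a prediction by picking the position of the largest value.
func predictedDigit(_ scores: [Float]) -> Int? {
    guard let best = scores.max() else { return nil }
    return scores.firstIndex(of: best)
}

let two = oneHot(2, depth: 10)   // 1 at position 2, 0 everywhere else
let decoded = predictedDigit(two)
print(two, decoded ?? -1)
```

Decoding with the largest value also works on the network's real outputs, which are fractional scores rather than exact zeros and ones.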

The affine transformation is the dot product of the data with the weights followed by the addition of the biases. The activation function is then applied element-wise to the result. The output of a layer for input data x is given by the following equation.

O(x; W, b) = f(W•x + b)

Here, O(.) is the output function, f(.) is the activation function (sigmoid in our case), W is the weight matrix, b is the bias vector, and • represents the dot product.

We use the sigmoid activation function because it squashes values into the range (0, 1). This restricts the output range, so each value at the output layer can be read as the probability of a possible digit in the image.
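
The equation O(x; W, b) = f(W•x + b) can be sketched for a single input vector in plain Swift. The function name `denseLayer` and the tiny weights below are made up for illustration. The layout, with one row of weights per output unit, is an assumption of this sketch.

```swift
import Foundation

// Logistic sigmoid for a single value.
func sigmoid(_ z: Float) -> Float {
    return 1 / (1 + exp(-z))
}

// Computes f(W•x + b) for one input vector, with f = sigmoid.
// `weights` has one row per output unit and one column per input value.
func denseLayer(input x: [Float], weights: [[Float]], biases: [Float]) -> [Float] {
    precondition(weights.count == biases.count, "one bias per output unit")
    var outputs: [Float] = []
    for (row, bias) in zip(weights, biases) {
        precondition(row.count == x.count, "row length must match input length")
        var weightedSum = bias
        for (weight, value) in zip(row, x) {
            weightedSum += weight * value
        }
        outputs.append(sigmoid(weightedSum))
    }
    return outputs
}

let output = denseLayer(input: [1, 2],
                        weights: [[0.5, -0.25], [1, 1]],
                        biases: [0, 0])
print(output)  // first unit sees a weighted sum of 0, second unit a sum of 3
```

A weighted sum of 0 maps to exactly 0.5, and larger sums move toward 1 without ever reaching it.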

5.1 Reading MNIST data

Let’s read the dataset and construct Tensor instances from the images and labels. We have to create Tensor objects because tensors are what flow through a TensorFlow model (neural network), hence the name.

let (images, numericLabels) = readMnist(imagesFile: imagesFile, labelsFile: labelsFile)
let labels = Tensor<Float>(oneHotAtIndices: numericLabels, depth: 10)

5.2 Hyper-Parameters

We define two hyper-parameters, the learning rate and the number of iteration steps, along with a variable that tracks the training loss.

let iterationCount: Int32 = 20
let learningRate: Float = 0.2
var loss = Float.infinity

5.3 Trainable Parameters

Next we create 2 weight matrices and 2 bias vectors using TensorFlow’s Tensor type, as follows.

var w1 = Tensor<Float>(randomUniform: [784, 30])
var w2 = Tensor<Float>(randomUniform: [30, 10])
var b1 = Tensor<Float>(zeros: [1, 30])
var b2 = Tensor<Float>(zeros: [1, 10])
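
As a quick sanity check on these shapes, the number of trainable values can be counted in plain Swift. The `layerSizes` array is a name introduced here; the sizes come from the network described above.

```swift
// Layer sizes from the article: 784 inputs, 30 hidden units, 10 outputs.
let layerSizes = [784, 30, 10]

var parameterCount = 0
for (inputs, outputs) in zip(layerSizes, layerSizes.dropFirst()) {
    // One weight per input-output pair, plus one bias per output unit.
    parameterCount += inputs * outputs + outputs
}
print(parameterCount)  // 784*30 + 30 + 30*10 + 10 = 23860
```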

5.4 Training Loop

The training loop is the block of code where the neural network does its learning. We pass the images Tensor through the network (forward pass). We then compare the predictions with the labels to compute the errors, and back-propagate them to compute the gradients of the trainable parameters. Next we update these parameters by gradient descent, using their corresponding gradients and the learning rate. Finally, the loss is computed, giving a notion of how far the predictions are from the images’ true labels. Each of these steps is described below.

5.4.1 Forward Pass

As discussed above, the input image pixel values go through an affine transformation: the values are multiplied with the weights (dot product) and the biases are added. The result then goes through the sigmoid activation function, applied element-wise.

let z1 = images ⊗ w1 + b1
let h1 = sigmoid(z1)
let z2 = h1 ⊗ w2 + b2
let predictions = sigmoid(z2)

One thing to note here is that ⊗, a Unicode character, is used as the operator for the dot product (matrix multiplication). Swift allows Unicode symbols as operators, which shows how cool the Swift language actually is! Frankly, I really love ♥️ this programming language.
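
To show that nothing magical is going on, here is a minimal sketch that declares ⊗ as a custom operator in plain Swift and uses it for a matrix product on nested arrays. The precedence choice and the array-based implementation are assumptions of this sketch, not the TensorFlow library's definition.

```swift
// Declares ⊗ as an infix operator with the same precedence as *.
infix operator ⊗: MultiplicationPrecedence

// Hypothetical matrix product on nested arrays: rows of `lhs` times columns of `rhs`.
func ⊗ (lhs: [[Float]], rhs: [[Float]]) -> [[Float]] {
    let inner = rhs.count
    let columns = rhs.first?.count ?? 0
    var result = [[Float]](repeating: [Float](repeating: 0, count: columns),
                           count: lhs.count)
    for i in 0..<lhs.count {
        precondition(lhs[i].count == inner, "inner dimensions must match")
        for j in 0..<columns {
            for k in 0..<inner {
                result[i][j] += lhs[i][k] * rhs[k][j]
            }
        }
    }
    return result
}

let a: [[Float]] = [[1, 2], [3, 4]]
let b: [[Float]] = [[5, 6], [7, 8]]
let product = a ⊗ b
print(product)
```

Because the operator shares the precedence of *, an expression such as `images ⊗ w1 + b1` multiplies first and adds afterwards, as the forward pass expects.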

5.4.2 Backward Pass (Compute Gradients)

The backward pass computes the error between the predictions and the true labels. These errors are then back-propagated through the network to compute the gradients of the learnable parameters.

let dz2 = predictions - labels
let dw2 = h1.transposed(withPermutations: 1, 0) ⊗ dz2
let db2 = dz2.sum(squeezingAxes: 0)
let dz1 = dz2.dot(w2.transposed(withPermutations: 1, 0)) * h1 * (1 - h1)
let dw1 = images.transposed(withPermutations: 1, 0) ⊗ dz1
let db1 = dz1.sum(squeezingAxes: 0)
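
The first line, `dz2 = predictions - labels`, has the same form as the gradient of a sigmoid cross-entropy loss with respect to z2. That is an observation about the math rather than something the example code states, so treat it as an assumption. The sketch below checks it numerically for a single output unit in plain Swift; `crossEntropy` and the sample values are hypothetical.

```swift
import Foundation

func sigmoid(_ z: Double) -> Double {
    return 1 / (1 + exp(-z))
}

// Binary cross-entropy for one output unit with pre-activation z and target y.
func crossEntropy(z: Double, target y: Double) -> Double {
    let p = sigmoid(z)
    return -(y * log(p) + (1 - y) * log(1 - p))
}

let z = 0.3
let target = 1.0

// Analytic gradient in the same form as `dz2 = predictions - labels`.
let analyticGradient = sigmoid(z) - target

// Central finite difference of the loss with respect to z.
let h = 1e-5
let lossAbove = crossEntropy(z: z + h, target: target)
let lossBelow = crossEntropy(z: z - h, target: target)
let numericGradient = (lossAbove - lossBelow) / (2 * h)
print(analyticGradient, numericGradient)
```

The remaining lines of the backward pass apply the chain rule layer by layer, where `h1 * (1 - h1)` is the derivative of the sigmoid in the hidden layer.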

5.4.3 Update the Parameters

Now we update the parameters using their gradients and the learning rate. The learning rate determines how quickly the neural network learns its parameters, so that it predicts values closer to the truth the next time an input image is fed to it.

w1 -= dw1 * learningRate
b1 -= db1 * learningRate
w2 -= dw2 * learningRate
b2 -= db2 * learningRate
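
The same update rule can be tried on a toy problem in plain Swift. The sketch below minimizes the made-up function (w - 3)², whose gradient is 2(w - 3), using the learning rate and iteration count from the hyper-parameters above.

```swift
let learningRate: Float = 0.2
let iterationCount = 20

// Single parameter, starting far from the minimum at w = 3.
var w: Float = 0

for _ in 0..<iterationCount {
    let gradient = 2 * (w - 3)      // derivative of (w - 3)^2
    w -= gradient * learningRate    // same update form as the weights above
}
print(w)  // ends up very close to 3
```

Each step shrinks the distance to the minimum by a factor of 0.6 here. A larger learning rate moves faster but can overshoot, and a smaller one needs more iterations.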

5.4.4 Update the Loss

We update the loss value to see how close the predictions are to the true labels.

loss = dz2.squared().mean(squeezingAxes: 1, 0).scalarized()

Let us now print the loss, which tells us how well the network has learned to recognize digit images from the training set. The lower the loss, the better the network is at the recognition task.

print("Loss: \(loss)")  // Prints a value such as "Loss: 0.1"
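
The loss above is the mean of the squared differences between the predictions and the labels. A minimal plain-Swift sketch of that calculation on nested arrays follows; `meanSquaredError` and the sample values are hypothetical.

```swift
// Mean of the squared differences over every element of both matrices.
func meanSquaredError(predictions: [[Float]], labels: [[Float]]) -> Float {
    precondition(predictions.count == labels.count, "same number of rows")
    var total: Float = 0
    var count = 0
    for (predictedRow, labelRow) in zip(predictions, labels) {
        precondition(predictedRow.count == labelRow.count, "same number of columns")
        for (predicted, label) in zip(predictedRow, labelRow) {
            let difference = predicted - label
            total += difference * difference
            count += 1
        }
    }
    return count == 0 ? 0 : total / Float(count)
}

// Two samples with two outputs each: the first is unsure, the second is exact.
let sampleLoss = meanSquaredError(predictions: [[0.5, 0.5], [1, 0]],
                                  labels: [[1, 0], [1, 0]])
print("Loss: \(sampleLoss)")
```

A perfect prediction contributes nothing to the sum, so the loss falls toward 0 as the network improves.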

6. Summary

In this article we learned about Swift for TensorFlow and how easy it is to use, since Swift is quite similar to Python and looks like a scripting language but is pretty fast. We saw that Swift for TensorFlow lets us use Python APIs, and that Swift’s compiler has built-in support for automatic differentiation, which is very important for machine learning tasks. We also saw how to use TensorFlow with Swift by creating Tensor instances and playing with them a little (using the basic + operator). Finally, we trained a 3-layer neural network to solve the traditional problem of recognizing digit images.

7. Discussion

It may seem that the name should have been TensorFlow for Swift instead of Swift for TensorFlow. That is not the case, because Swift’s compiler has actually been modified to support TensorFlow. Swift therefore does not just act as a wrapper around Python libraries and TensorFlow; it is now more like a machine learning language. To keep the workflow consistent across the machine learning and data science community (where Python is heavily used), it also allows access to Python APIs in a Pythonic way and has a new type that provides Python’s dynamic type system behavior.

As a last word, Swift for TensorFlow is being built by Google, so it is very likely to become popular in the coming years. It is also trying to adopt the best capabilities of the original TensorFlow implementation, such as eager execution.

8. References

[1] Swift for TensorFlow, Google

[2] Swift.org, Apple

[3] Python Interoperability

[4] Automatic Differentiation in Swift

[5] The Swift Programming Language (Swift 4.1): Advanced Operators

[6] Swift for TensorFlow: MNIST Example


Keep [machine] learning till you’re fossil fuel! 🤘🤖