Run or Walk (Part 3): >99% Accuracy Neural Network Classifier for Detecting Motion Activity

Viktor Malyi
Towards Data Science
7 min read · Sep 4, 2017


Read previous parts:
- Run or Walk (Part 1): Detecting Motion Data Activity with Machine Learning and Core ML
- Run or Walk (Part 2): Collecting Device Motion Data the Right Way

After collecting a solid amount of walking and running data in the previous part, I was ready to go further and make use of it. My goal was to design and train a machine learning model which accurately predicts the type of user activity based on this data.

Framework and model selection

Thinking about what machine learning framework to choose for this task, I referred to the list of frameworks supported by Core ML. This dramatically narrowed the list of the candidates, since only a few of them are natively supported.

Everyone who tries to solve a problem with machine learning faces the challenge of variety. The variety of models one can use is vast. Selecting the most appropriate one is critical because you usually have the time and resources to evaluate only a couple of them, not all.

The hype around deep learning is huge these days, and you can't ignore it when thinking about the perfect model for your project. Well, I probably won't need a really deep neural network to solve this problem, I thought, but at least it will be a move in the right direction. "Shallow" artificial neural networks are also known for their efficiency in solving multiclass classification problems.

Considering the fact that feed-forward neural networks in Core ML are supported only via the Keras framework, my choice was straightforward. Keras is designed with a focus on rapid experimentation and can run on top of TensorFlow, CNTK, or Theano, so it was a perfect option for me. I selected the TensorFlow backend for Keras as the one I already had some experience with and whose concepts I was comfortable with.

Looking back at my experience with this setup, I can confirm that it allowed me to experiment fast and let me concentrate on the design of the model rather than its implementation.

Model Architecture: input layer

In feedforward neural networks, everything starts with the input layer. It was time to take a break and look at my data again so that I could define the number of neurons in the input layer.

The initial format of my dataset was the following:

This was a good point to think about how it is theoretically possible to predict the type of motion activity based on sensor data. It is not feasible to do so based on a single sample, but taking multiple consecutive samples into account makes it possible to detect a pattern.

How much time does an average human need to make a single natural wrist movement when walking or running? Between 1 and 2 seconds. Since the training data had been collected at a frequency of ~5 Hz, I needed to select only ~10 samples for each learning iteration. That would give me a 10 by 6 matrix as an input for my neural network.

This is, however, not directly usable, since the input layer of a feedforward neural network expects a column vector of values, not an m by n matrix. It meant that the idea of combining data from all sensors in a single learning iteration was not achievable as-is, and so I decided to use 6 models: one for each of the 3 axes of the 2 sensors.

When using those models for predicting motion activity types, I could apply ensembling techniques to evaluate their output and generate a single prediction.
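The post doesn't show the ensembling code itself; a minimal sketch of one common technique, a majority vote over the six per-axis models' outputs, could look like this (the `majority_vote` helper and the toy probabilities are hypothetical illustrations, not the author's implementation):

```python
import numpy as np

def majority_vote(per_model_probs):
    """Combine the class probabilities of several per-axis models into one
    prediction per example via a majority vote (0 = walk, 1 = run assumed)."""
    # Each model's predicted class per example, stacked: shape (n_models, n_examples).
    votes = np.stack([p.argmax(axis=1) for p in per_model_probs])
    # For each example, pick the class most models agree on.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy example with 3 "models" and 2 examples:
probs = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.6, 0.4], [0.7, 0.3]]),
         np.array([[0.8, 0.2], [0.1, 0.9]])]
print(majority_vote(probs))  # [0 1]
```

Averaging the probabilities before taking the argmax (soft voting) is a common alternative when the models output calibrated scores.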

This approach required me to transform my initial dataset into 6 separate ones, each representing one axis of either the accelerometer or the gyroscope and containing 12 sensor samples (~2 sec. of observation) per row. This "magic number" was derived from numerous experiments at later stages, where I tried to find a balance between the model's accuracy and the number of its inputs.

The format of one of the final datasets: rows represent continuous sensor measurements equivalent to ~2 sec. of observation. Column ‘y’ represents a label for each row.
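The transformation itself is not shown in the post; a minimal NumPy sketch of grouping one axis's readings into 12-sample rows could look like the following (the `make_windows` helper and the toy data are my own illustration, assuming each window carries a single label):

```python
import numpy as np

def make_windows(axis_values, labels, window_size=12):
    """Group consecutive sensor readings of a single axis into fixed-size
    windows (~2 s at ~5 Hz) and attach one label per window. Assumes the
    label is constant within each window."""
    n_windows = len(axis_values) // window_size
    X = np.asarray(axis_values[: n_windows * window_size],
                   dtype=np.float32).reshape(n_windows, window_size)
    # Use the label of the first sample in each window as the row label.
    y = np.asarray(labels)[::window_size][:n_windows]
    return X, y

# Toy example: 30 readings of one accelerometer axis, 0 = walk, 1 = run.
values = np.arange(30, dtype=np.float32)
labels = np.array([0] * 12 + [1] * 18)
X, y = make_windows(values, labels)
print(X.shape)  # (2, 12)
print(y)        # [0 1]
```

A real pipeline might also use overlapping windows to get more training rows out of the same recording.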

Model Architecture: hidden layers

I decided to start with a neural network containing only 1 hidden layer, evaluate its performance, and add further hidden layers if needed. This approach of starting with a basic model and gradually adding complexity helped me avoid hard-to-identify issues while training the model.

Having a single hidden layer resulted in an accuracy of 92.5%, adding one more gave me 97.2%, and a neural network with 3 hidden layers finally produced 99.2% accuracy. Adding more than 3 hidden layers either had no effect on accuracy or reduced it. I used 10-fold cross-validation on test data here and in all further experiments to derive the accuracy numbers.
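As an illustration of the evaluation procedure, 10-fold cross-validation of a 3-hidden-layer network can be sketched with scikit-learn's `MLPClassifier` standing in for the Keras model (the synthetic data and every parameter besides the layer sizes are assumptions, not values from the post):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for one per-axis dataset: 12 features per example.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = (X.mean(axis=1) > 0).astype(int)  # toy walk/run labels

# Three hidden layers of 15 ReLU neurons, as in the post's final network.
clf = MLPClassifier(hidden_layer_sizes=(15, 15, 15), activation="relu",
                    solver="adam", max_iter=500, random_state=0)

# 10-fold cross-validation: ten accuracy scores, one per held-out fold.
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean())
```

The mean of the ten fold scores is the kind of single accuracy number reported above.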

It's arguable whether such a neural network can be considered a "deep" one, but most importantly, I was able to find the optimal number of hidden layers.

Model Architecture: other hyperparameters

There were a few other hyperparameters which I tried to tune with the help of grid search: the number of neurons in the hidden layers and their activation functions. The search showed that the highest prediction accuracy was achieved by a network with 15 neurons in each hidden layer and the rectified linear unit (ReLU) activation function.

I selected the categorical cross-entropy loss function for its ability to speed up a network's learning independently of the defined learning rate. The Adam optimizer was chosen for its computational efficiency and for delivering adequate results for the kind of problem I was trying to solve.
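Putting the described hyperparameters together, a sketch of such a model in Keras might look like the following (written against the modern `tensorflow.keras` API; the original post predates it, so this is a reconstruction rather than the author's code):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

def build_model(n_inputs=12, n_classes=2):
    """Sketch of the described architecture: 3 hidden layers of 15 ReLU
    neurons each, a softmax output over the walk/run classes, categorical
    cross-entropy loss, and the Adam optimizer."""
    model = Sequential([
        Input(shape=(n_inputs,)),          # one 12-sample window of one axis
        Dense(15, activation="relu"),
        Dense(15, activation="relu"),
        Dense(15, activation="relu"),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With categorical cross-entropy, the labels would be one-hot encoded (e.g. via `keras.utils.to_categorical`) before calling `model.fit`.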

Significantly less data, almost the same accuracy

Once the architecture of my neural network was ready, I wanted to test a hypothesis about where the lower limit of the amount of data the network needs to perform accurately lies.

By data amount, I mean the number of data samples fed into the network when predicting a single example. Remember, I mentioned the number 12 (~2 seconds of observation)? I selected this number by comparing the network's accuracy when using lower and higher amounts of data:

  • 17 samples: 99.32%
  • 12 samples: 99.23%
  • 6 samples: 97.93%

Even though the 12-sample option has a marginally lower accuracy than 17, it greatly reduces the amount of input data consumed by the network and thus the amount of computational power needed to train it. Impressively, 6 samples per prediction (~1 sec. of observation) allow the neural network to perform quite accurately as well!

I decided to stick with 12 samples, since it gives the model more data to generalize from and, additionally, allowed me to build a dataset of 7387 examples out of the total of 37777 data samples for each sensor/axis.

Proving another hypothesis

Different wrist, no training data

What if I had collected all my training data solely on one wrist, trained my model on it, and then asked it to predict the activity type from data recorded on the other wrist? An interesting question I spent a little time finding an answer to.

It turned out the neural network is able to predict the correct activity in only 91% of all cases. Well, not a tremendous result, but it still shows how well a relatively simple neural network can detect patterns in data regardless of the sign of its values!

Low predictability of gyroscope’s axis Y

A model trained exclusively on data from the gyroscope's Y axis has "only" an 85% prediction accuracy. This suggests that this axis doesn't contribute much to the overall accuracy when one tries to distinguish walking from running: the human wrist doesn't make movements that produce strong patterns around this axis. Or at least I move my hands in a way that leaves no chance for the gyroscope's Y axis to be used for reliable predictions.

Results from other people

I got interesting insights from contributors on Kaggle who played with my full dataset: whatever model or approach they used, the accuracy was always near-perfect!

After a deeper look at their implementations, I realized that all of them fed the dataset into their models "as is" while training, sample by sample, without realizing that a single sample in time-series data is worth nothing in terms of detecting patterns in the whole series.

A dataset which should have been used may look similar to this one.

It was a good reminder to not attack a problem without spending the time to understand its basics.

What’s next?

If you’re interested in how the model described in this post looks in the real world, you can find it on Kaggle.

Stay tuned for the last article in this series, where I show how the neural network was imported and used in the iOS app with the help of Core ML.

Read next part:
