Two methods using TensorFlow and Keras

In this article we will demonstrate two different methods of using autoencoders. In particular, we will try to classify credit card transactions as fraudulent or non-fraudulent by using autoencoders. The dataset we are going to use is the "Credit Card Fraud Detection" dataset, which can be found on Kaggle. The full code is available on GitHub. It contains a link for opening and executing the code in Colab, so feel free to experiment. The code is written in Python and uses TensorFlow and Keras.
The dataset contains 284,807 credit card transactions made by European cardholders. For security reasons, the original features of the dataset are not available. What is available are 28 features that are the result of a PCA of the original features. There are also the amount of each transaction and a "Time" column, which counts the number of seconds elapsed between each transaction and the first transaction in the set. Finally, the type of each transaction is given in the "Class" column: fraudulent transactions are labelled 1 and non-fraudulent ones 0. The dataset is highly imbalanced, with non-fraudulent transactions making up 99.8% of the total. Thus, our classification problem can also be seen as an outlier detection problem, with fraudulent transactions considered as outliers.

For our case we will ignore the "Time" column. The standard approach of splitting into a train and a test set will be used for evaluating each method. Because there are too few cases of one class, we will split our dataset in half instead of the more usual 70%-30% split.
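The preprocessing described above could be sketched as follows. On Kaggle the file is named "creditcard.csv"; here a small synthetic stand-in with the same column layout is generated so the snippet runs on its own, and the variable names are purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle file: 28 PCA columns V1..V28,
# "Amount", "Time" and the "Class" label (~2% positives for the demo).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(rng.normal(size=(n, 28)),
                  columns=[f"V{i}" for i in range(1, 29)])
df["Amount"] = rng.exponential(50.0, size=n)
df["Time"] = np.arange(n, dtype=float)
df["Class"] = (rng.random(n) < 0.02).astype(int)

df = df.drop(columns=["Time"])            # ignore the "Time" column
X = df.drop(columns=["Class"]).to_numpy() # 28 PCA features + "Amount"
y = df["Class"].to_numpy()                # 1 = fraudulent, 0 = non-fraudulent

# 50%-50% split, stratified so both halves keep the rare positive class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)
```

The `stratify=y` argument is worth noting: with so few fraudulent cases, an unstratified split could leave one half with almost none of them.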
As stated in the beginning, we will use autoencoders for our Classification task. According to Wikipedia:
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner.
In simple terms, an autoencoder is a neural network that is trained to recreate as output whatever it is fed as input. Between the input layer and the output layer there is a stack of hidden layers. Right in the middle there is a layer that contains fewer neurons than the input. The output of this layer is the result of the so-called encoder part of the autoencoder.

The reasoning behind the use of autoencoders is that the hidden layers will map the input to a vector space in a nice way (whatever "nice" might mean). By using the output of a layer with few neurons, we map our input from a high-dimensional space to one with fewer dimensions.
First method: Using reconstruction error
Our first method is to create an autoencoder and train it only on the non-fraudulent transactions. It is reasonable to expect that the reconstruction error of this autoencoder will be higher for fraudulent cases than for non-fraudulent ones. I read about this technique in the excellent Medium article by Venelin Valkov. I suggest that you study it, since it explains the method in a very detailed and easy-to-follow way.
Credit Card Fraud Detection using Autoencoders in Keras – TensorFlow for Hackers (Part VII)
For demonstration purposes we will create a simple autoencoder consisting of three layers: an input layer, one hidden layer and an output layer. The hidden layer will have 12 neurons.
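In Keras, such a model could look like the sketch below. The layer sizes match the text (29 inputs: the 28 PCA features plus "Amount"; 12 hidden neurons); the activations and the optimizer are assumptions, not taken from the original code.

```python
from tensorflow import keras

n_features = 29  # 28 PCA components plus "Amount"

# Three layers: input -> 12-neuron hidden layer -> output of the same size
inputs = keras.Input(shape=(n_features,))
hidden = keras.layers.Dense(12, activation="relu")(inputs)
outputs = keras.layers.Dense(n_features, activation="linear")(hidden)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
```

Using the Functional API (rather than `Sequential`) pays off in the second method, where the hidden layer is reused as the encoder output.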

As stated before, the network is trained only on the non-fraudulent cases of the train set. After 100 epochs, we obtain a network that we then feed with all the cases of the train set. We can then calculate the error (reconstruction error) between the input and the output. The results are shown in the table below.
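The training and error computation could be sketched like this. Synthetic data stands in for the real train set so the snippet is self-contained, and the epoch count is reduced for the demo; the key points are fitting only on the non-fraudulent rows (with input equal to target, as for any autoencoder) and computing a per-instance mean squared error.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for the train set (~2% positives)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 29)).astype("float32")
y_train = (rng.random(800) < 0.02).astype(int)

inputs = keras.Input(shape=(29,))
hidden = keras.layers.Dense(12, activation="relu")(inputs)
outputs = keras.layers.Dense(29)(hidden)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train only on the non-fraudulent cases; input == target
X_normal = X_train[y_train == 0]
autoencoder.fit(X_normal, X_normal, epochs=5, batch_size=64, verbose=0)

# Per-instance reconstruction error over the whole train set
reconstructed = autoencoder.predict(X_train, verbose=0)
mse = np.mean((X_train - reconstructed) ** 2, axis=1)
```

Grouping `mse` by `y_train` is what produces the per-class statistics in the table below.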
+----------------+--------------------+--------------------+
| Class          | Mean Squared Error | Standard Deviation |
+----------------+--------------------+--------------------+
| Non-Fraudulent | 0.767519           | 3.439808           |
| Fraudulent     | 29.855354          | 43.107802          |
+----------------+--------------------+--------------------+
Based on these results, in the test dataset we will characterise an instance as fraudulent if its reconstruction error is greater than the mean plus three times the standard deviation, i.e. greater than 0.767519 + 3 × 3.439808 = 11.086943. Of course the selection of the threshold is a hyperparameter of our model and in a real application it should be fine-tuned.
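The thresholding rule is a one-liner in NumPy. Here `mse` is a stand-in array of per-instance reconstruction errors; the mean and standard deviation are the non-fraudulent statistics from the table above.

```python
import numpy as np

rng = np.random.default_rng(1)
mse = rng.exponential(1.0, size=10)           # stand-in reconstruction errors

mean_normal, std_normal = 0.767519, 3.439808  # non-fraudulent stats from the table
threshold = mean_normal + 3 * std_normal      # mean + 3 sigma rule

y_pred = (mse > threshold).astype(int)        # 1 = flagged as fraudulent
```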
We can see that our model detects 113 out of 243 (46.5%) fraudulent cases in the test dataset. However, 771 out of 142,161 (0.5%) non-fraudulent cases are misclassified as fraudulent.

Second Method: Encoder and k-NN
In our second method, we will use the encoder part of an autoencoder. The encoder will map the instances into a low-dimensional space, and k-Nearest Neighbors (k-NN) will be used for the classification. In this method, both fraudulent and non-fraudulent transactions will be used to train the encoder. In effect, the encoder is used for dimensionality reduction, thereby speeding up the execution of k-NN.
We will use the same model as in the first method; the input layer and the inner hidden layer with 12 neurons will form the encoder part.

For the classification part, all the instances (both from the train and the test set) will be mapped to a 12-dimensional space by the encoder. For each instance in the test set, a majority vote among its three nearest neighbours in the train set will decide whether it is fraudulent or not.
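A minimal sketch of this pipeline, again on synthetic stand-in data: train the autoencoder on all cases, cut it at the 12-neuron hidden layer to get the encoder, map both sets to 12 dimensions, and classify with scikit-learn's 3-NN. The `KNeighborsClassifier` choice matches the method described; everything else (seeds, epoch counts, activations) is illustrative.

```python
import numpy as np
from tensorflow import keras
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the train and test sets (~5% positives)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 29)).astype("float32")
y_train = (rng.random(400) < 0.05).astype(int)
X_test = rng.normal(size=(100, 29)).astype("float32")

# Same architecture as before; trained on ALL cases this time
inputs = keras.Input(shape=(29,))
hidden = keras.layers.Dense(12, activation="relu")(inputs)
outputs = keras.layers.Dense(29)(hidden)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=3, batch_size=64, verbose=0)

# Encoder = input layer up to the 12-neuron bottleneck
encoder = keras.Model(inputs, hidden)
Z_train = encoder.predict(X_train, verbose=0)   # shape (400, 12)
Z_test = encoder.predict(X_test, verbose=0)     # shape (100, 12)

# 3-NN majority vote in the 12-dimensional encoded space
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(Z_train, y_train)
y_pred = knn.predict(Z_test)
```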
The second method detects 184 out of 243 (75.7%) fraudulent cases in the test dataset, while only 12 out of 142,161 (0.008%) non-fraudulent cases are misclassified as fraudulent.

Ending Remarks
We've quickly seen two methods of using autoencoders for classification. You are welcome to experiment with the Colab code. Things to try out are:
- changing the error threshold in the first method
- adding more layers to the autoencoder
- changing the number of neurons in the final layer of the encoder
Further reading
- Medium article by Venelin Valkov explaining the first method in more detail https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd
- Article in Packt on the second method https://hub.packtpub.com/using-autoencoders-for-detecting-credit-card-fraud-tutorial/