How I automated keeping track of my expenses

Keeping track of pocket money
When I was younger, I liked to keep track of how I spent my pocket money. I used to keep receipts and record the amount and category of each transaction in a spreadsheet, and every so often I would generate a pie chart (see graph below) to see the breakdown of where my pocket money went and review which expense categories I should cut back on if I needed to save for bigger purchases.
Sample Monthly Expenditure Breakdown
Things got easier when I moved from retaining receipts to getting digital statements (e.g. CSV files), which included information such as the amount and description of each transaction. Categorising transactions manually was boring, so I wrote a Python script to automate this process (a sketch of the pipeline follows the list), which did the following:
- Grab all transactions that had assigned categories (the training data)
- Train a machine learning (ML) model (a bag of words representation fed into a random forest) on the training data, with the category as the target to predict
- Apply the model to new transactions to get the predicted category and save the data
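A minimal sketch of that kind of pipeline with scikit-learn (the file and column names here are my own placeholders, not from the original script) might look like this:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Transactions that already have categories assigned (the training data)
labelled = pd.read_csv("transactions_labelled.csv")  # columns: description, category

# Bag of words on the raw descriptions, fed into a random forest
model = make_pipeline(CountVectorizer(), RandomForestClassifier())
model.fit(labelled["description"], labelled["category"])

# Predict categories for a new batch of transactions and save the result
new_batch = pd.read_csv("transactions_new.csv")
new_batch["category"] = model.predict(new_batch["description"])
new_batch.to_csv("transactions_new_categorised.csv", index=False)
```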
Then I’d go and analyse the batch of predictions. But instead of checking every transaction, I mostly focused on the ones with the highest transaction amounts or the most frequently occurring descriptions (a sketch of this shortlisting follows below), making sure their assigned categories were correct (amending those that weren’t) and adding them to the pile of training data to improve the predictions on the next batch. That way, with much less effort, even though not every transaction was categorised correctly, the ballpark figure for each category was good enough to give me a good feel for what I wanted to know about my expenses.
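As an illustration of that shortlisting step, a rough pandas sketch (again with placeholder file and column names) could be:

```python
import pandas as pd

batch = pd.read_csv("transactions_new_categorised.csv")  # description, amount, category

# Shortlist for manual review: biggest amounts plus most common descriptions
top_amounts = batch.nlargest(10, "amount")
frequent_descriptions = batch["description"].value_counts().head(10).index
frequent = batch[batch["description"].isin(frequent_descriptions)]

to_review = pd.concat([top_amounts, frequent]).drop_duplicates()
# After correcting any wrong categories by hand, these rows get appended
# to transactions_labelled.csv to improve the next round of predictions
```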
All of this was done several years ago, before financial institutions started offering analysis of account transactions and spending habits, so I don’t use my script as often anymore. That said, I always like to think about how I would do things differently in hindsight, which helps me learn and improve continuously.
Non-ML solution?
There are many ways to address this challenge, which is a general multiclass classification problem on short text descriptions. If you treat it as an ML problem, you could throw all the data science tools at it: better ways of processing and representing the data than the bag of words model, a different algorithm to the random forest, proper hyperparameter optimisation, and so on.
But taking a step back, was ML the best and only way? (It is probably not easy for data scientists to say no to that question.) I was mulling this over after playing around with Elasticsearch, where I came across its fuzzy query function and thought that this function alone might do the trick.
The fuzzy matching in Elasticsearch is based on the Levenshtein distance, which measures the difference between two strings as the number of single-character edits needed to turn one into the other. For example, given a new entry "HONEST BURGER", it takes fewer edits to change the text to match an "EAT OUT" entry like "BURGER KING" than to match a "PUBLIC TRANSPORT" entry like "GATWICK EXPRESS". So we would expect this method to have some degree of predictive power, possibly comparable to the performance of ML methods.
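To make this concrete, here is a small hand-rolled Levenshtein implementation applied to that example (Elasticsearch computes the distance internally; this is purely for illustration):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("HONEST BURGER", "BURGER KING"))      # 11 edits
print(levenshtein("HONEST BURGER", "GATWICK EXPRESS"))  # 13 edits
```

(Note that in practice Elasticsearch applies the fuzziness to individual terms, with a maximum edit distance of 2, rather than to the whole description, but the intuition is the same.)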
I wrote a simple Elasticsearch Python client (here is the GitHub repo) to see how feasible this non-ML solution was. The strategy was:
- Instead of putting the training data through a bag of words or any other preprocessing, simply upload it to an Elasticsearch server (which can easily be set up using the Elasticsearch image on Docker Hub)
- When it comes to predicting new transactions, run a fuzzy query to find the closest matching transaction on the Elasticsearch server and assign its category to the new transaction (a sketch follows this list)
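As a rough sketch of this strategy (not the actual code from my repo) using the official elasticsearch-py client, with placeholder index and field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "Training" is just indexing the labelled transactions as documents
for doc in [{"description": "BURGER KING", "category": "EAT OUT"},
            {"description": "GATWICK EXPRESS", "category": "PUBLIC TRANSPORT"}]:
    es.index(index="transactions", document=doc)
es.indices.refresh(index="transactions")

# "Prediction" is a fuzzy match query: take the category of the top hit
def predict_category(description: str) -> str:
    hits = es.search(
        index="transactions",
        query={"match": {"description": {"query": description,
                                         "fuzziness": "AUTO"}}},
    )["hits"]["hits"]
    return hits[0]["_source"]["category"] if hits else "UNCATEGORISED"

print(predict_category("HONEST BURGER"))  # "EAT OUT"
```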
Test drive
To put this non-ML solution to the test, I did some rough analysis comparing it against some ML methods (neural network, XGBoost, and random forest, just to pick a few). My sample Jupyter notebook on GitHub contains the exploration code, which uses a sample dataset (an anonymised sample of my bank transaction descriptions with annotated categories) to produce the results shown below.
Before I go on to the results, here are a few assumptions and caveats that are worth mentioning:
- The dataset is tiny, with only around 450 entries spread across just over 10 categories (see the graph below), so we should expect quite some fluctuation in the assessed performance of each model, even with cross validation
- There will always be room to improve the models (and likewise the non-ML solution). The aim was to get a rough comparison of the different techniques, not to compete and find the best performing one! So all the models were trained using basic default parameters (i.e. no hyperparameter optimisation considered)
- There are many different metrics to choose from for a multiclass classification problem like this, and plenty of debate about which is best. I just stuck with accuracy and balanced accuracy (which attempts to account for the imbalance in classes), as these are simple metrics that tell me how many predictions I got right out of all the data (the snippet below illustrates the difference)
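To illustrate the difference between the two metrics, here is a toy example with made-up labels:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = ["EAT OUT"] * 8 + ["PUBLIC TRANSPORT"] * 2
y_pred = ["EAT OUT"] * 10  # a lazy model that always predicts the majority class

print(accuracy_score(y_true, y_pred))           # 0.8 -- looks decent
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- per-class recall averaged
```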

Results
The graph below shows the performance of models trained (using 80% of the data) when applied to both the training and test set. Here are some points worth highlighting:
- This graph was based on a single train/test split (i.e. not cross validated), so we need to be careful not to over-interpret the precise numbers, which would fluctuate with a different split of this small dataset. That said, we can still get some high-level insight
- Because the non-ML method (Elasticsearch) simply stores all the training data, it is guaranteed to categorise the training set with 100% accuracy
- The test performance of the Elasticsearch method is comparable to the other ML techniques, even though it completely "overfitted" on the training data

To compare the predictions more fairly, KFold cross validation (CV, n = 10) was applied, and the resulting accuracy and balanced accuracy are shown in the graph below (a sketch of the CV setup follows the list). What we can see is:
- Once again the prediction on the training data using Elasticsearch is 100% accurate without any spread, so it appears as just a single orange line at 1.0 on the chart
- The spread of performance on the test sets is somewhat expected given a small dataset (with CV)
- While we shouldn’t over-interpret the exact numbers, this graph suggests that the test performance of the Elasticsearch method is actually quite comparable to the other ML methods
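For reference, the cross-validated comparison for the ML models boils down to something like the sketch below (reusing the hypothetical labelled CSV from earlier; the Elasticsearch method needs a manual loop over the folds, as it is not a scikit-learn estimator):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

labelled = pd.read_csv("transactions_labelled.csv")

model = make_pipeline(CountVectorizer(), RandomForestClassifier())
cv = KFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_val_score(model, labelled["description"], labelled["category"],
                         cv=cv, scoring="balanced_accuracy")
print(scores.mean(), scores.std())  # spread across the 10 folds
```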

Thoughts
While one can argue that the ML methods could be made far superior with more sophisticated feature generation, hyperparameter optimisation, or more layers in the neural network (one counterargument is that you can fine-tune the fuzzy matching as well!), what I like about the non-ML Elasticsearch approach is:
- The simplicity of it (the indexing of the data behind the scenes is handled for you by Elasticsearch, which is great!)
- Not needing to worry about reshaping the data for an ML model to consume
- Not needing to fine-tune any model or worry about "overfitting"
- The prediction power is reasonably good (even with "overfitting")
- While the ML methods can’t guarantee predicting the correct category even on the training data, this non-ML method can guarantee 100% accuracy there, because it simply "records" the training data instead of having to "learn" and generalise
While ML methods are amazing and solve a lot of problems, it is sometimes easy to overlook simpler solutions that can address the problem at hand just as elegantly. Also, a product sold with "Machine Learning" or "artificial intelligence" buzzwords will most certainly sell better than one described as "lookup closest match to your records"!
I’ve really just scratched the surface of what Elasticsearch can do, and I’m already loving it. Elasticsearch also has ML capabilities, which I will explore further in the future!
Links:
- GitHub repo for the Elasticsearch Python client that I wrote
- Sample Jupyter notebook in the repo that captures the exploration code behind the results shown in this blog
Note:
- Article originally posted on blog.chilledgeek.com
- Disclaimer: the content and opinions expressed in this article are solely my own and do not express the views or opinions of my current or past employers