Organisations spend millions of dollars each year on travel expenses, a large proportion of which is variable and difficult to estimate. Apart from the air fare and the accommodation, which are known at the time of booking, additional expenses such as meals and incidentals are unknown and can have a big impact on total expenses.
In this article we are going to build a model in minutes that can estimate the variable expenses based on the employee’s title, purpose, destination, accommodation and air fare with 95% accuracy on held-out data, without writing a single line of code. The data preparation and model building will be handled by AuDaS, the Automated Data Science team in a box, built by [Mind Foundry](http://mindfoundry.ai).
Data Preparation
The expenses data was downloaded from the Ontario Pension Board and contains expense claims for its employees since 2010. The goal will be to use the data up to 2017 to predict 2018’s total spend on expenses.

Feature Engineering
The dataset contains the start and end dates of each trip, so we are going to ask AuDaS to calculate the duration, as this will probably have a strong impact on the total spend.
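AuDaS performs this step through its interface, so no code is needed. For readers who want to see what the derived feature amounts to, here is a minimal Python sketch. The function name, the ISO date format and the choice to count both the first and last day of the trip are assumptions made for illustration, not details taken from AuDaS or the dataset.

```python
from datetime import date


def trip_duration_days(start: str, end: str) -> int:
    """Return the trip length in days for two ISO dates (YYYY-MM-DD).

    Assumption: the duration is inclusive, so a same-day trip counts as 1 day.
    """
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if end_date < start_date:
        raise ValueError("end date precedes start date")
    # Subtracting two dates gives a timedelta; add 1 to include the first day.
    return (end_date - start_date).days + 1


print(trip_duration_days("2017-03-06", "2017-03-09"))  # 4
```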

We are then going to remove the $ signs from the data for formatting purposes using a RegEx transform.
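As a rough illustration of what such a transform does, the Python sketch below strips the currency formatting from a value and converts it to a number. The pattern and the function name are hypothetical. Removing thousands separators and whitespace as well as the $ sign is an assumption about how the raw values look, not something AuDaS is documented to do.

```python
import re

# Assumption: amounts look like "$1,234.50", so we strip "$", "," and spaces.
CURRENCY_NOISE = re.compile(r"[$,\s]")


def parse_amount(raw: str) -> float:
    """Remove currency formatting and return the amount as a float."""
    cleaned = CURRENCY_NOISE.sub("", raw)
    return float(cleaned)


print(parse_amount("$1,234.50"))  # 1234.5
```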

We are then going to drop the columns containing variable expenses to avoid data leakage, as they are already included in the total spend which we wish to predict. After doing this, AuDaS has detected that there are some missing values and suggests how to correct them.
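To make the leakage point concrete, here is a small Python sketch that removes columns from a list of records before training. The column names ("Meals", "Incidentals", "Total") are hypothetical stand-ins for the dataset's actual headers, and the helper is only an illustration of the idea, not how AuDaS implements it.

```python
# Hypothetical names for the columns that are components of the target.
LEAKY_COLUMNS = {"Meals", "Incidentals"}


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    return [
        {key: value for key, value in row.items() if key not in columns}
        for row in rows
    ]


rows = [{"Title": "CIO", "Meals": 40.0, "Incidentals": 12.5, "Total": 900.0}]
print(drop_columns(rows, LEAKY_COLUMNS))
# [{'Title': 'CIO', 'Total': 900.0}]
```

If these columns were left in, the model could reconstruct much of the target directly from them, and its held-out score would overstate how well it predicts spend for trips that have not happened yet.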

Throughout this whole process, AuDaS has added each step to a workflow, which is kept as an audit trail. You can go back to previous versions of the data set or you can export the workflow. In our case we are going to export this workflow and apply it to our test set containing the pension fund expenses of 2018. This automatically reproduces the data preparation steps and will allow us to easily deploy our model on it once we have trained it.

Data Exploration
Now that we have cleaned the data we can access the histogram view to extract initial insights. We can also change the scale to see the values which are sparsely distributed.


Our immediate takeaways are that the most common destination is Toronto and that Board members travel the most. There doesn’t seem to be a clear pattern, which is why we are going to use Machine Learning to uncover more intricate relationships.
Automated Modelling
We are going to ask AuDaS to build a regression model to predict the total spend.

AuDaS automatically withholds a balanced 10% of the training set as a hold-out for final validation. It also trains the model using 10-fold cross-validation to reduce the risk of overfitting. These safeguards are designed to help models trained by AuDaS perform well in production. Once we are happy with the setup, we can launch the training with the Start button.
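For readers unfamiliar with these two ideas, the following Python sketch shows the general mechanics of a 10% hold-out followed by 10-fold cross-validation on the remaining rows. It is a simplified illustration and not AuDaS's implementation: the function names are hypothetical, and the split is a plain random shuffle rather than the balanced split AuDaS describes.

```python
import random


def holdout_split(n_rows, holdout_fraction=0.1, seed=0):
    """Shuffle row indices and set aside a fraction for final validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_fraction)
    # Returns (training indices, hold-out indices).
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices, k=10):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train, holdout = holdout_split(1000)
print(len(train), len(holdout))  # 900 100
for fold_train, fold_validation in k_fold(train):
    # A model would be fitted on fold_train and scored on fold_validation here.
    pass
```

Each row in the training portion is used for validation exactly once, while the hold-out rows are never seen during training or model selection.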
The training is achieved using Mind Foundry’s proprietary Bayesian Optimiser, OPTaaS, which allows AuDaS to efficiently navigate the large search space of possible Regression pipelines.

AuDaS provides full transparency of the chosen pipeline, model and parameter values as well as performance statistics. AuDaS also provides feature relevance for the best found model.


In this case, the accommodation and air fare spend, the purpose of the trip and a destination city of London are the strongest predictors of variable expenses. The CIO title also seems to be a good indicator of total spend. Now that we are happy with the accuracy of the model, we can view its model health. In our case it is good, so we can confidently apply the model to the test data set.

After running the trained model on the test data set, we are given the predictions for each entry. AuDaS automatically ignores the columns which aren’t used in the model training. This means that we can very easily compare the actual total spend with the predicted spend. To do this we can export the data and open it in Excel.


After calculating the totals for each column we can see that AuDaS achieved an accuracy of 95%.
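The article does not spell out the formula behind this figure. One plausible reading, given that the totals of the actual and predicted columns are compared, is one minus the relative error between the two totals. The Python sketch below implements that interpretation; the function name and the example numbers are purely illustrative and are not taken from the Ontario Pension Board data.

```python
def total_spend_accuracy(actual, predicted):
    """Return 1 minus the relative error between the two column totals.

    Assumption: this is how "accuracy" is defined; the source does not say.
    """
    total_actual = sum(actual)
    total_predicted = sum(predicted)
    if total_actual == 0:
        raise ValueError("actual total is zero, relative error is undefined")
    return 1 - abs(total_predicted - total_actual) / total_actual


# Made-up figures: actual total 1000, predicted total 950.
accuracy = total_spend_accuracy([100, 200, 700], [120, 180, 650])
print(f"{accuracy:.0%}")  # 95%
```

Note that a measure based on totals lets over- and under-predictions on individual claims cancel out, so it says more about the quality of the overall budget forecast than about per-trip precision.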
Why this is important
Being able to predict the total expenses with a certain degree of accuracy is extremely valuable for CFOs. In large corporations, travel spend could reach $100M per quarter or more. There can be considerable variation, and analysts would typically use spreadsheets to estimate budget requirements based on accumulated input from different departments. Department managers make significant efforts to gather information about open projects and past spending, adding some contingency margins. The same approach applies not only to travel expenses but to many cost forecasting situations. With machine learning, business line managers and finance can have a much more accurate forecast in minutes. Most critically, these models can run live and provide advance warnings when situations change. This helps avoid the notorious travel and cost embargoes that are imposed during the vital last few weeks of the quarter because the budget has already been reached.
Find out more
If you would like to see if AuDaS can solve your problem, fill in this short [form](https://mindfoundryai.typeform.com/to/jVPCue). If you are curious about OPTaaS, you can qualify for a free one-week trial using this even shorter form!
A full video of this process can be viewed below:
You can read more AuDaS case study tutorials below:
Optimize your Email Marketing strategy with automated Machine Learning
Solving the Kaggle Telco Customer Churn challenge in minutes with AuDaS
Team and Resources
Mind Foundry is an Oxford University spin-out founded by Professors Stephen Roberts and Michael Osborne, who have 35 person-years in data analytics. The Mind Foundry team is composed of over 30 world-class Machine Learning researchers and elite software engineers, many of them former post-docs from the University of Oxford. Moreover, Mind Foundry has privileged access to over 30 Oxford University Machine Learning PhDs through its spin-out status. Mind Foundry is a portfolio company of the University of Oxford and its investors include Oxford Sciences Innovation, the Oxford Technology and Innovations Fund, the University of Oxford Innovation Fund and Parkwalk Advisors.