The world’s leading publication for data science, AI, and ML professionals.

Excel Won’t Go Away

Some thoughts on how to embrace it

Excel is one of the most widely used software packages in the world. It has been around since 1985 and has approximately 750 million users. Microsoft likes to call Excel formulas "the world’s most widely used programming language."

Excel is great at doing simple Analytics and helping people make sense of numbers. It is amazingly flexible and versatile. Excel is popular with users but often less so with IT departments. An example of this popularity was the backlash to a Wall St Journal article to stop using Excel – "You can have my excel, after you ripped it from my cold, dead hands."

Cold Hands | Photo by Matt Foster on Unsplash
Cold Hands | Photo by Matt Foster on Unsplash

What Does This Mean?

The downside of all this flexibility is that Excel‘s scope has exploded, limited only by the imagination. Need to build an accounting system, model a power grid, create a data storage system … you can do it in Excel. But just because you can do something doesn’t mean that you should. In many ways, Excel is the bane of corporate IT departments– with security and storage issues. Spreadsheets allow sensitive data to be easily shared, but they offer little or no protection. Ballooning storage issues results as multiple users across the company save their own version of the same Excel file.

Why Does This Matter?

You’ll come across countless articles telling you to stop using Excel now. That Python is much better / faster / more reliable for performing analytics. All of that is true. However, with around 10 million Python developers vs. 750 million Excel users, chances are you are going to have to use Excel in some capacity.

If our goal as Data Scientists and Analysts is to make data useful, we need to deliver our analysis in the tools that the stakeholder or decision-maker wants to use. Often one of these tools will be Excel.

The Right Tools | Photo by Oxa Roxa on Unsplash
The Right Tools | Photo by Oxa Roxa on Unsplash

What Makes Excel so Popular?

Excel is packaged with MS Office, which has helped it become the default analytical tool in offices around the globe. There is a massive base of Excel users, which means that there are plenty of resources to help you. There is plenty of help available online or from your office Excel expert.

Excel is intuitive, and it is easy to learn basic steps. Data is immediately visible, and there is no code to execute. You can type in information directly, which gives you a feeling of control. In Excel, adjusting data, sorting, filtering is all effortlessly handled. You can make formatting changes with the click of a mouse. You can even create surprisingly attractive charts and dashboards.

Using Excel is easier than learning to code. You can cut and paste, drag and drop, and most importantly, ctrl-z (undo) away any recent mistakes. There is a wide range of formulas that you can use. Pivot tables, SUMIF and VLOOKUP statements, and Sparklines make summarizing data easy. Flexibility means that you can do your own research or answer ad hoc queries – all without the need for support from IT or business intelligence. You can run simple regression models without too much trouble and run more sophisticated statistical models with a great deal of trouble.

Excel is continually improving. Tools like Power Query make it easier to load and transform data without manual effort. Interfacing with PowerBI makes Excel a much more powerful visualization tool. In the last few months, Microsoft released LAMBDA, which allows you to define your own custom functions using Excel’s formula language.

Popular | Photo by The Humantra on Unsplash
Popular | Photo by The Humantra on Unsplash

What’s the Problem with Excel?

There is a high cost to Excel’s flexibility. It is really easy to wreck a formula. Insert a row here, delete a cell there, or accidentally type data over a formula, and all of a sudden, the spreadsheet has an error. Even worse, the spreadsheet continues to function with the mistake hidden inside. Some statistics floating around the internet claim that 80%-90% of spreadsheets have an error. (I couldn’t find a reliable source for this. Maybe these percentages are based on an Excel error of their own).

The ability to link with other spreadsheets and documents makes Excel more powerful, but it makes error tracking even more difficult. Formulas with more than one function or layer are complicated to read, which adds to the error tracking headaches. Anyone who’s been handed someone else’s spreadsheet and told to fix the #REF! error will know exactly what I mean.

Using Excel can require a lot of manual effort cleaning data and formatting. This effort is often not reusable. Tools like Power Query and Macros can help, but you need to be a far more sophisticated user to use these.

When working with large data sets, some calculations like SUMIFS and VLOOKUP take an extraordinary length of time to complete. Your computer is essentially on lockdown while it’s calculating. You can switch calculations from automatic to manual, but this creates a new set of problems. Now you could be making decisions with data that is stale. There is also the potential for your computer to freeze or crash during the calculation, and all work is lost.

We have already covered some of the IT issues with Excel. Oversharing of data and storage requirements. No doubt we’d be shocked if we truly knew just how much critical infrastructure relies on Excel at some point in its processing.

Photo by Markus Winkler on Unsplash
Photo by Markus Winkler on Unsplash

What’s the Future?

The best solution may be a hybrid model. Python is the better choice for sophisticated data analysis. But what if your boss or stakeholder is one of the WSJ Excel diehards? There is an increasing number of tools that you can use to interface between Python and Excel. If you need something more advanced than exporting your final data frame as an Excel file or CSV, then here are some tools that are worth investigating.

These three packages can read and write Excel files, perform operations, and plot graphs.

The next two packages: xlwings and pyxll, allow you to call Python from within Excel. Pyxll uses Excel as a front end to run Python code in Excel. Xlwings uses Excel as the user interface to run Python scripts. Note that pyxll has a monthly subscription charge. Xlwings base model is open source, and the pro version has a license fee.

I expect that these tools will advance both in features and ease of use, making it easier to have the best of both worlds.


Related Articles