Co-authored with Sarah Krasnik.
In 2021, the Modern Data Stack is the talk of the town. As predicted by Tristan Handy last year, there’s a "Cambrian Explosion" of data tools taking place. As companies and open source projects scramble to fill in the blanks, the very ways that insights are developed and delivered are being reshaped. Across all the newcomers, one thing is becoming more and more clear: Flexibility is king. Dashboards and point-and-click BI tools, in their inflexible glory, are increasingly not enough to cope with the demands of data analysts and consumers.
A brand new generation of tools to build Data Applications will rise to replace them, delivering dashboard-like ease of use with vastly more flexibility on both the backend and the frontend.
What’s wrong with the status quo?
In 2021, as the rest of the data stack matures at breakneck speed, old-school dashboards are no longer sufficient as the standard output of Data Analysis.
They are often insufficient for the data analyst, forced to fit their work into a series of context-lacking tiles. They are almost always insufficient for the data scientist, unable to easily pipe the results of their modeling back into a point-and-click dashboarding tool. They can be insufficient even for the professional dashboard builder, doomed to eternally try to "craft a narrative" from a page full of square tiles and inflexible filters.
BI tools, for all their great power, have hard limitations. Users can easily read data, but that user-friendly interface is a double-edged sword that hinders both the flexibility of reports and their level of interactivity. They are also typically limited to SQL-based analysis, closing the door on a massive section of the work done by data teams in other languages.
The main sticking power of dashboards is that, generally, they are actually pretty sufficient for the consumer of the dashboard — or at least more sufficient than the available alternatives. For stakeholders that need to act on data, dashboards have a standard layout, are very easy to consume, and most importantly, require zero technical overhead to load and read.
This is in contrast to something like a Jupyter Notebook, which is excellent for more complex data analysis, but realistically cannot be shared with non-technical stakeholders. The same goes for a myriad of other clever ways to deliver data analysis: You can build amazing things, but they’re not useful because no one can use them. And everyone can click a link to open a simple dashboard.
But everyone can also use Excel spreadsheets, Google Docs, Powerpoints, and PDFs. The inadequacy of BI tool dashboards as a vehicle for rich analysis doesn’t prevent that analysis from happening – It just creates a "black market" of ad-hoc data work within most organizations. Look your webcam in the eyes, dear reader, and swear out loud that you have never sent or received a ModelOutput-final(1).csv, or taken a screenshot from a Jupyter Notebook to copy into a slide, or downloaded a report from your BI tool and opened it in Excel so that you could just tweak that one thing it wouldn’t let you do.
I thought so.
This is the true problem of relying on just dashboards as the standard output of the data world. It’s not only that they are themselves insufficient as a vehicle for complex data analysis. It’s that their shortcomings provoke a chaotic underworld of even worse data products that are often static and stale.
Data Applications
So what’s a better way? It’s actually not that groundbreaking. Everyone on the analysis side of the equation wants more flexibility, and everyone on the consumption side wants simplicity – though not at the expense of interactivity. Both sides can agree on the fact that it’s confusing to have a tornado of unversioned loose documents flying around.
"Data Applications" is an umbrella term for a variety of different data products that allow for both rich analysis and simple presentation. Data apps can look very diverse – a simple point-and-click dashboard, a story-style document with live charts, a two-button transformation tool – but they always present a simple and concise output that belies a very flexible analytics backend. Like websites, data applications can stand alone and be easily shared with wide audiences.
Excel spreadsheets are actually a great example of a (rudimentary) data application! Rich flexibility and complexity are available to the technically advanced analyst, but a dead simple yet interactive interface is surfaced to the end user. Maybe Excel isn’t the most cutting-edge example, but it serves to highlight the range of possibilities for what can be considered a data application.
Some large organizations have solved this by building bespoke data apps, with home-rolled infrastructure and frameworks created by their own engineering teams. While these one-off data applications can be customized to meet any need, the overhead of developing and maintaining them is extremely high. This approach is just not worth it for 99% of teams.
Data teams don’t need access to an army of web developers waiting to turn their notebooks into full-fledged web apps. They certainly don’t need to learn React themselves to bring that capability in-house. Data teams just need a framework to do analysis and share it themselves; the most productive solution is in the data application tooling space, not in artisanal one-off web apps.
The new kids on the block
There is a new generation of tools emerging in this space that empower data teams to quickly build data applications for the rest of their organization. In addition to allowing analysts to create more flexible interfaces to data, these tools closely couple analysis and output without creating fragmentation across multiple tools. This means data practitioners can use any languages and frameworks they want without needing to detach their work from reality by screenshotting or exporting it.
Shiny has helped R users do this for years, allowing interactive apps to be built without writing a single line of non-R code. Streamlit and Dash let users do the same thing without leaving Python, which has become the most popular scripting language in the data world. Hex takes a hybrid approach, letting users do analysis in Python and SQL, then use a drag-n-drop app builder to construct data applications.
All these tools aim to bridge the sharing gap between technical data practitioners and less technical data consumers without either party having to make sacrifices. This is the promise of data applications – Rich analyses, easily shared and consumed outputs, low overhead.
A data (application) driven future
Data practitioners deserve to keep working with any languages and frameworks they are familiar with, without having to resort to one-off exports, orphaned screenshots, or inflexible BI tools when it comes time to share results with stakeholders.
Data consumers deserve easy access to operationally useful and interactive data products. They should never need to ask the data team for a new csv dump with one input changed. They also shouldn’t have to consume every single report via the same tired newspaper dashboard layout!
Standard dashboards deserve a break! It’s been a long few decades of trying to meet everyone’s expectations of what a good data report should be. They can take a rest and go back to doing the basic top-line BI reporting they excel (heh) at. They’ve certainly earned it.
Loose csv files and stale slide decks deserve… whatever’s coming to them. No mercy for bad data.
As the Modern Data Stack continues to expand, data teams will have more options than ever when it comes to working on and sharing analyses. The rise of data applications and the tools to build them will keep data analysis collaborative, interactive, and above all else: actually useful.
NB: Davis Treybig recently wrote an excellent piece for TDS that covers similar territory, with an angle towards end-user facing / productized data apps. It’s a great follow up read, but talks about a totally different type of data product.
Used any of these tools, or another one I forgot to mention? Think that I did csv files a disservice? I would love to hear any and all feedback. Tweet me @isidoremiller!