Data Engineering
How to use Metabase Metadata to facilitate Data Discovery
Getting the most of your Metabase Production Database
Metabase is the best open-source BI tool.
For those who are unfamiliar, Metabase is an open-source business intelligence tool with both a query builder and a native query option. Being an open source tool, it’s a great MVP and, for many organizations, it’s a great tool to build on top of and really make your own on account of its open-source code and options for self-deployment.
One key way that I have built on top of Metabase is by tapping into its production database — which is either Postgres or MySQL, depending on your configuration — to really extend a self-serve data strategy. Here’s how to do it.
Before diving in, a few notes up top:
- This is not an exhaustive list of what you can do with the production data, but rather the motivation to get started yourself!
- All examples are using the Postgres option, so there may be some oddities that I’m not fully aware of when using a MySQL database.
- And, lastly, I use leading commas. 😬
Why bother with the production database?
Well, because that’s where all the good stuff is! The chances are that if you’re using Metabase you won’t have great usage statistics coming from your database or data warehousing system because all of the usage will be funnelled through the Metabase system user. Because of this, you’re leaving a lot of usage data on the table; usage data that could help you better understand which data is most important, and to whom.
In short, this data can be used to:
- Monitor your self-serve usage, and measure the impact of internal experiments and trainings: Who is using Metabase now and how often? And are these users just viewing questions or are they also creating new questions?
- Aid in data discovery and highlight needs for better documentation: Which tables are viewed the most and could they use better documentation?
- Identify database objects for deprecation and Metabase questions for archiving, to clean up the clutter: Which tables are not being used as often now?
- Investigate impact of data model updates and, in some cases, automate data model migrations. What will the effect of changing “opened_at” to “created_at” in the orders table?
The following are a few queries that, out of the box, you can start using on your own Metabase production database.
Who is using Metabase?
Let’s start off with something simple to answer our first point above about monitoring self-serve: how many questions are being viewed and how many unique users are viewing questions?
The next level of this type of query could look something like this. In addition to questions viewed we include data about tables viewed, questions created, tables used in creating those questions, unique creators, and more.
Rather than just focusing on views, which is a low threshold for usage, this helps us understand how many power users are using Metabase, and in what ways.
What are the most frequently used tables?
Now on to point two: how can we help with data discovery? The key is knowing which tables are being referenced in your questions.
For simplicity theses query only focus on “GUI” questions. Native SQL questions are important too, but in general these are done by power users so the problem of data discovery is often more complex.
A more detailed version of this query adds, in addition to count of Metabase questions, count of creators, count of views, count of unique viewers, and more. Similarly to the above, this helps give a more complete understanding of those tables are their usage.
Conversely, these queries can also be used for points 3 and 4 above. If you identify how often a table is used, it helps illustrate the potential extent of a table definition change or a deprecation.
What groups of people are using Metabase most?
Another important thing can be understanding whose activity it is and who is using these most commonly used tables. A great way to do that is by adding context to users by way of their permissions groups.
If you have finance team members, for example, you can identify them as a part of a permission group. By extensions this could then be used to filter activity or top tables.
Conclusion
Hopefully this small introduction has convinced you that there’s a lot of potential in your Metabase production database. Whether you use it like the above by running the occasional query to find out about your users, or you load the data into your own warehouse and run some automated reporting on it, the data has a lot of quick value add for the average data team.