Easy data mapping using AirMap

AirMap is a data mapper powered by Airtable

Erik Yan
Towards Data Science

Easily keep track of data dictionaries, mappings, sources, columns, and validations with AirMap. Check out the GitHub repo for more info.

AirMap is a data ETL tool created to help manage data flow using cloud-hosted documentation tables. This gives us a centralized “source of truth” for everything we need to know about our data (requirements, sources, validations, etc.). AirMap uses the documentation tables we create in Airtable to handle data mapping, merging of data sources, and everything in between, so that you don’t have to. There is no need to constantly update your data pipelines to match new data set requirements. Simply update the master mappings in Airtable and let AirMap do the rest.

The benefits of AirMap:

  • Airtable centralizes all of the information about your data, so everything you need is in one place.
  • Update the master data mappings in Airtable, and AirMap updates your pipelines to match what you have in the cloud.
  • No need to send static documents back and forth: collaborate with others through Airtable’s easy-to-use web and desktop GUIs.
  • Quickly manage, control, and share your data mappings, validations, and data dictionaries.

Want to try AirMap for yourself? You can download a copy of the demo here.

All of your data sources in one place

We can use Airtable to keep track of all of our data sources, which gives us centralized documentation of our data sets and their origins.

AirMap uses this information to validate data in your pipeline against the requirements you outline in Airtable.
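
To make the idea of validating data against documented requirements concrete, here is a minimal sketch in plain Python. It is not AirMap’s actual code: the `REQUIREMENTS` rows, their field names (`column`, `type`, `required`), and the `validate_rows` helper are all assumptions made for illustration, shaped like records one might keep in a documentation table.

```python
from typing import Any, Dict, List

# Hypothetical requirement rows (field names are assumptions, not AirMap's schema).
REQUIREMENTS = [
    {"column": "client_id", "type": int, "required": True},
    {"column": "client_name", "type": str, "required": True},
    {"column": "notes", "type": str, "required": False},
]


def validate_rows(rows: List[Dict[str, Any]],
                  requirements: List[Dict[str, Any]]) -> List[str]:
    """Return a list of readable problems; an empty list means the data passed."""
    problems = []
    for index, row in enumerate(rows):
        for req in requirements:
            name = req["column"]
            value = row.get(name)
            if value is None:
                # Only required columns are reported when missing.
                if req["required"]:
                    problems.append(f"row {index}: missing required column '{name}'")
                continue
            if not isinstance(value, req["type"]):
                problems.append(
                    f"row {index}: column '{name}' expected {req['type'].__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


rows = [
    {"client_id": 1, "client_name": "Acme", "notes": "renewal due"},
    {"client_id": "2", "client_name": "Globex"},   # wrong type for client_id
    {"client_name": "Initech"},                    # client_id is missing
]
problems = validate_rows(rows, REQUIREMENTS)
for problem in problems:
    print(problem)
```

Keeping the requirements as data rather than hard-coded checks is what lets a change in the documentation table alter the validation without touching the pipeline code.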

Easily track and update data requirements

Airtable’s database-like structure allows us to create links between data sources and their corresponding columns.

This means we can make sure we manipulate our data using the correct details, validations, and requirements. Because each record is unique, AirMap can look up a column’s requirements unambiguously, which helps avoid costly human errors.
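
As a rough illustration of how links between sources and columns can be followed, the sketch below groups column names under the source they belong to. It assumes the usual Airtable convention that a linked-record field holds a list of record IDs; the table contents, field names (`Name`, `Source`), and record IDs are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical "sources" table, keyed by record ID.
SOURCES = {
    "recSrc1": {"Name": "CRM export"},
    "recSrc2": {"Name": "Billing system"},
}

# Hypothetical "columns" table; "Source" is a linked-record field (list of IDs).
COLUMNS = [
    {"id": "recCol1", "Name": "cust_id", "Source": ["recSrc1"]},
    {"id": "recCol2", "Name": "invoice_total", "Source": ["recSrc2"]},
    {"id": "recCol3", "Name": "cust_nm", "Source": ["recSrc1"]},
]


def columns_by_source(sources: Dict[str, Dict[str, Any]],
                      columns: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Resolve each column's linked source IDs and group column names by source name."""
    grouped: Dict[str, List[str]] = {src["Name"]: [] for src in sources.values()}
    for column in columns:
        for source_id in column.get("Source", []):
            grouped[sources[source_id]["Name"]].append(column["Name"])
    return grouped


grouped = columns_by_source(SOURCES, COLUMNS)
print(grouped)
```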

Quickly design and deploy data mappings

Airtable’s Excel-like GUI allows users to easily create and update data mappings. This is especially useful when incoming (or outgoing) data set requirements are constantly changing.

How to use AirMap

AirMap is easy to integrate into existing data pipeline structures. Aggregate your data sources as you normally would, and then pass the data to AirMap.

You can try AirMap using the sample data and Jupyter Notebook I’ve provided in this GitHub repo.

AirMap reads through the mapping structure you create in Airtable to generate the expected data set output format.

Retrieves the designated data map from Airtable and transforms input data to generate a mapped output
Resulting data from AirMap
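
The mapping step described above can be sketched roughly as follows. This is a simplified stand-in written in plain Python, not AirMap’s implementation or API: the `MAPPING` rows, the `source_column` and `target_column` field names, and the `apply_mapping` helper are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

# Hypothetical mapping rows: each says which input column feeds which output column.
MAPPING = [
    {"source_column": "cust_id", "target_column": "Client ID"},
    {"source_column": "cust_nm", "target_column": "Client Name"},
    {"source_column": "rev_usd", "target_column": "Revenue (USD)"},
]


def apply_mapping(rows: List[Dict[str, Any]],
                  mapping: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Rename and reorder columns so each row matches the mapped output format."""
    mapped = []
    for row in rows:
        # Output columns follow the mapping order. Input columns that are not
        # mapped are dropped, and missing inputs become None.
        mapped.append(
            {m["target_column"]: row.get(m["source_column"]) for m in mapping}
        )
    return mapped


source_rows = [
    {"cust_id": 101, "cust_nm": "Acme", "rev_usd": 2500.0, "internal_flag": "x"},
    {"cust_id": 102, "cust_nm": "Globex"},
]
output_rows = apply_mapping(source_rows, MAPPING)
for row in output_rows:
    print(row)
```

Because the output shape is driven entirely by the mapping rows, changing a target column name or adding a new column only requires editing the mapping, not the function.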

You can also easily review your data mapping requirements directly in your Python environment. This allows us to double check our data sources, verify requirements, and ensure that we pass the correct datasets through AirMap.

Queries Airtable for a chosen data map’s mapping details
Resulting data map details of the ‘Project A — Client Summary’ data map
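
For readers curious what such a query might look like without AirMap, here is a hedged sketch that talks to Airtable’s REST API using only the standard library. The endpoint pattern, the `filterByFormula` parameter, and the bearer-token header are part of Airtable’s public API; the table name, the `Data Map` field name, the base ID, and all helper names are hypothetical. The sketch ignores pagination and does not escape quotes inside the map name.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_ROOT = "https://api.airtable.com/v0"


def build_request(base_id: str, table: str, token: str,
                  map_name: str) -> urllib.request.Request:
    """Build a GET request for the rows of `table` that belong to one data map."""
    # "Data Map" is an assumed field name in the mapping table.
    formula = "{Data Map} = '" + map_name + "'"
    query = urllib.parse.urlencode({"filterByFormula": formula})
    url = f"{API_ROOT}/{base_id}/{urllib.parse.quote(table)}?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_records(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten Airtable's {"records": [{"id": ..., "fields": {...}}]} into plain rows."""
    return [{"id": rec["id"], **rec.get("fields", {})}
            for rec in payload.get("records", [])]


def fetch_mapping_details(base_id: str, table: str, token: str,
                          map_name: str) -> List[Dict[str, Any]]:
    """Perform the request (needs network access and a valid token)."""
    request = build_request(base_id, table, token, map_name)
    with urllib.request.urlopen(request) as response:
        return parse_records(json.load(response))


# Offline demonstration with a payload shaped like an Airtable response.
sample_payload = {
    "records": [
        {
            "id": "rec001",
            "createdTime": "2020-01-01T00:00:00.000Z",
            "fields": {
                "Data Map": "Project A",
                "Source Column": "cust_id",
                "Target Column": "Client ID",
            },
        }
    ]
}
details = parse_records(sample_payload)
print(details)
```

Separating request building from response parsing keeps the parsing logic testable without network access; only `fetch_mapping_details` needs a real base ID and token.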

The current version of AirMap is a proof of concept. Additional features for data validation and (potentially) data cleaning will be added. Thanks for reading!

I’m a Data Engineer with a passion for building solutions that make data accessible and digestible for others. I enjoy organizing and analyzing data.