How to integrate Quickbooks with Python

Building a lightweight Quickbooks data integration pipeline using Python

hotglue
Towards Data Science


If you’re a B2B developer building a product, one of the earliest product development phases is creating a data integration pipeline to import customer data.

In this article, I’ll show you how to leverage Singer’s tap-quickbooks to extract data from Quickbooks. From there I’ll walk you through how to parse the JSON output data from Singer using target-csv and standardize it using a simple Python script.

Source: unDraw

Notes

The code for these examples is available publicly on GitHub here, along with descriptions that mirror the information I’ll walk you through.

These samples rely on a few open source Python packages:

  • tap-quickbooks: a Singer tap to extract data from Quickbooks. More info on GitHub. We’ll be using the hotglue fork because it has version compatibility with target-csv.
  • target-csv: a Singer target which converts input JSON data to CSV files. More info on GitHub. We’ll use the hotglue fork which uses updated dependencies.
  • singer-discover: an open source utility to select streams from a Singer catalog. More info on GitHub.
  • pandas: a widely used open source data analysis and manipulation tool. More info on their site and PyPi.
  • gluestick: a small open source Python package of ETL utility functions maintained by the hotglue team. More info on PyPi and GitHub.

Without further ado, let’s dive in!

Step 1: Set up our environment

Create the virtualenv

Singer taps tend to have a lot of dependency conflicts with each other — to avoid dependency hell, I highly recommend running through this example in a virtual environment.

# Install JupyterLab if you don't have it already
$ pip3 install jupyterlab
# Create the virtual env
$ python3 -m venv ~/env/tap-quickbooks
# Activate the virtual env
$ source ~/env/tap-quickbooks/bin/activate
# Install the dependencies
$ pip install git+https://github.com/hotgluexyz/tap-quickbooks.git git+https://github.com/hotgluexyz/target-csv.git gluestick pandas ipykernel singer-python==5.3.1 requests==2.20.0 xmltodict==0.11.0 jsonpath-ng==1.4.3 pytz==2018.4 attrs==20.2.0
# Make our venv available to JupyterLab
$ python -m ipykernel install --user --name=tap-quickbooks
# Create a workspace for this
$ mkdir quickbooks-integration
# Enter the directory
$ cd quickbooks-integration

These commands may vary depending on your OS and Python version. For more info on venvs with Jupyter, check out this TowardsDataScience article.

Step 2: Configure the Singer tap

Get the OAuth credentials

First off, you’re going to need Quickbooks OAuth credentials. This process is already well-documented by Quickbooks, so I’ll assume you can follow that guide.

Create the Singer tap config

Now we have to create a Singer config. This specifies our OAuth credentials and some Singer-specific settings. The tap’s example config has the following format:

{
  "client_id": "secret_client_id",
  "client_secret": "secret_client_secret",
  "refresh_token": "abc123",
  "start_date": "2017-11-02T00:00:00Z",
  "api_type": "BULK",
  "select_fields_by_default": true,
  "sandbox": true,
  "realmId": "123456789"
}

Fill in your credentials, and save this to a file called config.json in the local directory.

Run Singer discover

The first step of getting data from Quickbooks is to figure out what data is actually available. Singer taps offer a discover command which prints a JSON object describing all of this. Let’s run it now:

# Do the Singer discover and save to catalog.json
$ tap-quickbooks --config config.json --discover > catalog.json

If this worked successfully, your catalog.json should resemble this:

# Check discover output
$ less catalog.json
{
  "streams": [
    {
      "stream": "Invoice",
      "tap_stream_id": "Invoice",
      "schema": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "AllowIPNPayment": {
            "type": [
              "boolean",
              "null"
            ]
          },
          ...

Tell Singer what we want

From here, we need to select which objects we actually want to sync. We’ll use the singer-discover utility mentioned earlier for this.

# Switch singer-python version to meet singer-discover dep
$ pip install https://github.com/chrisgoddard/singer-discover/archive/master.zip singer-python==5.4.1 prompt_toolkit==1.0.14
# Build our selected catalog
$ singer-discover --input catalog.json --output properties.json

This will launch an interactive utility to select which streams (objects) you want from Quickbooks. I’m going to select Invoice (press space) and hit enter. You’ll then be prompted to optionally select specific fields; I’ll accept the defaults and press enter.

Selected fields for Invoice stream

This should give you the following output:

INFO Catalog configuration starting...
? Select Streams [Invoice]
? Select fields from stream: `Invoice` done (18 selections)
INFO Catalog configuration saved.

Run Singer sync

We can now finally get the data from Quickbooks with the files we’ve generated, using the following commands:

# Switch singer-python version to meet tap-quickbooks dep
$ pip install singer-python==5.3.1
# Get Invoice data from Quickbooks and save as a CSV
$ tap-quickbooks --config config.json --properties properties.json | target-csv > state.json

This will output two files:

  • the CSV containing the data from Quickbooks (something like Invoice-20210128T125258.csv)
  • a JSON file state.json telling tap-quickbooks what it last synced. This can be fed back to tap-quickbooks on future runs to avoid syncing the same data again (see the sketch just after this list).
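
On later runs you can feed that state back to the tap so it only pulls records changed since the previous sync. Here’s a minimal sketch, assuming tap-quickbooks supports the standard Singer --state flag; write the new state to a separate file so you don’t truncate state.json while the tap is still reading it:

# Re-run the sync, resuming from the bookmark saved in state.json
$ tap-quickbooks --config config.json --properties properties.json --state state.json | target-csv > new_state.json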

Finally! We’ve pulled our data from Quickbooks! Not too bad, right? If you wanted to use this in production, you’d have to automate the process of creating the properties.json and likely stick all of this into a Docker container (very similar to how hotglue and Airbyte work).
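
For example, the properties.json step could be scripted. Here’s a rough sketch, assuming tap-quickbooks honors the standard Singer convention of marking a stream as selected in its stream-level metadata (the entry with an empty breadcrumb):

import json

# Load the catalog produced by the discover command
with open("catalog.json") as f:
    catalog = json.load(f)

# Mark the streams we care about as selected
selected_streams = {"Invoice"}
for stream in catalog["streams"]:
    if stream["tap_stream_id"] not in selected_streams:
        continue
    for entry in stream.get("metadata", []):
        # The empty breadcrumb identifies the stream-level metadata entry
        if entry.get("breadcrumb") == []:
            entry["metadata"]["selected"] = True

# Save the selected catalog for the sync step
with open("properties.json", "w") as f:
    json.dump(catalog, f, indent=2)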

Step 3: Standardize the data

You can follow along with this part directly in the Jupyter Notebook (feel free to clone and try your own transformations).

Look at the data

Let’s take a peek at what tap-quickbooks gave us.

Invoice CSV from Quickbooks

Not too bad, right? Let’s load the data into a Jupyter Notebook and clean it up a bit. For this article I’ll keep it very simple, but if you’d like to learn about other ETL operations, check out my TowardsDataScience article.

Launch Jupyter

Let’s launch JupyterLab:

# You may have some issues in Jupyter if you don't do this
$ pip install prompt-toolkit==3.0.14
# Deactivate the virtualenv
$ deactivate
# Start Jupyter Lab
$ jupyter lab

This should start Jupyter in the current directory and open the browser.

JupyterLab started

If all the setup commands worked, you should see tap-quickbooks available under the Notebook section. Let’s create a new notebook with the tap-quickbooks kernel. I’m going to name mine quickbooks.ipynb.

Load the data

Let’s use the gluestick and pandas libraries to load the data and take a look. Our goal here is to be able to easily manipulate the output from tap-quickbooks.
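
Here’s a minimal sketch of the load step. It assumes the sync output landed in the current directory and that gluestick’s read_csv_folder helper behaves as in the hotglue samples, reading every CSV in a folder into a dict of pandas DataFrames keyed by stream name:

import gluestick as gs
import pandas as pd

# Directory where target-csv wrote its output (here, the current directory)
INPUT_DIR = "./"

# Read every CSV in the folder into a dict of DataFrames keyed by stream name,
# using the Quickbooks Id column as the index for the Invoice stream
input_data = gs.read_csv_folder(INPUT_DIR, index_cols={'Invoice': 'Id'})

# Grab the Invoice DataFrame and take a look
invoices = input_data['Invoice']
invoices.head()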

Preview of the data in a Pandas Dataframe

Clean up the data

Now that we have the data in a pandas DataFrame, you can transform it however you like. Of course, you’re not limited to pandas; you could use Spark or any other Python-based data transformation tool you like.
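
As a small illustration, here’s the kind of clean-up you might do: keep a handful of columns, rename them, and parse the dates and amounts. The column names below (DocNumber, TxnDate, TotalAmt, Balance) are assumptions based on typical Quickbooks Invoice fields, so adjust them to whatever your CSV actually contains:

# Keep only the columns we care about (adjust to your actual CSV)
invoices = invoices[['DocNumber', 'TxnDate', 'TotalAmt', 'Balance']]

# Give the columns friendlier names
invoices = invoices.rename(columns={
    'DocNumber': 'invoice_number',
    'TxnDate': 'invoice_date',
    'TotalAmt': 'total',
    'Balance': 'balance',
})

# Parse dates and make sure the amounts are numeric
invoices['invoice_date'] = pd.to_datetime(invoices['invoice_date'])
invoices['total'] = pd.to_numeric(invoices['total'])
invoices['balance'] = pd.to_numeric(invoices['balance'])

# Save the standardized output
invoices.to_csv('invoices_clean.csv', index=False)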

Example final data

Conclusion

Next steps

This is really just a starting point for a data integration pipeline. If you’re looking to take this further (orchestrating this on the cloud, connecting it to your product) it’s worth taking a look at developer focused tools like hotglue and Meltano, both of which aim to make data integration easier for developers.

Considerations

I recently published an article on TowardsDataScience about the pros and cons of building off Singer. I would recommend checking out Airbyte before resolving to build your pipeline off Singer.

Feel free to check out the open source hotglue recipes for more samples in the future. Thanks for reading! I’d love to answer any comments or questions below.
