ELT (Extract, Load, Transform) is a modern approach to data integration that differs slightly from ETL (Extract, Transform, Load). In ETL, data is transformed before being loaded into the data warehouse, whereas in ELT, the raw data is loaded directly into the data warehouse and transformed there using SQL.
Building ELT pipelines is an important part of a data or analytics engineer's job, and it can also be a useful skill for data analysts and scientists with a wider scope, or for job seekers building a complete portfolio.
In this article, we'll build a short ELT pipeline in Python using data from dummyJSON. dummyJSON is a fake REST API that provides 9 types of resources.

We will try to find out which customers spent the most money on our dummy shop.
This script will involve 3 steps:
- Extracting Data from dummyJSON API
- Loading raw data to BigQuery
- Executing a query to perform the analysis
Let’s start building our pipeline!
Extracting Data
We will need to retrieve two resources from the API: carts and users.
Let’s create a function that makes the API call and returns the JSON data:
import requests

def make_api_call(resource):
    ENDPOINT = "https://dummyjson.com/"
    response = requests.get(f"{ENDPOINT}{resource}")  # making a request to the correct endpoint
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(response.text)

print(make_api_call("carts"))
We use the requests library to make a simple HTTP GET request, check the status code to make sure the call succeeded, and return the JSON data. Here is what the response looks like:
{
  "carts": [
    {
      "id": 1,
      "products": [
        {
          "id": 59,
          "title": "Spring and summershoes",
          "price": 20,
          "quantity": 3,
          "total": 60,
          "discountPercentage": 8.71,
          "discountedPrice": 55
        },
        {...}
        // more products
      ],
      "total": 2328,
      "discountedTotal": 1941,
      "userId": 97,
      "totalProducts": 5,
      "totalQuantity": 10
    },
    {...},
    {...},
    {...}
    // 20 items
  ],
  "total": 20,
  "skip": 0,
  "limit": 20
}
We have our data about orders! Let’s try it with customers:
{
  "users": [
    {
      "id": 1,
      "firstName": "Terry",
      "lastName": "Medhurst",
      "maidenName": "Smitham",
      "age": 50,
      "gender": "male",
      "email": "[email protected]",
      "phone": "+63 791 675 8914",
      "username": "atuny0",
      "password": "9uQFF1Lh",
      "birthDate": "2000-12-25",
      "image": "https://robohash.org/hicveldicta.png?size=50x50&set=set1",
      "bloodGroup": "A−",
      "height": 189,
      "weight": 75.4,
      "eyeColor": "Green",
      "hair": {
        "color": "Black",
        "type": "Strands"
      },
      "domain": "slashdot.org",
      "ip": "117.29.86.254",
      "address": {
        "address": "1745 T Street Southeast",
        "city": "Washington",
        "coordinates": {
          "lat": 38.867033,
          "lng": -76.979235
        },
        "postalCode": "20020",
        "state": "DC"
      },
      "macAddress": "13:69:BA:56:A3:74",
      "university": "Capitol University",
      "bank": {
        "cardExpire": "06/22",
        "cardNumber": "50380955204220685",
        "cardType": "maestro",
        "currency": "Peso",
        "iban": "NO17 0695 2754 967"
      },
      "company": {
        "address": {
          "address": "629 Debbie Drive",
          "city": "Nashville",
          "coordinates": {
            "lat": 36.208114,
            "lng": -86.58621199999999
          },
          "postalCode": "37076",
          "state": "TN"
        },
        "department": "Marketing",
        "name": "Blanda-O'Keefe",
        "title": "Help Desk Operator"
      },
      "ein": "20-9487066",
      "ssn": "661-64-2976",
      "userAgent": "Mozilla/5.0 ..."
    },
    {...},
    {...}
    // 30 items
  ],
  "total": 100,
  "skip": 0,
  "limit": 30
}
This time, something catches our attention: there is a total of 100 users, but we only received 30.

Therefore, we need to call that API again until we have all the data, skipping the ones that we already have. However, we are not interested in sending those total, skip, and limit keys to our data warehouse; let’s keep only the users and carts.
Here is our updated function:
def make_api_call(resource):
    ENDPOINT = "https://dummyjson.com/"
    results_picked = 0
    total_results = 100  # we don't know the real total yet, so we start with a placeholder
    all_data = []
    while results_picked < total_results:
        response = requests.get(f"{ENDPOINT}{resource}", params={"skip": results_picked})
        if response.status_code == 200:
            data = response.json()
            rows = data.get(resource)
            all_data += rows  # concatenating the two lists
            total_results = data.get("total")
            results_picked += len(rows)  # to skip them in the next call
        else:
            raise Exception(response.text)
    return all_data
users_data = make_api_call("users")
print(len(users_data))
This time, we have our 100 users!
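We also need the carts for the next steps, so let's fetch them with the same function:
carts_data = make_api_call("carts")
print(len(carts_data))  # 20 carts on our dummy shop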
Loading Data
Now, it’s time to upload our data to BigQuery. We are going to use the BigQuery client library for Python.

Looking at the documentation, we see that it's possible to load a local file into BigQuery. Right now, our data only lives in Python objects, so let's write it to a local file first.
We are going to use the built-in json library to write our JSON data to a file. One thing we need to keep in mind is that BigQuery expects newline-delimited JSON (one object per line), not a comma-delimited JSON array.
import json

def download_json(data, resource_name):
    file_path = f"{resource_name}.json"
    with open(file_path, "w") as file:
        file.write("\n".join([json.dumps(row) for row in data]))

download_json(carts_data, "carts")
download_json(users_data, "users")
We can now check that our carts.json file is in the right format, with one cart per line:
{"id": 1, "products": [{"id": 59, "title": "Spring and summershoes", "price": 20, "quantity": 3, "total": 60, "discountPercentage": 8.71, "discountedPrice": 55}, {"id": 88, "title": "TC Reusable Silicone Magic Washing Gloves", "price": 29, "quantity": 2, "total": 58, "discountPercentage": 3.19, "discountedPrice": 56}, {"id": 18, "title": "Oil Free Moisturizer 100ml", "price": 40, "quantity": 2, "total": 80, "discountPercentage": 13.1, "discountedPrice": 70}, {"id": 95, "title": "Wholesale cargo lashing Belt", "price": 930, "quantity": 1, "total": 930, "discountPercentage": 17.67, "discountedPrice": 766}, {"id": 39, "title": "Women Sweaters Wool", "price": 600, "quantity": 2, "total": 1200, "discountPercentage": 17.2, "discountedPrice": 994}], "total": 2328, "discountedTotal": 1941, "userId": 97, "totalProducts": 5, "totalQuantity": 10}
// other carts
{"id": 20, "products": [{"id": 66, "title": "Steel Analog Couple Watches", "price": 35, "quantity": 3, "total": 105, "discountPercentage": 3.23, "discountedPrice": 102}, {"id": 59, "title": "Spring and summershoes", "price": 20, "quantity": 1, "total": 20, "discountPercentage": 8.71, "discountedPrice": 18}, {"id": 29, "title": "Handcraft Chinese style", "price": 60, "quantity": 1, "total": 60, "discountPercentage": 15.34, "discountedPrice": 51}, {"id": 32, "title": "Sofa for Coffe Cafe", "price": 50, "quantity": 1, "total": 50, "discountPercentage": 15.59, "discountedPrice": 42}, {"id": 46, "title": "women's shoes", "price": 40, "quantity": 2, "total": 80, "discountPercentage": 16.96, "discountedPrice": 66}], "total": 315, "discountedTotal": 279, "userId": 75, "totalProducts": 5, "totalQuantity": 8}
Let’s try to upload our files now!
First, we need to install the BigQuery client library for Python.
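In most environments this is a single pip command:
pip install google-cloud-bigquery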
Once this is done, we have to download a service account key and create an environment variable to tell BigQuery where our credentials are stored. In the terminal, we can enter the following command:
export GOOGLE_APPLICATION_CREDENTIALS=service-account.json
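If you prefer to keep everything in Python, a minimal alternative sketch is to set the same variable from the script, before the client is created (this assumes the service-account.json key file sits next to the script):
import os

# point the client library at the same service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service-account.json"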
Then, we can write our Python function. Luckily, the BigQuery documentation provides a code sample for loading a local file into a table.

We can define a new function using this sample:
from google.cloud import bigquery

def load_file(resource, client):
    table_id = f"data-analysis-347920.medium.dummy_{resource}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )
    with open(f"{resource}.json", "rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=job_config)
    job.result()  # Waits for the job to complete.
    table = client.get_table(table_id)  # Make an API request.
    print(
        "Loaded {} rows and {} columns to {}".format(
            table.num_rows, len(table.schema), table_id
        )
    )

client = bigquery.Client()
load_file("carts", client)
load_file("users", client)
Loaded 20 rows and 7 columns to data-analysis-347920.medium.dummy_carts
Loaded 100 rows and 27 columns to data-analysis-347920.medium.dummy_users
We told BigQuery to truncate our table every time, so if we rerun the script, existing rows will be overwritten.
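If we wanted to keep existing rows instead, we could switch the write disposition to append; here is what the same job_config would look like (a small sketch, everything else stays the same):
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    write_disposition="WRITE_APPEND",  # keep existing rows and add the new ones
)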
We now have our two tables inside BigQuery:

Transforming Data
It’s time to move on to the last letter in ELT!
We want to have a table with the users and the amount they spent on our shop.
Let’s join our two tables to have this information:
SELECT
  u.id AS user_id,
  u.firstName AS user_first_name,
  u.lastName AS user_last_name,
  SUM(c.total) AS total_spent
FROM `data-analysis-347920.medium.dummy_users` u
LEFT JOIN `data-analysis-347920.medium.dummy_carts` c
  ON u.id = c.userId
GROUP BY u.id, user_first_name, user_last_name
ORDER BY total_spent DESC

Seems like Trace Douglas is our top spender! Let’s add this table as a part of our ELT.
query= """
SELECT
u.id AS user_id,
u.firstName AS user_first_name,
u.lastName AS user_last_name,
SUM(total) AS total_spent
FROM `data-analysis-347920.medium.dummy_users` u
LEFT JOIN `data-analysis-347920.medium.dummy_carts` c
ON u.id= c.userId
GROUP BY u.id,user_first_name,user_last_name
"""
query_config= bigquery.QueryJobConfig(
destination = "data-analysis-347920.medium.dummy_best_spenders",
write_disposition= "write_truncate"
)
client.query(query, job_config= query_config)
I removed the ORDER BY as this is computationally expensive; we can still order our results once we query the dummy_best_spenders table.
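For instance, once the table exists, we can read it back from Python and restore the ordering at query time (a small sketch; the column names come from the query above):
top_spenders = client.query(
    """
    SELECT user_first_name, user_last_name, total_spent
    FROM `data-analysis-347920.medium.dummy_best_spenders`
    ORDER BY total_spent DESC
    LIMIT 10
    """
).result()  # run the query and fetch the rows

for row in top_spenders:
    print(row.user_first_name, row.user_last_name, row.total_spent)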
Let’s check that our table was created:

That’s it, we built our first ELT in just a few lines of code!
Going Further with ELTs
When dealing with real and bigger projects, there are a few more things to consider:
- We will deal with larger amounts of data. There will be new data every day, so we should incrementally append new data to our tables instead of reprocessing everything every day.
- As our ELTs get more complex, with multiple sources of data, we might need to use a workflow orchestration tool like Airflow or Prefect.
- We can only load files that weigh less than 10MB directly into BigQuery from our local machine. To load bigger files, we need to upload them to Cloud Storage first; a sketch combining this with an incremental append follows this list.
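Here is what such a load could look like (a sketch: the bucket name is hypothetical, and we assume carts.json has already been uploaded to it):
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    write_disposition="WRITE_APPEND",  # append instead of truncating, for incremental loads
)
# hypothetical bucket; upload carts.json there first (for example with gsutil)
uri = "gs://my-elt-bucket/carts.json"
job = client.load_table_from_uri(uri, "data-analysis-347920.medium.dummy_carts", job_config=job_config)
job.result()  # wait for the load job to finish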
I hope you enjoyed this article! If you did, please follow me for more content on Python, SQL, and analytics.