Once in a blue moon, a data scientist might receive requests like the ones below:
- Could you back up the stage and local Postgres database from the production machine?
- Could you copy X file from the production machine docker server and save it to the local machine?
- How about uploading X file to AWS s3 for your client? (Hint: you might have multiple AWS profiles)
If you have experienced similar situations but are not sure how to approach them, I hope the code snippets below can help restore your calmness and peace of mind.
How-to: back up the stage and local Postgres database from the production machine
In our company, we can only ssh to the prod machine, and the Postgres database is very large. So here’s the solution that works well for our team:
- Step 0: Install zstd on your local machine. zstd is a compression algorithm that can help reduce the size of the Postgres pg_dump file.
# in your terminal
$ brew install zstd
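A couple of optional notes, in case they help: Step 1 below also runs zstd on the production machine itself, so it needs to be available there as well, and if your local machine runs Ubuntu/Debian rather than macOS, the package can be installed with apt instead of Homebrew.
# confirm the install worked
$ zstd --version
# Ubuntu/Debian alternative to Homebrew
$ sudo apt-get install zstd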
- Step 1: ssh to your production machine (if you are unsure how to ssh to your production machine, ask your DevOps coworker for help!)
$ ssh [alias for prod]
$ pg_dump -T [table1] -c | zstd -T32 -19 -c > /tmp/prod.psql.zstd
The above command means "run pg_dump for all tables except table1". It also compresses the pg_dump output with zstd and stores it in the production machine's temp folder.
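One thing worth calling out: with no database name, pg_dump falls back to the PGDATABASE environment variable (or your username). A minimal sketch with an explicit database name and more than one excluded table, where my_db, table1, and table2 are placeholders:
# -d names the database explicitly; repeat -T to exclude more tables
$ pg_dump -d my_db -T table1 -T table2 -c | zstd -T32 -19 -c > /tmp/prod.psql.zstd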
- Step 2: Download the zstd output to the local machine
Open a new terminal and cd to the directory where you want to save the file, for example Downloads:
$ cd Downloads/
$ scp discovery-prod:/tmp/prod.psql.zstd .
The scp command copies the file from prod to your local machine in a super clean way! Just like Apparition in Harry Potter 💎
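discovery-prod here is an ssh alias from our config; if you don't have one set up, the explicit form is a sketch like this (the user and host below are made up):
# hypothetical user and host — replace with your real prod details
$ scp deploy@prod.example.com:/tmp/prod.psql.zstd .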
- Step 3: Back up your local or stage database with the zstd file. Make sure to talk to your DevOps team first before you drop the database in your stage environment.
# I feel pretty free to do so on my local machine
$ dropdb discovery_db
$ createdb discovery_db
$ cat prod.psql.zstd | zstd -dc | psql discovery_db
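As an optional sanity check (not part of the original flow), listing the tables in the restored database is usually enough to confirm it worked:
# list the tables in the freshly restored database
$ psql discovery_db -c '\dt'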
How-to: copy X file from the production machine docker server and save it to the local machine
Well, similar to the above process, the only new thing here is grabbing a certain file from a specific Docker container.
# step 0: ssh to your production machine.
$ ssh prod
# step 1: show your docker container
$ docker ps
# step 2: find the container ID you need from the list above
# Here I'm trying to copy a meltano.db file
# step 3: copy the file and store it in the temp folder
$ docker cp [container_id]:/projects/.meltano/meltano.db /tmp/meltano.db
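If a lot of containers are running, a name filter can save some scrolling. The filter value meltano below is just a guess at how the container might be named:
# list only containers whose name matches a (hypothetical) pattern
$ docker ps --filter "name=meltano" --format "table {{.ID}}\t{{.Names}}"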
Similarly, you can open a new terminal and use scp to download the meltano.db file from the production temp folder to your local machine.
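A sketch of what that looks like, assuming the same ssh alias as in the earlier scp step:
# run from your local machine, in the directory where you want the file
$ scp discovery-prod:/tmp/meltano.db .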

Bonus Tips:
- You can use docker container ls and docker ps interchangeably.
- The Docker container folder can become very large, and sometimes you might want to remove old, unused containers to release the space. You can do this:
# Run both to stop and remove the container
$ docker stop [container_id]
$ docker rm [container_id]
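If you want to see what is actually eating the disk before removing anything (purely optional), Docker has a built-in overview:
# show disk usage broken down by images, containers, local volumes, and build cache
$ docker system df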
The reason I want to export meltano.db is that we noticed the size of this file grows at an unexpected speed, so we want to export it for further analysis. Here are two more tricks to show the size of files:
- du -h --max-depth=1: shows the total size of each subdirectory under the current directory.
- ls -alh: shows the files under . and their sizes in a human-readable way.
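If you'd rather check the size before copying anything out of the container, the same two commands can be run through docker exec; the path below follows the docker cp example above:
# run the size checks inside the container
$ docker exec [container_id] du -h --max-depth=1 /projects/.meltano
$ docker exec [container_id] ls -alh /projects/.meltano/meltano.db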

How-to: set up multiple AWS profiles and upload files to s3
For example, I’d like to configure two profiles: one for my company Outside, and the other for accessing a client’s AWS account. You need to get the AWS access key and secret key for both profiles beforehand.
# step 0: check your default profile list
$ aws configure list
# Configure my profile 1: wen_outside
$ aws configure --profile wen_outside
# follow the prompts to fill in the blanks
AWS Access Key ID [None]: [fill your info]
AWS Secret Access Key [None]: [fill your info]
Default region name [None]: [fill your info] # example: us-east-1
Default output format [None]: json
# Configure my profile 2: wen_client
$ aws configure --profile wen_client
# follow the prompts to fill in the blanks
AWS Access Key ID [None]: [fill your info]
AWS Secret Access Key [None]: [fill your info]
Default region name [None]: [fill your info]
Default output format [None]: json
# Now check to see your profile list
$ aws configure list-profiles
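Under the hood, aws configure just writes to two plain-text files, which is handy to know if you ever need to edit or remove a profile by hand:
# access/secret keys live here, one section per profile
$ cat ~/.aws/credentials
# region and output format live here; profile sections look like [profile wen_client]
$ cat ~/.aws/config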
The next step is to switch between profiles:
# Switch to my outside profile
$ export AWS_DEFAULT_PROFILE=wen_outside
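If you only need a different profile for a one-off command, you can also pass --profile directly instead of exporting an environment variable:
# one-off command with an explicit profile
$ aws s3 ls --profile wen_outside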
Lastly, say I need to upload the meltano.db file to our client’s s3:
# switch to my profile wen_client
$ export AWS_PROFILE=wen_client
# from local to S3: for single file
$ aws s3 cp meltano.db [client's s3 directory]
# from S3 to current directory
$ aws s3 cp [s3 directory] .
# from local to S3: sync the whole folder "forClient"
$ aws s3 sync forClient [client's s3 directory]
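To double-check that the upload landed, listing the destination prefix works; the s3 path is whatever your client gave you:
# verify the upload by listing the destination
$ aws s3 ls [client's s3 directory]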
Summary
- I share a similar fear with some other data scientists: it can be intimidating to interact with production machines, which are typically Linux (often Ubuntu) systems. These tasks are not something I do frequently enough to build muscle memory. The good news is that the fear can be overcome if you keep good documentation of code snippets like these.