A wealth of information in the Postgres catalog

Examining the Postgres catalog with Python

Getting the most out of database metadata

Mark Ryan
4 min read · Jul 20, 2020


Relational databases like Postgres include a set of tables that describe the other tables in the database. This set of metadata tables is called the catalog, and it contains a treasure trove of details about the database. I recently needed a program that could automatically extract insights from database catalogs, which led me to write a simple Python module to connect to a Postgres database, get information from the catalog, and load that information into a Pandas dataframe for further processing in Python. This article describes the process I followed.

Introduction

One of the key facts about Postgres is that it has not one but two catalogs:

  • ANSI (information_schema): this catalog contains the common relational database information defined by the ANSI SQL standard. If you limit your catalog use to information_schema, your code should work with other relational databases that implement the standard.
  • PostgreSQL (pg_catalog): this catalog contains Postgres-specific metadata. If your code depends on this catalog, it will need to be updated before it can be used with other relational databases.

When you examine the objects related to a database using pgAdmin, the Postgres admin UI, you can see that the dvdrental database has both of these catalogs.

For the purposes of this article I am focusing on the ANSI catalog: it has the information I need, and code written against the Postgres ANSI catalog should also work with the catalogs of other relational databases.

Setting up the exploration of the Postgres catalog

To explore the Postgres ANSI catalog from a Python program, I followed these steps:

  1. Downloaded and installed Postgres, as well as pgAdmin. For the purposes of this exercise pgAdmin provided the perfect way to quickly examine the catalog and double-check the results of the Python code.
  2. Set up the dvdrental database. This sample database is easy to set up and has a rich variety of database objects, including triggers, views, functions, and sequences, so there is a lot to examine in its ANSI catalog.
  3. Created a Python module to connect to the database, run a query to extract information from one of the catalog tables, and save the result as a pickled dataframe.

The following section describes the Python code that I wrote to implement step 3.

Python module to explore the Postgres catalog

You can find the Python module that explores the Postgres catalog here. Following are the highlights of the code:

  1. Create a connection to the Postgres database using the psycopg2 library. The connection is made using parameters read from the config file scrape_db_catalog_config.yml and the Postgres password provided interactively by the user:
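
In outline, that connection code looks something like the sketch below. The key names I use for the general section of scrape_db_catalog_config.yml (host, port, database, user) are assumptions for illustration; the real config may use different names:

    import getpass

    import psycopg2
    import yaml

    # Read the connection parameters from the config file
    with open("scrape_db_catalog_config.yml") as config_file:
        config = yaml.safe_load(config_file)
    general = config["general"]

    # The password is supplied interactively rather than stored in the config
    password = getpass.getpass("Postgres password: ")

    # Open the connection with psycopg2
    connection = psycopg2.connect(
        host=general["host"],
        port=general["port"],
        dbname=general["database"],
        user=general["user"],
        password=password,
    )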

2. Build a query to examine one of the tables in the catalog using parameters from the config file, run the query, and copy the results to the Python variable record_col_details:
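
Continuing the sketch above, and assuming query_scope provides a from_table string and a cols list:

    # Build the catalog query from the config parameters
    query_scope = config["query_scope"]
    selected_cols = ", ".join(query_scope["cols"])
    query = f"SELECT {selected_cols} FROM {query_scope['from_table']};"

    # Run the query and fetch every row into record_col_details
    with connection.cursor() as cursor:
        cursor.execute(query)
        record_col_details = cursor.fetchall()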

3. Write the query results to a Pandas dataframe:
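
In sketch form (the pickle file name here is just a placeholder):

    import pandas as pd

    # The dataframe columns match the cols list used to build the query
    col_details_df = pd.DataFrame(record_col_details, columns=query_scope["cols"])

    # Save the result as a pickled dataframe for later processing
    col_details_df.to_pickle("catalog_details.pkl")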

The config file contains the parameters for the connection (in the general section) and the query (in the query_scope section).
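
An illustrative layout for such a config file might look like this; the exact key names in the real module may differ:

    # scrape_db_catalog_config.yml (illustrative layout)
    general:
      host: localhost
      port: 5432
      database: dvdrental
      user: postgres

    query_scope:
      from_table: information_schema.columns
      cols:
        - column_name
        - data_type
        - table_name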

By updating the parameters in the config file, you can easily change the query to examine a different catalog table (by updating from_table) or a different set of columns in that table (by updating cols). This means you can rerun the Python module as often as you need to get different details from the catalog without touching the Python code.

For example, if I wanted to include the ordinal_position column in the output for the query, I would simply have to add it to the cols list in the config file:
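
With the illustrative layout above, that change would look like this:

    query_scope:
      from_table: information_schema.columns
      cols:
        - column_name
        - data_type
        - table_name
        - ordinal_position   # newly added column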

Note that the password for the Postgres connection is not included in the config file. Instead, the code prompts the user directly for their password:
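
In the connection sketch above, that prompt boils down to a single getpass call:

    import getpass

    # Ask for the password at run time so it never has to be written to disk
    password = getpass.getpass("Postgres password: ")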

Here is an example of the output of the Python module, showing a selection of columns from the columns table (information_schema.columns) in the ANSI catalog. I can use the data type information for these columns to help automatically create simple deep learning models that make predictions on the data in these tables.

      column_name                    data_type  table_name
    0   last_name            character varying       actor
    1    actor_id                      integer       actor
    2  first_name            character varying       actor
    3 last_update  timestamp without time zone       actor
    4  first_name            character varying  actor_info
    5    actor_id                      integer  actor_info
    6   last_name            character varying  actor_info
    7   film_info                         text  actor_info
Conclusion

The Postgres ANSI catalog contains useful metadata about a Postgres database. By using the psycopg2 library to connect to the database and maintaining the connection and query parameters in a config file, you can efficiently examine the catalog from within a Python module.
