The National Center for Biotechnology Information (NCBI) of the NIH manages 43 Entrez biomedical information databases that are open to the public. Use the Python class c_e_info to retrieve statistics and other information about the databases.
"In God we trust, all others bring data."— W. Edwards Deming

Introduction
The United States government has amassed vast amounts of data. Much of it is freely available to the public through file downloads or APIs. For example, the National Center for Biotechnology Information (NCBI) within the U.S. National Institutes of Health (NIH) manages 43 publicly accessible biomedical databases collectively called Entrez databases.
Entrez databases contain diverse data, such as references and abstracts for over 30 million biomedical journal articles, genome sequences, and classification and nomenclature for all organisms in the public sequence databases.
Data engineers, data analysts, data scientists, and software developers can leverage the diverse biomedical and biotechnology data stored in Entrez databases for their projects. For example, I have developed several solutions to query, retrieve, and analyze data from the Entrez PubMed biomedical article abstract database.
This article describes Entrez programming utilities (E-utilities) you can use to access Entrez databases. It also demonstrates the Python c_e_info class to query metadata about the databases. It calls the Entrez EInfo utility to obtain the list of Entrez databases and metadata for any of its 43 databases. You can use other E-utilities to search the databases and retrieve biomedical data for your projects.
Overview of E-utilities
The E-utilities include nine server-side programs that provide programmers with an interface to query and retrieve data from the Entrez query and database system.
While this article covers the EInfo E-utility in detail, the following sections provide brief overviews of all nine E-utilities.
EInfo
EInfo provides two types of information. First, it returns a list of the names of all Entrez databases. Second, it provides metadata for a specified database. It retrieves the name, other descriptors, the number of records it contains, and the date and time when its data was last updated. It also provides each database field’s name and information about how it links to other Entrez databases.
ESearch
ESearch performs a textual search of a database. It returns a list of unique identifiers (UIDs) for records that match the search criteria. Then, programmers use those UIDs to retrieve data with other E-utilities, including ESummary, ELink, and EFetch.
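The ESearch flow described above can be sketched in Python as follows. This is a minimal illustration, not code from the original project: the function names build_esearch_url and parse_uids are hypothetical, and SAMPLE_RESPONSE is a hand-written, abbreviated stand-in for the shape of an ESearch XML response, not real output.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for a text search of one Entrez database."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_BASE_URL}?{query}"


def parse_uids(esearch_xml):
    """Return the UIDs listed in the IdList element of an ESearch response."""
    root = ET.fromstring(esearch_xml)
    return [id_element.text for id_element in root.findall("./IdList/Id")]


# Hand-written sample (assumption): placeholder UIDs, abbreviated structure.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>11111111</Id>
    <Id>22222222</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("pubmed", "asthma")
uids = parse_uids(SAMPLE_RESPONSE)
print(url)
print(uids)
```

The UIDs returned by parse_uids could then be passed to ESummary, ELink, or EFetch.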
EPost
EPost is used to upload a set of UIDs to the History Server. It returns a query key and web environment for the dataset. Programmers can later retrieve data from databases with the UIDs stored on the History Server.
ESummary
ESummary returns document summaries when called with a list of UIDs for a database.
EFetch
EFetch returns data records from a database in a specified format when called with a set of UIDs.
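As a rough sketch of an EFetch call, the snippet below builds a request URL from a list of UIDs. The function name build_efetch_url is hypothetical, the UIDs are placeholders, and the rettype and retmode defaults are one combination suited to PubMed abstracts; other databases accept other values.

```python
import urllib.parse

EFETCH_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, uids, rettype="abstract", retmode="text"):
    """Build an EFetch URL that requests records for a list of UIDs."""
    params = {
        "db": db,
        # EFetch takes the UIDs as one comma-separated value.
        "id": ",".join(str(uid) for uid in uids),
        "rettype": rettype,
        "retmode": retmode,
    }
    # safe="," keeps the commas readable instead of percent-encoding them.
    return f"{EFETCH_BASE_URL}?{urllib.parse.urlencode(params, safe=',')}"


efetch_url = build_efetch_url("pubmed", ["11111111", "22222222"])
print(efetch_url)
```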
ELink
When provided a list of UIDs for a database, ELink returns a list of related UIDs in the same database or linked UIDs in another Entrez database.
EGQuery
EGQuery (the global query utility) returns the number of records from each Entrez database that match a textual search.
ESpell
ESpell returns spelling suggestions for a textual query of a database.
ECitMatch
ECitMatch retrieves PubMed IDs (PMIDs) that correspond to a given set of citation text strings. PubMed is the Entrez database that contains abstracts of more than 30 million biomedical journal articles.
E-utilities Documentation
The E-utilities are well-documented on help web pages. You can download PDF files of the pages, too.
These help pages provide an overview of the E-utilities and additional details about the EInfo utility:
The EInfo E-utility
As described above, EInfo returns a list of all Entrez databases or, given a database name, it returns general information about the database, descriptions of its fields, and information about its links to other Entrez databases.
The EInfo section of the E-utilities In-Depth help page provides information about how to use the utility.
URL to Retrieve Entrez Database List
Like all E-utilities, EInfo has a base URL that you call to request data from Entrez or one of its databases.
EInfo Base URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
To see the returned output of a call to EInfo with its base URL, open the URL in a web browser. It should return a list of valid Entrez databases in XML format, as shown in the screenshot below.
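The same call can be made from Python instead of a browser. The sketch below is an illustration under stated assumptions: the function names fetch_einfo_xml and parse_db_names are hypothetical, and SAMPLE_DB_LIST is a shortened, hand-written sample of the response layout so the parsing step can run without a network connection.

```python
import urllib.request
import xml.etree.ElementTree as ET

EINFO_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def fetch_einfo_xml(url=EINFO_BASE_URL):
    """Call EInfo and return the response body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_db_names(einfo_xml):
    """Return the database names found in the DbList element."""
    root = ET.fromstring(einfo_xml)
    return [name.text for name in root.findall("./DbList/DbName")]


# Hand-written sample (assumption): the real list contains many more entries.
SAMPLE_DB_LIST = """<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
  </DbList>
</eInfoResult>"""

db_names = parse_db_names(SAMPLE_DB_LIST)
print(db_names)
```

To work with live data, pass the result of fetch_einfo_xml() to parse_db_names() instead of the sample string.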

URL to Retrieve Entrez Database Details
To retrieve detailed information about an Entrez database, call the EInfo base URL with the db parameter and the database's name as its value. Calling the following example URL returns information about the PubMed database.
EInfo Base URL with db parameter:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
To view the output returned from a call to EInfo with the db parameter, navigate to the URL address in a web browser. It should return metadata about the database, details about each field, and details about each link to other Entrez databases.
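A short sketch of how the database details might be read in Python follows. The function names build_einfo_url and summarize_db_info are hypothetical, and SAMPLE_DB_INFO is a hand-written, heavily abbreviated sample with placeholder values (the count and date are not real statistics).

```python
import urllib.parse
import xml.etree.ElementTree as ET

EINFO_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def build_einfo_url(db):
    """Build the EInfo URL that requests details for one database."""
    return f"{EINFO_BASE_URL}?{urllib.parse.urlencode({'db': db})}"


def summarize_db_info(einfo_xml):
    """Collect the overview, field names, and link names from an EInfo response."""
    db_info = ET.fromstring(einfo_xml).find("DbInfo")
    return {
        "name": db_info.findtext("DbName"),
        "description": db_info.findtext("Description"),
        "record_count": int(db_info.findtext("Count")),
        "last_update": db_info.findtext("LastUpdate"),
        "fields": [f.findtext("Name") for f in db_info.findall("./FieldList/Field")],
        "links": [l.findtext("Name") for l in db_info.findall("./LinkList/Link")],
    }


# Hand-written sample (assumption): placeholder values, abbreviated structure.
SAMPLE_DB_INFO = """<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <Description>Sample description</Description>
    <Count>100</Count>
    <LastUpdate>2021/01/01 00:00</LastUpdate>
    <FieldList>
      <Field><Name>ALL</Name><FullName>All Fields</FullName></Field>
      <Field><Name>TITL</Name><FullName>Title</FullName></Field>
    </FieldList>
    <LinkList>
      <Link><Name>pubmed_protein</Name><DbTo>protein</DbTo></Link>
    </LinkList>
  </DbInfo>
</eInfoResult>"""

details_url = build_einfo_url("pubmed")
summary = summarize_db_info(SAMPLE_DB_INFO)
print(details_url)
print(summary)
```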

Other EInfo Parameters
In addition to the db parameter, EInfo takes the optional version and retmode parameters as inputs.
version
Use the optional version parameter to specify version 2.0 EInfo XML. It allows only a value of ‘2.0’. When used, EInfo will return two additional fields for each database field: IsTruncatable and IsRangeable. Truncatable fields allow the wildcard character ‘*’ in search terms. The wildcard character will expand to match any set of characters, up to 600 unique expansions. Rangeable fields allow the range operator ‘:’ between the lower and upper limits of a range of values (such as 2018:2020[pdat]).
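The sketch below shows one way to request version 2.0 output and read the two extra elements, which NCBI's documentation names IsTruncatable and IsRangeable. The function names are hypothetical, and the sample XML, including the assumption that the flags are encoded as Y or N, is hand-written for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

EINFO_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def build_einfo_v2_url(db):
    """Build an EInfo URL that asks for version 2.0 XML."""
    return f"{EINFO_BASE_URL}?{urllib.parse.urlencode({'db': db, 'version': '2.0'})}"


def field_capabilities(einfo_xml):
    """Map each field name to its truncatable and rangeable flags."""
    root = ET.fromstring(einfo_xml)
    capabilities = {}
    for field in root.findall("./DbInfo/FieldList/Field"):
        capabilities[field.findtext("Name")] = {
            # Assumption: flags are encoded as "Y" or "N".
            "truncatable": field.findtext("IsTruncatable") == "Y",
            "rangeable": field.findtext("IsRangeable") == "Y",
        }
    return capabilities


# Hand-written sample (assumption): abbreviated version 2.0 response.
SAMPLE_V2 = """<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <FieldList>
      <Field><Name>TITL</Name><IsTruncatable>Y</IsTruncatable><IsRangeable>N</IsRangeable></Field>
      <Field><Name>PDAT</Name><IsTruncatable>Y</IsTruncatable><IsRangeable>Y</IsRangeable></Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""

v2_url = build_einfo_v2_url("pubmed")
capabilities = field_capabilities(SAMPLE_V2)
print(v2_url)
print(capabilities)
```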
retmode
Use the optional retmode parameter to specify the format of the retrieved data. The default value is ‘xml’ to return data in the XML format. The value ‘json’ may also be used to retrieve data in JSON format.
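For JSON output, a request might be built and parsed as sketched below. The function names are hypothetical. The key names einforesult and dblist in SAMPLE_JSON are assumptions about the JSON layout; verify them against a live response before relying on them.

```python
import json
import urllib.parse

EINFO_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def build_einfo_url(db=None, retmode="xml"):
    """Build an EInfo URL with an optional db value and a retmode value."""
    params = {}
    if db is not None:
        params["db"] = db
    params["retmode"] = retmode
    return f"{EINFO_BASE_URL}?{urllib.parse.urlencode(params)}"


def db_names_from_json(einfo_json):
    """Return the database list from a JSON response (assumed key names)."""
    payload = json.loads(einfo_json)
    return payload["einforesult"]["dblist"]


# Hand-written sample (assumption): abbreviated, with assumed key names.
SAMPLE_JSON = '{"header": {"type": "einfo"}, "einforesult": {"dblist": ["pubmed", "protein"]}}'

json_url = build_einfo_url(retmode="json")
json_db_names = db_names_from_json(SAMPLE_JSON)
print(json_url)
print(json_db_names)
```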
Python Controller and c_e_info Class
The Python c_e_info class wraps EInfo and extends its capabilities. It performs these functions:
- Call EInfo utility to obtain the valid list of Entrez databases. Return output in XML format.
- Call EInfo utility to obtain an overview, field list, and database link list for the specified database. Return output in XML format.
- Write the returned XML output of a call to the EInfo utility to a file.
Python Controller
The controller code shown below imports the c_e_info class module and performs these tasks to obtain, display, and store a list of valid Entrez databases and the details of the Entrez PubMed database:
- Create an instance of c_e_info with no parameter. The class constructor will call a function that calls the EInfo utility and returns the database list in XML format.
- Write the XML stream that contains the database list to a file.
- Create an instance of c_e_info with a database name parameter (‘pubmed’ in this example). The class constructor will call a function that calls the EInfo utility and returns an overview of the database, details about its fields, and details about its links to other Entrez databases in XML format.
- Write the XML stream that contains details for the specified database to a file.
Python c_e_info Class
The c_e_info class performs the tasks requested by the controller. Comments in the code below describe its functions.
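For readers who want a starting point, here is a minimal sketch of how such a class and its controller could look. It is a reconstruction from the description in this article, not the original source code. The names c_e_info and write_db_xml come from the article; everything else (http_get_text, build_url, the fetch argument, main, and the output file names) is hypothetical.

```python
import urllib.parse
import urllib.request

EINFO_BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def http_get_text(url):
    """Fetch a URL and return the response body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


class c_e_info:
    """Hypothetical sketch of a thin wrapper around the EInfo utility."""

    def __init__(self, db=None, fetch=http_get_text):
        # db=None requests the database list; a name requests that database's details.
        self.db = db
        self.url = self.build_url(db)
        # fetch is injectable so the class can be exercised without a network call.
        self.xml = fetch(self.url)

    @staticmethod
    def build_url(db=None):
        """Return the EInfo URL, adding the db parameter when a name is given."""
        if db is None:
            return EINFO_BASE_URL
        return f"{EINFO_BASE_URL}?{urllib.parse.urlencode({'db': db})}"

    def write_db_xml(self, file_name):
        """Write the XML returned by EInfo to a file."""
        with open(file_name, "w", encoding="utf-8") as xml_file:
            xml_file.write(self.xml)


def main():
    """Controller steps described above; calling this contacts NCBI."""
    db_list = c_e_info()
    db_list.write_db_xml("entrez_dblist.xml")
    pubmed_info = c_e_info("pubmed")
    pubmed_info.write_db_xml("entrez_pubmed.xml")
```

Passing the fetch function as an argument is a design choice made for this sketch: it lets you substitute a stub that returns canned XML when testing, so the constructor does not need a live connection.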
Uses of the c_e_info Class and Controller
While you would use other E-utilities to query and retrieve data from Entrez databases, EInfo can be used to gain a basic understanding of Entrez databases. Here are some possible uses for EInfo and the c_e_info class:
- Understand data structures – Use the field and link data that EInfo returns for an Entrez database to understand its data structures as you prepare for a data analytics, data science, or software development project.
- Build a database – Obtain details of the fields in an Entrez database to build a database table (in a database management system such as Microsoft SQL Server, MySQL, Oracle, or PostgreSQL) to contain data from calls to E-utilities.
- Review basic database metadata – Retrieve a list of databases and gain a general understanding of each database, its number of records, when it was last updated, its fields, and its links to other Entrez databases.
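As one illustration of the "Build a database" idea in the list above, the sketch below derives a table definition from EInfo field names. It uses Python's built-in SQLite module as a stand-in for the database systems named in the list. The function names, the table name, the TEXT column type, and the sample XML are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample (assumption): a few fields in the EInfo response layout.
SAMPLE_FIELDS = """<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <FieldList>
      <Field><Name>UID</Name><FullName>UID</FullName></Field>
      <Field><Name>TITL</Name><FullName>Title</FullName></Field>
      <Field><Name>PDAT</Name><FullName>Date - Publication</FullName></Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""


def field_names(einfo_xml):
    """Return the short name of every field in an EInfo response."""
    root = ET.fromstring(einfo_xml)
    return [f.findtext("Name") for f in root.findall("./DbInfo/FieldList/Field")]


def create_table_sql(table_name, columns):
    """Build a CREATE TABLE statement with one TEXT column per field."""
    for identifier in [table_name, *columns]:
        # Reject anything that is not a plain identifier before building SQL.
        if not identifier.isidentifier():
            raise ValueError(f"Unsafe identifier: {identifier!r}")
    column_sql = ", ".join(f'"{column.lower()}" TEXT' for column in columns)
    return f'CREATE TABLE "{table_name}" ({column_sql})'


connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("pubmed_demo", field_names(SAMPLE_FIELDS)))
created_columns = [row[1] for row in connection.execute("PRAGMA table_info(pubmed_demo)")]
print(created_columns)
```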
Where to Go from Here
I have developed several data analytics products and one application with biomedical article abstracts from the Entrez PubMed database at their core. Maybe you can use the data stored in Entrez databases in your projects. Here are some ideas:
- Enhance the c_e_info class to handle errors gracefully.
- Write information about the tasks performed by c_e_info to a log file.
- Review the field information for each Entrez database and identify use cases to leverage the data for upcoming or potential projects.
- Change or extend the functionality of the c_e_info class to meet your needs.
- Learn how to use other E-utilities to obtain data from Entrez databases.
- Write programs or classes to parse and create indexes of the XML or JSON output returned from E-utilities calls.
- Write a Python class to convert XML returned from calls to E-utilities to HTML files and publish them to a Web server.
- Write a Python class to convert XML returned from calls to E-utilities to other formats (such as CSV) to present and analyze in business intelligence and data visualization tools like Tableau or Microsoft Power BI.
- Write a utility class that calls E-utilities to retrieve data from an Entrez database and insert it into a database.
- Build wrapper classes in Python or another language to simplify and extend the use of other E-utilities.
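To make the XML-to-CSV idea from the list above concrete, here is a small sketch that turns the field list of an EInfo response into CSV text. The function name fields_to_csv, the chosen columns, and the sample XML are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written sample (assumption): two fields in the EInfo response layout.
SAMPLE_XML = """<eInfoResult>
  <DbInfo>
    <FieldList>
      <Field><Name>TITL</Name><FullName>Title</FullName><Description>Words in the title</Description></Field>
      <Field><Name>PDAT</Name><FullName>Date - Publication</FullName><Description>Date of publication</Description></Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""


def fields_to_csv(einfo_xml):
    """Convert the field list of an EInfo response to CSV text."""
    root = ET.fromstring(einfo_xml)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "FullName", "Description"])
    for field in root.findall("./DbInfo/FieldList/Field"):
        writer.writerow([
            field.findtext("Name", default=""),
            field.findtext("FullName", default=""),
            field.findtext("Description", default=""),
        ])
    return buffer.getvalue()


csv_text = fields_to_csv(SAMPLE_XML)
print(csv_text)
```

The resulting text can be saved to a .csv file and opened in a tool such as Tableau or Microsoft Power BI.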
Programming Environment
I used the following operating system and tools to write and test the code for this project and article:
- Windows 10 – Windows is my OS of choice, but the Python code will work on other platforms.
- Python 3.9 – I used Python 3.9.2 for this project, but other Python 3 versions should work.
- Visual Studio Community 2019 – I have used Microsoft Visual Studio for decades and find it reliable, versatile, and fast. The free Community edition works well for most of my needs. Visual Studio Code (VS Code) or whatever integrated development environment (IDE) or code editor you prefer should work well.
In the code example, the c_e_info.write_db_xml() function is called with standard Windows file directory and name formats, as shown below. You may need to modify the file directory name formats to work with your operating system.
_pubmed_db_list.write_db_xml('c://project_data/c_e_info/entrez_dblist.xml')
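One way to avoid hard-coding an operating-system-specific path is to build it with pathlib. The sketch below is a suggestion, not part of the original code: the function name build_output_path and the default folder under the user's home directory are assumptions.

```python
from pathlib import Path


def build_output_path(file_name, base_dir=None):
    """Return a platform-appropriate path for an output file."""
    if base_dir is None:
        # Assumed default location: a folder under the user's home directory.
        base = Path.home() / "project_data" / "c_e_info"
    else:
        base = Path(base_dir)
    return base / file_name


output_path = build_output_path("entrez_dblist.xml")
print(output_path)
```

Before writing, you could create the folder with output_path.parent.mkdir(parents=True, exist_ok=True) and then pass str(output_path) to write_db_xml().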
Summary
The 43 Entrez biomedical databases store a rich and diverse collection of data that could drive or augment many data analytics and data science projects. The nine Entrez E-utilities are easy to use. You can integrate them into your programs and extend them with wrapper classes written in Python or other languages.
I hope that this article and its sample code have provided you with a basic understanding of the E-utilities, useful how-to information about the EInfo utility, and guidance on using the Python c_e_info helper class that wraps and expands its capabilities.
Other Articles about Public Datasets
If you are interested in data to drive or augment your projects, see my other articles about freely accessible public datasets in Towards Data Science magazine:
- Use Public Datasets Cataloged on Data.gov to Power Data Science Projects
- Acquiring and Analyzing Data from analytics.usa.gov with Python and Tableau
- How to Write a Python Program to Query Biomedical Journal Citations in the PubMed Database
About the Author
Randy Runtsch is a writer, data engineer, data analyst, programmer, photographer, cyclist, and adventurer. He and his wife live in southeastern Minnesota, U.S.A.
Randy writes articles on public datasets to drive insights and decision-making, writing, programming, data engineering, data analytics, photography, wildlife, bicycle touring, and more.