A Practical Introduction to the Use of Molecular Fingerprints in Drug Discovery

Laksh
Towards Data Science
4 min read · Jul 2, 2019


Morgan fingerprinting

Molecular fingerprints have many uses in drug discovery. Today we will focus on one of them: predicting drug binding affinities.

Molecular fingerprints represent molecules as mathematical objects. Once molecules are in this form, we can apply statistical analysis and machine learning to a whole set of them and pick up patterns that would be hard to spot by eye. One of the most common molecular fingerprinting methods is Extended Connectivity FingerPrinting (ECFP), which we will look at today.

Extended Connectivity FingerPrinting (ECFP)

The basic idea has four steps, each of which is expanded on below.

  1. Assign each atom an identifier
  2. Update each atom’s identifier based on its neighbours
  3. Remove duplicates
  4. Fold the list of identifiers into a 2048-bit vector (a Morgan fingerprint)
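To make the four steps concrete, here is a rough pure-Python sketch. It is a toy under stated assumptions, not the real ECFP algorithm: the function names `stable_hash` and `toy_ecfp`, the `radius` parameter, and the choice of atomic number plus neighbour count as the starting identifier are all illustrative, bond orders are ignored, and "remove duplicates" is reduced to dropping repeated identifier values. It will not reproduce the bits that a cheminformatics library produces.

```python
import hashlib


def stable_hash(values):
    """Deterministic 32-bit hash of a sequence of integers."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def toy_ecfp(atoms, bonds, radius=2, n_bits=2048):
    """Toy fingerprint for a molecule given as heavy atoms and bonds.

    atoms: list of atomic numbers, one per non-hydrogen atom
    bonds: list of (i, j) index pairs into atoms
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Step 1: give each atom an initial identifier (assumed features:
    # atomic number and number of heavy-atom neighbours).
    ids = [stable_hash((z, len(neighbours[i]))) for i, z in enumerate(atoms)]
    collected = list(ids)

    # Step 2: repeatedly update each identifier from its neighbours' identifiers.
    # Sorting the neighbour identifiers makes the result independent of atom order.
    for _ in range(radius):
        ids = [
            stable_hash((ids[i], *sorted(ids[j] for j in neighbours[i])))
            for i in range(len(atoms))
        ]
        collected.extend(ids)

    # Step 3: remove duplicate identifiers (a simplification of the real rule).
    unique = set(collected)

    # Step 4: fold the identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for identifier in unique:
        bits[identifier % n_bits] = 1
    return bits


# Ethanol as heavy atoms only: C-C-O
fp = toy_ecfp([6, 6, 8], [(0, 1), (1, 2)])
print(len(fp), sum(fp))
```

The modulo in step 4 is what "folding" means here: many possible identifiers share each bit position, so two different substructures can occasionally set the same bit.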

1. Assign each atom an identifier

We choose an atom in the molecule and take note of:

  • number of nearest-neighbour non-hydrogen atoms
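As a small illustration of this first property, the neighbour count can be read straight off the bond list. The representation below (a list of element symbols plus a list of bonded index pairs, with hydrogens written out explicitly) and the helper name `heavy_neighbour_counts` are assumptions made for this sketch, not part of any particular library.

```python
def heavy_neighbour_counts(symbols, bonds):
    """Count, for every atom, how many of its bonded neighbours are not hydrogen."""
    counts = [0] * len(symbols)
    for i, j in bonds:
        if symbols[j] != "H":
            counts[i] += 1
        if symbols[i] != "H":
            counts[j] += 1
    return counts


# Ethanol (CH3-CH2-OH) with explicit hydrogens
symbols = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

counts = heavy_neighbour_counts(symbols, bonds)
print(counts[:3])  # counts for the three heavy atoms: [1, 2, 1]
```

The middle carbon has two heavy neighbours (the other carbon and the oxygen), while the end carbon and the oxygen each have one; the hydrogens are skipped when counting.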
