European financial regulatory data

A guide on how to access public sources of data relating to the EU financial markets

Alan Bunbury
Towards Data Science

Photo by Finn Protzmann on Unsplash

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Introduction

In the European Union, participants in the financial markets — including banks, insurance companies, investment funds, trading firms and securitisation special purpose vehicles (SPVs) — are subject to a broad range of rules and regulations. Many of those regulations require market participants to provide data about themselves and their activities to regulators, such as the European Central Bank (ECB) or the European Securities and Markets Authority (ESMA). The regulators primarily use this data to monitor the financial system and (hopefully) detect and respond to developments which could potentially cause issues down the line. However, a subset of that data is also made available to the general public, and could be of interest to data sleuths looking to paint a clearer picture of the financial markets in Europe.

In this article I will give an overview of a few of these public data sources, explaining what they are and how to access and use them. I will be using the Python programming language to provide a few illustrative examples of code to fetch and parse the data, but beyond that no knowledge of any programming language will be assumed. It will be helpful to have a basic understanding of topics such as APIs and XML, as well as, of course, financial markets themselves. All of the example code can be found at the following GitHub repository:

I will refer (and link) to various regulations in this article, but it’s worth noting that many, if not all, of those regulations have been amended since their original publication. Most of the regulations I mention here are supplemented by various regulatory technical standards, implementing standards and guidance from EU and national regulators, which in many cases will set out the implementation details of the obligations imposed by the primary regulations. Finally, of course, nothing you read here should be considered legal advice!

Financial Instruments Reference Data

Under the Markets in Financial Instruments Regulation (MiFIR) and the Market Abuse Regulation (MAR), trading venues (such as stock exchanges) and systematic internalisers (broadly, investment firms which execute client orders against their own book, rather than on a trading venue) must submit reference data about relevant financial instruments (stocks, bonds, etc) to ESMA. ESMA then publishes that data on its website. The database is known as the Financial Instrument Reference Data System (FIRDS) and is accessible through ESMA’s registers website.

The size of the FIRDS dataset and the information it contains make it potentially very interesting, so it is worth spending some time discussing how to use it. The data can be accessed through a user interface, but more importantly for our purposes, it can also be downloaded in bulk through an API.

The process for downloading the FIRDS files is set out in the download instructions published by ESMA. First, you need to query the given URL, specifying the dates for which you want the data (in ISO 8601 format). You will then get back an XML object (or a JSON object if you specified that format in your request) which will contain, among other things, the download links for the relevant files. For example, the following URL query will return an XML object with links to all files published between 11 September 2020 and 18 September 2020:

https://registers.esma.europa.eu/solr/esma_registers_firds_files/select?q=*&fq=publication_date:%5B2020-09-11T00:00:00Z+TO+2020-09-18T23:59:59Z%5D&wt=xml&indent=true&start=0&rows=100
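As a rough illustration of this first step, the sketch below builds the same query using only the standard library and pulls out the download links. It asks for JSON (`wt=json`) rather than XML. It also assumes the index answers in the usual Solr JSON layout, with the file entries under `response` → `docs`, and that each entry carries a `download_link` field, which is the element named in the XML response. The function names are illustrative.

```python
import json
import urllib.request
from datetime import date
from typing import List
from urllib.parse import urlencode

# Endpoint taken from the example query URL above.
FIRDS_INDEX_URL = (
    "https://registers.esma.europa.eu/solr/esma_registers_firds_files/select"
)


def build_firds_query_url(start: date, end: date, rows: int = 100) -> str:
    """URL listing FIRDS files published from start to end (inclusive), as JSON."""
    date_filter = (
        f"publication_date:[{start.isoformat()}T00:00:00Z "
        f"TO {end.isoformat()}T23:59:59Z]"
    )
    params = {"q": "*", "fq": date_filter, "wt": "json", "start": 0, "rows": rows}
    return f"{FIRDS_INDEX_URL}?{urlencode(params)}"


def parse_download_links(response_text: str) -> List[str]:
    """Return the download_link of every file entry in a JSON response."""
    payload = json.loads(response_text)
    # Assumes the usual Solr JSON layout: {"response": {"docs": [...]}}.
    return [doc["download_link"] for doc in payload["response"]["docs"]]


def fetch_download_links(start: date, end: date) -> List[str]:
    """Query the index over the network and return the download links."""
    with urllib.request.urlopen(build_firds_query_url(start, end)) as resp:
        return parse_download_links(resp.read().decode("utf-8"))
```

For example, `fetch_download_links(date(2020, 9, 11), date(2020, 9, 18))` should correspond to the URL above. With `rows=100`, a busy week could be truncated. In that case, compare the number of links returned with the response's `numFound` and page through using `start`.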

The download instructions linked above tell you how to figure out the name of the file you’re looking for based on the date and type of data. For example, the file FULINS_D_20200912_02of03.zip contains data relating to debt securities published on 12 September 2020 (and is the second of three such files). Each download_link element specifies the URL you need to query to download the relevant file (as a zip archive).
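Since the file names encode the type of data, the date and the part number, it can be convenient to parse them before deciding which links to download. The pattern below is inferred from names like `FULINS_D_20200912_02of03.zip` and covers only full files. The single letter is assumed here to be the first letter of the instruments' CFI code, which is consistent with "D" for debt. ESMA's download instructions remain the authority on the naming scheme.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Pattern inferred from example names such as FULINS_D_20200912_02of03.zip;
# delta files and any other naming variants are not covered.
FULINS_NAME = re.compile(
    r"^FULINS_(?P<cfi>[A-Z])_(?P<date>\d{8})_(?P<part>\d+)of(?P<total>\d+)\.zip$"
)


class FirdsFile(NamedTuple):
    category: str  # single letter, e.g. "D" for debt securities
    publication_date: date
    part: int
    total_parts: int


def parse_fulins_name(file_name: str) -> FirdsFile:
    """Split a full-file name into its category, date and part number."""
    match = FULINS_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected FIRDS file name: {file_name!r}")
    return FirdsFile(
        category=match["cfi"],
        publication_date=datetime.strptime(match["date"], "%Y%m%d").date(),
        part=int(match["part"]),
        total_parts=int(match["total"]),
    )
```

Applied to the last path segment of each download link (e.g. `link.rsplit("/", 1)[-1]`), this makes it easy to keep only the debt files for a given day.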

The zip archive should contain an XML file, the contents of which are described in the reporting instructions published by ESMA. Briefly, the root element of the XML tree is a BizData element, which should have two direct children: a header (Hdr) element and a payload (Pyld) element. The Pyld element has a child called Document, which in turn has a child called FinInstrmRptgRefDataRpt. The first child of FinInstrmRptgRefDataRpt is a header RptHdr; the rest of the children are RefData elements, each describing a security. The various children of the RefData element contain the information you are looking for.

Exactly what child elements are present will depend on the type of security you are looking at, and to fully understand what you’re looking at you should explore the XML yourself in conjunction with the reporting instructions linked above. By way of illustration, below are just some of the children of the RefData element which contain some useful information about debt securities:

  • Issr: The LEI code of the issuer of the debt security (see the “Legal entity identifiers” section below).
  • DebtInstrmAttrbts -> TtlIssdNmnlAmt: The total issued nominal amount of the debt security.
  • DebtInstrmAttrbts -> MtrtyDt: The maturity date of the debt security (an ISO 8601 date in YYYY-MM-DD format).
  • DebtInstrmAttrbts -> IntrstRate: The interest rate applicable to the debt security (if the security pays a floating interest rate, this element will have child elements describing the relevant benchmark rate and the spread).
  • FinInstrmGnlAttrbts -> Id: The ISIN of the debt security.
  • FinInstrmGnlAttrbts -> NtnlCcy: The currency in which the notional amount of the security is denominated.
  • TradgVnRltdAttrbts -> Id: The market identifier code of the trading venue on which the security is listed or traded.
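The elements in these files are namespaced (ISO 20022 style). Rather than hard-code namespace URIs, the sketch below matches children by their local names and follows the paths listed above. It returns one dictionary per RefData element. Dates and amounts are left as strings, and the dictionary keys are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def local_name(tag: str) -> str:
    """Strip the namespace ElementTree prefixes to tags: '{urn:...}Issr' -> 'Issr'."""
    return tag.rsplit("}", 1)[-1]


def child_text(element: ET.Element, *path: str) -> Optional[str]:
    """Follow a path of child element names; return the text, or None if absent."""
    current: Optional[ET.Element] = element
    for name in path:
        current = next((c for c in current if local_name(c.tag) == name), None)
        if current is None:
            return None
    return current.text


def debt_fields(ref_data: ET.Element) -> Dict[str, Optional[str]]:
    """Collect the debt-security fields listed above from one RefData element."""
    return {
        "isin": child_text(ref_data, "FinInstrmGnlAttrbts", "Id"),
        "currency": child_text(ref_data, "FinInstrmGnlAttrbts", "NtnlCcy"),
        "issuer_lei": child_text(ref_data, "Issr"),
        "total_issued": child_text(ref_data, "DebtInstrmAttrbts", "TtlIssdNmnlAmt"),
        "maturity": child_text(ref_data, "DebtInstrmAttrbts", "MtrtyDt"),
        "venue_mic": child_text(ref_data, "TradgVnRltdAttrbts", "Id"),
    }
```

Missing elements come back as `None`, which matters because, as noted above, the children present vary by type of security.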

This is just a small example; the FIRDS data contains plenty more information about debt securities. And of course, the FIRDS data about other types of security (shares, derivatives, etc) will contain different data that is relevant to those types of securities.

There are a few things to bear in mind when using the FIRDS data:

  • It is quite large — for example, FULINS_D_20200912_02of03.xml is about 417MB uncompressed, and is just one of many FIRDS files published on that day. Depending on your system, loading multiple such files into memory at once can quickly swallow up all your RAM. If you are using Python to parse the XML, you could consider using the iterparse function, which allows you to handle elements as they are encountered, rather than waiting for the full tree to be read into memory.
  • It appears that the FIRDS data contains one RefData element per security per reporting trading venue. That is, if the same security is traded on three different venues (for example), it will appear in the data three times. If you want to avoid duplicated data, you should keep track of which ISINs you have already handled and skip RefData elements that reference the same ISINs.
  • The data is not always recorded in a completely uniform way. For example, if you are trying to find securities which pay a floating interest rate based on EURIBOR, you will find that the benchmark rate may be described as “EURIBOR”, “EURI”, “Euro Interbank Offered Rate”, or one of many other variants. Some of these variants may be misspellings — I have seen “EUROBOR” and “EURIOBOR” in the data, for example. Alternatively, a benchmark rate may be described by an ISIN rather than by name.
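The first two points can be handled together: stream the file with `iterparse`, and skip any ISIN that has already been seen. The following is a minimal sketch. It assumes, as described above, that each RefData element carries its ISIN at `FinInstrmGnlAttrbts -> Id` and the issuer LEI at `Issr`. Each processed RefData element is cleared and detached from its parent so the in-memory tree does not keep growing.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Set, Tuple


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def _find_text(element: Optional[ET.Element], *path: str) -> Optional[str]:
    """Follow child local names down from element; None if any step is missing."""
    for name in path:
        element = next((c for c in element if _local(c.tag) == name), None)
        if element is None:
            return None
    return element.text


def iter_isin_issuer(xml_source) -> Iterator[Tuple[str, Optional[str]]]:
    """Stream (ISIN, issuer LEI) pairs from a FIRDS XML file, once per ISIN.

    xml_source can be a path or a binary file object, e.g. one opened from
    the zip archive with zipfile.ZipFile(...).open(...).
    """
    seen: Set[str] = set()
    open_elements = []  # elements whose end tag has not been reached yet
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if _local(elem.tag) != "RefData":
            continue
        isin = _find_text(elem, "FinInstrmGnlAttrbts", "Id")
        if isin and isin not in seen:  # skip repeats reported by other venues
            seen.add(isin)
            yield isin, _find_text(elem, "Issr")
        # Detach the processed element so the tree does not grow in memory.
        elem.clear()
        if open_elements:
            open_elements[-1].remove(elem)
```

The `seen` set still grows with the number of distinct ISINs, but that is far smaller than the full XML tree.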

Here you will find a script which contains a number of functions to download the FIRDS data for a particular date and security type, and use that data to create a basic SQLite table mapping security ISINs to issuer LEIs.
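The linked script is not reproduced here. As a rough idea of its final step, the sketch below stores (ISIN, issuer LEI) pairs in an SQLite table. The pairs could come, for example, from a streaming parser over the FIRDS XML. The table name `isin_issuer` and its columns are illustrative choices, not taken from the original script.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def store_isin_issuers(
    conn: sqlite3.Connection, pairs: Iterable[Tuple[str, Optional[str]]]
) -> int:
    """Insert (ISIN, issuer LEI) pairs into a lookup table; return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS isin_issuer ("
        " isin TEXT PRIMARY KEY,"
        " issuer_lei TEXT)"
    )
    # INSERT OR IGNORE keeps the first row seen for each ISIN, so a security
    # reported by several venues does not trigger a constraint error.
    conn.executemany(
        "INSERT OR IGNORE INTO isin_issuer (isin, issuer_lei) VALUES (?, ?)", pairs
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM isin_issuer").fetchone()[0]
```

Using `sqlite3.connect("firds.db")` instead of an in-memory database keeps the table around for later joins against the STS or ECB data.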

Legal entity identifiers

Companies and other legal entities which are subject to regulation in the EU (and elsewhere) are typically required to obtain a unique legal entity identifier (LEI) and provide it to the relevant regulator. The granting and management of LEIs is overseen by the Global Legal Entity Identifier Foundation (GLEIF), which also maintains the Global LEI Index, a comprehensive source of data about LEIs and their holders. Besides searching manually on the GLEIF’s website, there are two ways to access data about LEIs:

  • Download the concatenated files. The GLEIF publishes a daily concatenated file containing data about all LEIs. As of 26 October 2020, there are just over 1.7 million entries in the concatenated LEI file, which is about 3.6GB in size when uncompressed (about 224MB when compressed). The data is in XML format (specifically, the LEI Common Data File, or LEI-CDF, format). The GLEIF also publishes concatenated files with other ancillary information, such as “relationship records”, which describe the direct and indirect ownership of entities with LEIs, and information about ISIN-to-LEI relationships.
  • Access the API, which is documented here. There is also a demo application here which allows you to explore the API. The API allows you to search by entity name, ISIN or LEI and provides the results as a JSON object. By way of example, this is what you will find if you search for the LEI 7ZW8QJWVPR4P1J1KQY45 (Google LLC).
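At 3.6GB uncompressed, the concatenated file is best streamed straight out of the zip archive rather than extracted and loaded in one go. The sketch below does that. The element names (`LEIRecord`, `LEI`, `Entity`, `LegalName`, `LegalAddress` → `Country`) are assumptions based on the LEI-CDF format and should be checked against a downloaded file. Namespaces are ignored by matching local names.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Iterator, Optional


def _local(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def _text(element: Optional[ET.Element], *path: str) -> Optional[str]:
    """Follow child local names; None if any step is missing."""
    for name in path:
        element = next((c for c in element if _local(c.tag) == name), None)
        if element is None:
            return None
    return element.text


def iter_lei_records(xml_source) -> Iterator[Dict[str, Optional[str]]]:
    """Stream basic details from each LEIRecord, detaching it once read."""
    open_elements = []
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if _local(elem.tag) == "LEIRecord":
            yield {
                "lei": _text(elem, "LEI"),
                "name": _text(elem, "Entity", "LegalName"),
                "country": _text(elem, "Entity", "LegalAddress", "Country"),
            }
            elem.clear()
            if open_elements:
                open_elements[-1].remove(elem)


def iter_lei_records_from_zip(zip_path: str) -> Iterator[Dict[str, Optional[str]]]:
    """Read the first XML member of the downloaded zip without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        member = next(n for n in archive.namelist() if n.lower().endswith(".xml"))
        with archive.open(member) as xml_file:
            yield from iter_lei_records(xml_file)
```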

Information you can get from the Global LEI Index includes the name, address, country and (in some cases) owners of the relevant legal entity. This can be very helpful because the data you get from other sources (for example, the FIRDS or STS data) will typically refer to companies by their LEIs rather than their full names. Note that, although you can use the GLEIF data to find details of the issuer of a security given the security’s ISIN, in my experience not all ISIN-LEI relationships are present in the GLEIF data, and I have had better luck using the FIRDS data discussed above for this purpose.

By way of example, here is a basic function to query the API and return the issuer name and jurisdiction for each provided LEI.
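That function is not reproduced here. The following is a minimal sketch of the same idea, not the original code. It assumes the record endpoint `https://api.gleif.org/api/v1/lei-records/{LEI}` and a JSON:API-style response in which the name sits at `data.attributes.entity.legalName.name` and the jurisdiction at `data.attributes.entity.jurisdiction`. Both assumptions should be checked against the API documentation.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional

GLEIF_RECORD_URL = "https://api.gleif.org/api/v1/lei-records/"


def parse_lei_record(response_text: str) -> Dict[str, Optional[str]]:
    """Pull the legal name and jurisdiction out of a single-record response."""
    entity = json.loads(response_text)["data"]["attributes"]["entity"]
    return {
        "name": (entity.get("legalName") or {}).get("name"),
        "jurisdiction": entity.get("jurisdiction"),
    }


def lookup_leis(leis: Iterable[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Query the API once per LEI (needs network access)."""
    results = {}
    for lei in leis:
        request = urllib.request.Request(
            GLEIF_RECORD_URL + lei, headers={"Accept": "application/vnd.api+json"}
        )
        with urllib.request.urlopen(request) as resp:
            results[lei] = parse_lei_record(resp.read().decode("utf-8"))
    return results
```

For example, `lookup_leis(["7ZW8QJWVPR4P1J1KQY45"])` would fetch the Google LLC record mentioned above. For long lists of LEIs, the concatenated file is likely to be a better fit than one request per LEI.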

Total volume and number of executed transactions

Under MiFID II, investment firms dealing on own account when executing client orders over the counter on an “organised, frequent, systematic and substantial basis” are subject to certain additional rules. Each investment firm, in order to determine whether it meets that test in respect of a particular security, needs to compare the number and volume of trades executed by it against the overall number and volume of trades in the security in the market generally. To assist firms in making that comparison, ESMA calculates and publishes, on a quarterly basis, the total volume and number of transactions executed in the EU, broken down by security. Current and historical data can be found on ESMA’s website.

The data is downloadable as an Excel (.xlsx) file, and there are separate files for equities, bonds and other non-equity securities (primarily derivatives). The equities and bonds files refer to securities by their ISIN, whereas the non-equities file is broken down by “sub-class identification”. Each Excel file comes with an “explanatory note” as a separate worksheet, and the explanatory note for the non-equities file explains how to interpret the “sub-class identification” of a derivative.

As an example, here is some code to parse the equities Excel file and generate a bar graph of the most traded stocks in the EU. Calling the script with the following arguments (from the same directory as the equities Excel file):

$ python si_calcs.py equity_si_calculations_-_publication_file_august_2020.xlsx most_traded_stocks.png

… will generate the following graph:

This document has been drafted using material downloaded from ESMA’s website. ESMA does not endorse this publication and in no way is liable for copyright or other intellectual property rights infringements nor for any damages caused to third parties through this publication.
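The `si_calcs.py` script itself is not shown here. Below is a rough sketch of the same idea with pandas and matplotlib. The workbook's sheet name and column headers are not described above, so they are passed in as parameters. Inspect the file first: `pd.read_excel(path, sheet_name=None)` loads every sheet, including the explanatory note, into a dict keyed by sheet name.

```python
import pandas as pd


def top_traded(df: pd.DataFrame, id_col: str, volume_col: str, n: int = 10) -> pd.Series:
    """Total volume per security, largest first, limited to the top n."""
    volumes = pd.to_numeric(df[volume_col], errors="coerce")  # blanks/text -> NaN
    return volumes.groupby(df[id_col]).sum().sort_values(ascending=False).head(n)


def plot_top_traded(top: pd.Series, out_path: str) -> None:
    """Save a bar chart of the series to out_path (e.g. a .png file)."""
    import matplotlib

    matplotlib.use("Agg")  # write to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 6))
    top.plot.bar(ax=ax)
    ax.set_xlabel(top.index.name or "")
    ax.set_ylabel("Total volume")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


def main(xlsx_path: str, png_path: str, sheet_name, id_col: str, volume_col: str) -> None:
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    plot_top_traded(top_traded(df, id_col, volume_col), png_path)
```

If the header row is not the first row of the sheet, `pd.read_excel` also accepts `header=` or `skiprows=` to point it at the right place.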

Simple, transparent and standardised securitisations

Securitisations in Europe are regulated by the Securitisation Regulation. Among other things, the Securitisation Regulation provides for certain securitisations to be designated as “simple, transparent and standardised” (STS) securitisations, if they meet the requirements and criteria set out in the regulation. The idea is to identify and promote (through preferential regulatory capital treatment) securitisations which are considered to be particularly low-risk for investors.

Details of securitisations wishing to avail of the STS designation must be notified to ESMA, and ESMA publishes certain of those details on its website. Currently, they are published as a spreadsheet which is updated regularly, though the plan is that they will eventually be published as a separate register on ESMA’s registers website.

STS securitisations are divided into public securitisations (in respect of which a prospectus is drawn up under the Prospectus Regulation) and private securitisations. Only very limited details of private securitisations are publicly available, such as the asset class of the underlying assets being securitised and whether the securitisation is an asset-backed commercial paper (ABCP) transaction. For public securitisations, more details are available, such as details of the originator(s) of the underlying assets and the ISINs of the notes (which can be cross-referenced against the FIRDS data to find out more information about the notes and the issuing SPV).
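The layout of ESMA's STS spreadsheet is not described above. This sketch therefore assumes only that the note ISINs appear somewhere in a text cell, possibly several per cell. It pulls out anything shaped like an ISIN (two letters, nine alphanumerics, one digit) and looks each one up in an ISIN-to-issuer mapping built from the FIRDS data, represented here by a plain dict.

```python
import re
from typing import Dict, List, Optional

# ISIN shape: 2-letter country code, 9 alphanumeric characters, 1 check digit.
ISIN_RE = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")


def extract_isins(cell) -> List[str]:
    """Return every ISIN-shaped token in a spreadsheet cell (empty cells -> [])."""
    if not isinstance(cell, str):
        return []
    return ISIN_RE.findall(cell.upper())


def link_to_issuers(
    isins: List[str], issuer_by_isin: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map each ISIN to its issuer LEI, or None if FIRDS has no record of it."""
    return {isin: issuer_by_isin.get(isin) for isin in isins}
```

ISINs with no FIRDS match are worth a second look. Some may simply not be listed on an EU venue, while others may be typos in the source data.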

For a basic example of what can be done with this data, see this web page, which also draws on the FIRDS and LEI data discussed above.

Eurozone financial institutions

The European Central Bank (ECB) maintains lists of various types of financial institution in the euro area. The types of financial institutions are as follows:

  • Monetary financial institutions (MFIs), which include central banks, credit institutions, other deposit-taking corporations and money-market funds (MMFs).
  • Investment funds (IF), excluding pension funds and MMFs.
  • Financial vehicle corporations (FVCs), which are, broadly speaking, SPVs involved in securitisations.
  • Payment statistics relevant institutions (PSRIs), which are payment service providers and payment system operators.
  • Insurance corporations (ICs).

You can download the lists as follows:

  • PSRIs and ICs: Follow the link above, click on the relevant type of financial institution and additional text (including a number of links) will appear. The link to download the list of financial institutions will be of the form “Published details regarding the list of [PSRIs/ICs], including historical data”. This link is to a zip file, containing an Excel (.xlsx) file for each year for which data is available.
  • IFs and FVCs: Follow the link above and click on the relevant type of financial institution. The links to download the list of financial institutions will be of the form “[IFs/FVCs] Overview [time period]” (eg, “FVCs Overview 2019–2020”). The link is to a zip file which contains an Excel file for each quarter in the relevant time period for which data is available.
  • MFIs: Go to this page which will allow you to search or download the dataset. The dataset is in CSV format (note that values are separated by tabs, not commas).
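Because the MFI file is tab-separated despite its CSV label, Python's `csv` module needs `delimiter="\t"`. A minimal sketch follows. The file encoding is an assumption: try another if decoding fails. The column names in any given download should be read from its header row.

```python
import csv
from typing import Dict, Iterable, List


def parse_mfi_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines into dicts keyed by the header row."""
    return list(csv.DictReader(lines, delimiter="\t"))


def load_mfi_file(path: str, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Load the downloaded MFI file (the encoding is an assumption)."""
    with open(path, newline="", encoding=encoding) as f:
        return parse_mfi_rows(f)
```

With pandas, the equivalent is `pd.read_csv(path, sep="\t")`.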

Where you download a list as of a particular quarter or year, note that the list will contain all registered institutions of the relevant type as at the end of that quarter, not just institutions registered in that quarter. For example, the file FVC_2020_Q1.xlsx, which is the list of FVCs for Q1 2020, will contain the full list of all entities that remain registered as FVCs at the end of Q1 2020.
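Since each file is a snapshot of everything registered at quarter end, new registrations and de-registrations have to be derived by comparing consecutive snapshots. The sketch below does this with set differences. It assumes each row has a stable identifier such as the LEI. Rows without an LEI would need some other identifier column.

```python
from typing import Iterable, Set, Tuple


def registration_changes(
    previous: Iterable[str], current: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) identifiers between two quarterly snapshots."""
    prev, curr = set(previous), set(current)
    return curr - prev, prev - curr
```

With pandas, this could be called as `registration_changes(q1["LEI"].dropna(), q2["LEI"].dropna())`, where `"LEI"` is a placeholder for the actual column header.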

In general, the lists of financial institutions will include (among other things) the LEI, legal name and address and country of registration of each institution. The list for each type of financial institution will also include some additional data relevant to that type of financial institution. For example, the FVC data will include details (including LEI, name and country) of the management company responsible for each FVC, as well as the ISINs of the debt securities issued by each FVC.

Other data relating to financial institutions, such as aggregate balance sheet totals, can be found at the ECB’s Statistical Data Warehouse website (along with many other interesting statistics and datasets).

It should be noted that, although the data collected by the ECB relates mainly to countries in the euro area, central banks in some (but not all) non-euro area countries have chosen to collect data and send it to the ECB. So, for example, the FVC data for Q2 2020 includes data from Bulgaria, Sweden and Denmark.

This code uses FVC data to generate a “choropleth” map, showing you which countries in the euro area are home to the most financial vehicle corporations. Calling the script as follows (assuming all of the data files are in the same directory):

$ python fvc.py CNTR_RG_20M_2020_3857.shp FVC_2020_Q2.xlsx fvcs_in_euro_area.png

… will generate an image like this:

© EuroGeographics for the administrative boundaries
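The `fvc.py` script is not shown here. A sketch of its two main steps follows: counting FVCs per country, then colouring a map with geopandas. Several details are assumptions. The country column in the FVC workbook is not named above, so the counting function just takes the country codes. `CNTR_ID` is assumed to be the code column in Eurostat's GISCO country shapefile. Eurostat codes typically use `EL` for Greece and `UK` for the United Kingdom, where ISO 3166 uses `GR` and `GB`, so ISO codes are remapped before joining.

```python
from collections import Counter
from typing import Dict, Iterable

# ISO 3166 codes that (assumption) differ from Eurostat's country codes.
ISO_TO_GISCO = {"GR": "EL", "GB": "UK"}


def count_by_country(country_codes: Iterable) -> Dict[str, int]:
    """Count FVCs per country, keyed by the codes used in the GISCO shapefile."""
    cleaned = (
        ISO_TO_GISCO.get(code.strip().upper(), code.strip().upper())
        for code in country_codes
        if isinstance(code, str) and code.strip()
    )
    return dict(Counter(cleaned))


def plot_choropleth(shapefile: str, counts: Dict[str, int], out_path: str) -> None:
    """Colour each country by its FVC count; countries with no data are grey."""
    import geopandas as gpd  # imported here so the counting step works without it

    countries = gpd.read_file(shapefile)
    countries["fvcs"] = countries["CNTR_ID"].map(counts)
    ax = countries.plot(column="fvcs", legend=True, missing_kwds={"color": "lightgrey"})
    ax.set_axis_off()
    ax.figure.savefig(out_path, bbox_inches="tight")
```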

Derivatives — EMIR reporting

Derivatives in the EU are regulated by the European Market Infrastructure Regulation (EMIR), which, among other things, requires that details of all derivatives entered into by EU counterparties be reported to a trade repository. The trade reports themselves are not publicly available. However, trade repositories are obliged to make certain aggregated derivatives data available to the public (specifically, aggregate open positions, aggregated transaction volumes and aggregated values), broken down by class of derivatives. The aggregated data must be published on a website or an online portal which is easily accessible by the public, and must be updated at least weekly.

Unfortunately, there is, as far as I am aware, no central source (like the ESMA website) from which this data can be accessed; each trade repository publishes its own data on its website. It therefore seems that the only way to get the full set of aggregated data is to check ESMA’s list of trade repositories, go to each registered trade repository’s website, navigate to the appropriate section (it may be called something like “EMIR public data”) and access and download the data for the relevant date. As an example, here is DTCC’s page for accessing the data.

What’s more, although most of the trade repositories appear to allow you to download the data as a file, there isn’t much consistency across formats; some provide the data as comma-separated values, some provide it as an Excel spreadsheet and some provide it as XML. So some manual work is required not only to access all of the data, but also to consolidate it if desired.
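One way to contain that manual work is to load each repository's file into a common row-of-dicts shape, then rename columns to a shared vocabulary with a per-repository mapping. The sketch below does this under stated assumptions. The XML branch assumes a flat layout with one child element per row and one grandchild per field, which real files may not follow. The column mappings are hypothetical and would have to be written by hand for each trade repository.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_rows(path: str) -> List[Dict[str, str]]:
    """Load one trade repository's public file into a list of dicts, by file type."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        with open(path, newline="", encoding="utf-8-sig") as f:  # tolerate a BOM
            return list(csv.DictReader(f))
    if suffix == ".xlsx":
        import pandas as pd  # only needed for spreadsheets

        return pd.read_excel(path, dtype=str).to_dict(orient="records")
    if suffix == ".xml":
        import xml.etree.ElementTree as ET

        root = ET.parse(path).getroot()
        # Assumes a flat layout: one child per row, one grandchild per field.
        return [
            {field.tag.rsplit("}", 1)[-1]: (field.text or "") for field in row}
            for row in root
        ]
    raise ValueError(f"unsupported file type: {suffix}")


def harmonise(
    rows: List[Dict[str, str]], column_map: Dict[str, str], source: str
) -> List[Dict[str, str]]:
    """Rename a repository's columns to shared names and tag each row's source."""
    return [
        {"source": source, **{column_map.get(k, k): v for k, v in row.items()}}
        for row in rows
    ]
```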

Securities financing transactions — SFTR reporting

The Securities Financing Transaction Regulation (SFTR) requires that details of “securities financing transactions” (SFTs) be reported to a trade repository. SFTs include repo transactions, securities lending transactions and margin lending transactions, among other things.

The SFTR reporting obligation is very similar to the EMIR reporting obligation, and the publicly available data (and the means of accessing it) is also similar. Here is DTCC’s page with its SFTR data. The SFTR reporting obligation is quite new and still being phased in, so not all market participants are yet required to report.

Prospectuses

Under the Prospectus Regulation, companies issuing securities (such as bonds or shares) to the public, or which will be admitted to trading on a regulated market, must draw up a prospectus for those securities, which must comply with the requirements of the Prospectus Regulation and must be approved by the relevant competent authority. ESMA maintains a register of currently approved prospectuses on its website and provides machine-to-machine access via an API. A detailed explanation of how to use the API can be found here (see in particular sections 1, 2 and 5).

Working with the data

Below are a few things you should bear in mind when working with any of the datasets I have discussed above.

Know the scope of the data

Before working with any of these datasets it is important to understand exactly who is required to provide the relevant data, what they are required to provide and when they are required to provide it. Otherwise, you might find that your data is not as accurate or comprehensive as you think. An example is the SFTR data discussed above, which right now will not give a complete picture of the SFT market in the EU as not all market participants are required to report their SFTs yet.

Understanding the precise scope of these datasets inevitably involves some understanding of the underlying rules and their application. This is not always easy and is (far) beyond the scope of this article. ESMA and other regulators have published various Q&As and other forms of guidance intended to help market participants understand the scope of the reporting requirements that apply to them, which may help you understand what is and is not included in the data available to you.

It is also important to understand how Brexit will impact these datasets. Most market participants based in the UK will continue to be governed by equivalent or similar reporting requirements laid down by the Financial Conduct Authority (FCA). However, post-Brexit, data reported by UK-based entities will, in most cases, be published separately from data provided by EU-based entities. The new UK rules may also differ in some respects from the EU rules. The FCA’s website has more information about the Brexit transition; see in particular the “Markets policy” heading under the “Markets” tab.

Know the limitations of the data

All datasets are vulnerable to errors, gaps and inconsistencies, and the datasets we have discussed here are no different. You are likely to encounter errors in the data that you will need to clean.
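One cheap first check: both ISINs and LEIs carry check digits, so many typos can be caught mechanically. ISINs (ISO 6166) use a Luhn check over the code with letters expanded to numbers (A=10, ..., Z=35). LEIs (ISO 17442) use ISO 7064 MOD 97-10: the letter-expanded number modulo 97 must equal 1. A minimal sketch:

```python
import re

ISIN_RE = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")
LEI_RE = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")


def _letters_to_digits(code: str) -> str:
    """Replace each letter with its two-digit value (A=10, ..., Z=35)."""
    return "".join(str(int(ch, 36)) for ch in code)


def is_valid_isin(isin: str) -> bool:
    """ISO 6166 check: Luhn algorithm over the letter-expanded code."""
    if not ISIN_RE.match(isin):
        return False
    total = 0
    # Walk from the right; double every second digit, starting just left of
    # the check digit, and subtract 9 from any doubled value above 9.
    for i, ch in enumerate(reversed(_letters_to_digits(isin))):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_lei(lei: str) -> bool:
    """ISO 17442 check: letter-expanded number mod 97 must equal 1."""
    return bool(LEI_RE.match(lei)) and int(_letters_to_digits(lei)) % 97 == 1
```

For example, the Google LLC LEI mentioned earlier, 7ZW8QJWVPR4P1J1KQY45, passes `is_valid_lei`. A passing check only shows the code is well formed, not that it refers to the right entity.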

As well as the possibility of errors rendering data incorrect, you should also be aware of the possibility of inconsistencies in the way data is reported. As we saw above, for example, there are many different ways to describe a benchmark interest rate such as EURIBOR. There are also many different ways to write a company’s name, depending, for example, on what is capitalised and how the company’s legal form (such as “public limited company”) is displayed or abbreviated. This is why, if possible, it is preferable to search and sort data by LEI or ISIN rather than company name.
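For free-text fields like benchmark names, a small alias table applied after some light cleaning goes a long way. The table below is hypothetical. It contains only the EURIBOR variants mentioned earlier and would need extending as new spellings turn up. Benchmarks identified by ISIN fall through to `None` and would need a separate lookup.

```python
import re
from typing import Optional

# Hypothetical alias table, seeded with the EURIBOR variants seen above.
BENCHMARK_ALIASES = {
    "EURIBOR": "EURIBOR",
    "EURI": "EURIBOR",
    "EURO INTERBANK OFFERED RATE": "EURIBOR",
    "EUROBOR": "EURIBOR",
    "EURIOBOR": "EURIBOR",
}


def normalise_benchmark(raw: Optional[str]) -> Optional[str]:
    """Map a free-text benchmark name to a canonical label, or None if unknown."""
    if not raw:
        return None
    key = raw.upper()
    key = re.sub(r"\b\d+\s*[DWMY]\b", " ", key)  # drop tenors such as "3M"
    key = re.sub(r"[^A-Z ]+", " ", key)  # drop punctuation and stray digits
    key = re.sub(r"\s+", " ", key).strip()
    return BENCHMARK_ALIASES.get(key)
```

Counting the raw values that map to `None` is a quick way to find the next variants to add.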

Compliance with the regulations we have discussed is not always perfect and this can impact on the quality of the data. An example is the EMIR reporting obligation, introduced by EMIR in 2012 and applicable from February 2014. Levels of industry compliance with that obligation were quite low in the first few years after it was introduced, with the result that the data from that time does not paint a very comprehensive picture of the EU derivatives market. Compliance has improved over time but is still far from perfect.

You should also be aware of the legal restrictions and conditions that attach to the use of the various datasets I have discussed. For data published directly by the EU regulators, use of the data is generally permitted free of charge. However, some form of attribution or notice is often required to be included. The websites of the various data publishing entities (ESMA, ECB, GLEIF, the trade repositories, etc) will set out the specific restrictions and conditions.

Complementary data sources

Below are a few other sources of data that you might find helpful when trying to analyse or visualise the datasets I have discussed in this article. For example, in the below links you can find data on the population size, economy size and debt levels of each EU member state.

  • World Bank data: The World Bank’s DataBank contains a wide range of data relating to most countries in the world, including population, GDP, debt and many more.
  • Eurostat data: Eurostat is the statistical office of the EU, and its database contains many interesting datasets relating to the EU and its member states. You can also find geodata describing the locations and boundaries of countries and other administrative areas. This can be helpful in creating map-based visualisations.
  • Eurobarometer data: The Eurobarometer is a series of surveys conducted regularly on behalf of the EU, to gauge Europeans’ opinions on a broad range of topics. “Standard Eurobarometer” surveys are conducted on a regular basis and tend to ask broadly similar questions each time, whereas “Special Eurobarometer” and “Flash Eurobarometer” surveys are conducted less frequently (sometimes on a once-off basis) and focus on specific topics. The EU publishes reports based on the responses to Eurobarometer surveys, but it is also possible to download the primary data (in SPSS or STATA format) for analysis.
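For instance, raw counts per country (such as the FVC counts above) are easier to compare once scaled by population. The sketch below uses the World Bank's public API. It assumes the v2 endpoint shape shown, the `SP.POP.TOTL` (total population) indicator, and a JSON response of the form `[metadata, records]`, where each record has `countryiso3code` and `value` fields.

```python
import json
import urllib.request
from typing import Dict, Optional

WB_URL = (
    "https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
    "?format=json&date={year}"
)


def parse_worldbank(response_text: str) -> Dict[str, Optional[float]]:
    """Map ISO3 country code -> value from a World Bank API v2 JSON response."""
    _metadata, records = json.loads(response_text)
    return {rec["countryiso3code"]: rec["value"] for rec in records or []}


def fetch_population(country: str, year: int) -> Dict[str, Optional[float]]:
    """Total population for one country and year (needs network access)."""
    url = WB_URL.format(country=country, indicator="SP.POP.TOTL", year=year)
    with urllib.request.urlopen(url) as resp:
        return parse_worldbank(resp.read().decode("utf-8"))
```

Note that the result is keyed by three-letter ISO codes (e.g. `fetch_population("IE", 2019)` returns a dict keyed by `"IRL"`), so the country codes need mapping before being joined with two-letter codes from the ECB or Eurostat data.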

Conclusion

This has been a very brief overview of a small selection of publicly available regulatory data. There are plenty of other data sources that we didn’t have time to go into — see, for example, the other registers available on ESMA’s registers website. Hopefully, however, this article has pointed you in the right direction and you will find some of the explanations and examples helpful in your own projects. If you create anything with the data we have discussed here, please share it in the comments!

Lawyer with an interest in programming, data analysis and the financial markets. Based in Dublin, Ireland.