Analyse the migration of scientific researchers

Hannah Yan Han
Towards Data Science
Nov 5, 2017

Today I looked into the inter- and intra-continental migration of scientific researchers based on ORCID (Open Researcher and Contributor ID) data. Since not everyone has an ORCID, the dataset is best seen as a directional sample of all researchers. It tracks their earliest and latest countries with research activity, as well as their PhD countries.

Pre-processing

To clean the data, I kept only rows where is_migrated=True and the earliest country differs from the latest country. I then converted country codes to country and continent names, and shortened some countries’ names.
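
The filtering step above can be sketched in a few lines of plain Python. This is only an illustration, not the code behind this post: apart from is_migrated, the field names (earliest_country, latest_country), the sample rows and the small lookup table are assumptions made for the example.

```python
# Hypothetical sample rows; only `is_migrated` is a field named in the post.
records = [
    {"earliest_country": "IN", "latest_country": "US", "is_migrated": True},
    {"earliest_country": "GB", "latest_country": "GB", "is_migrated": True},
    {"earliest_country": "CN", "latest_country": "CN", "is_migrated": False},
    {"earliest_country": "NZ", "latest_country": "AU", "is_migrated": True},
]

# Illustrative lookup: country code -> (shortened country name, continent).
COUNTRY_INFO = {
    "IN": ("India", "Asia"),
    "US": ("US", "Americas"),
    "GB": ("UK", "Europe"),
    "NZ": ("NZ", "Oceania"),
    "AU": ("Australia", "Oceania"),
}


def clean_moves(rows):
    """Keep real cross-country moves and attach country/continent names."""
    moves = []
    for row in rows:
        src, dst = row["earliest_country"], row["latest_country"]
        # Drop rows not flagged as migrated, or with identical endpoints.
        if not row["is_migrated"] or src == dst:
            continue
        # Fall back to the raw code when it is missing from the lookup.
        src_name, src_continent = COUNTRY_INFO.get(src, (src, "Unknown"))
        dst_name, dst_continent = COUNTRY_INFO.get(dst, (dst, "Unknown"))
        moves.append({
            "source_country": src_name,
            "source_continent": src_continent,
            "target_country": dst_name,
            "target_continent": dst_continent,
        })
    return moves


moves = clean_moves(records)
```

With the sample rows, only the India to US and NZ to Australia moves survive the filter.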

Movement by Continents

To show direction, chord ends that sit flush against the sector (no gap) mark the source, and chord ends with a gap mark the target.

We can see:

  • Europe has high mobility, with much of the movement within the region.
  • Asia has more outbound moves, mostly to the Americas, followed by Europe and then Oceania.
  • The Americas have more inbound moves, from Asia, Europe and within the region.
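
A chord diagram like this is drawn from a source-by-target matrix of move counts. Here is a minimal sketch of building that matrix in Python, assuming each move has already been reduced to a (source continent, target continent) pair. The pairs below are made-up placeholders, not the ORCID numbers.

```python
from collections import Counter

# Hypothetical (source continent, target continent) pairs, one per researcher.
continent_moves = [
    ("Asia", "Americas"),
    ("Asia", "Europe"),
    ("Europe", "Europe"),
    ("Europe", "Americas"),
    ("Asia", "Americas"),
]

# Count how many researchers follow each directed flow.
flows = Counter(continent_moves)

# Fix one ordering of continents for both rows (sources) and columns (targets).
continents = sorted({c for pair in continent_moves for c in pair})

# matrix[i][j] = number of moves from continents[i] to continents[j].
matrix = [[flows[(s, t)] for t in continents] for s in continents]

# Row sums are outbound totals; column sums are inbound totals.
outbound = {s: sum(row) for s, row in zip(continents, matrix)}
inbound = {t: sum(row[j] for row in matrix) for j, t in enumerate(continents)}
```

The diagonal of the matrix holds the intra-continent moves, which is what makes Europe's within-region movement visible in the chart.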

Intra-Continent Move

For those who moved to another country on the same continent, which are the major flows?

In the Americas, the biggest influx to the US comes from Canada, followed by Mexico, Cuba and Colombia.

In Asia, India is almost entirely outbound, while Saudi Arabia and Qatar are almost entirely inbound. Malaysia actually has more inbound than outbound moves.

Oceania (L) Africa (R)

In Europe, the UK, Germany, France and Italy have the biggest movements. The UK is majority inbound while Italy is majority outbound.

In Oceania, most researcher movement is from NZ to Australia.

In Africa, South Africa receives more researchers from the region than it loses. Botswana also has more inflow than outflow.

Inter-continent Movements

We can see that movement isn’t limited to flows from developing to developed countries: researchers from developed countries may move on to work and live in other countries too.

Brain Drain or New Blood?

To compare inbound and outbound movement, I calculated three metrics per country: the outbound/inbound ratio, the % of all researchers that moved out of the country, and the % of all researchers that moved in.
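
As a rough sketch, the three metrics can be computed like this in Python. The function name, its arguments and the example counts are hypothetical. It also assumes the denominator for both percentages is the number of recorded researchers for that country.

```python
def migration_metrics(outbound, inbound, total_researchers):
    """Return the three per-country metrics described above.

    outbound / inbound: counts of researchers who moved out of / into the
    country. total_researchers: all recorded researchers for the country
    (an assumed denominator).
    """
    # A country with no inbound moves has an unbounded ratio.
    ratio = outbound / inbound if inbound else float("inf")
    return {
        "out_in_ratio": ratio,
        "pct_out": 100 * outbound / total_researchers,
        "pct_in": 100 * inbound / total_researchers,
    }


# Made-up counts: 50 researchers left, 10 arrived, 200 recorded in total.
example = migration_metrics(outbound=50, inbound=10, total_researchers=200)
```

A ratio above 1 means a country loses more researchers than it gains, and a ratio below 1 means the opposite.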

In the chart, height indicates the outbound/inbound ratio of migrations, and the size of the upward/downward triangle shows outbound/inbound moves as a % of all recorded researchers, which normalizes for population size. Only countries with above-average movement counts are shown.

After India, China has the second-highest ratio, losing 5X more researchers than it gains. Greece has the highest percentage of researchers migrating out of the country.

In terms of attracting researchers, Qatar and Saudi Arabia have the largest influx relative to very little outbound movement. Singapore and HK in Asia are also attracting 2–3X more researchers than they lose.

Where did the migrated researchers do their PhDs?

Among the migrated researchers, grouped by their earliest continent, we can look at whether they did their PhD in their earliest or their latest affiliated country. Asian and African researchers mostly continued to do research in their PhD countries, whereas American and European researchers moved after obtaining their PhDs.

Overall, 49% of migrated researchers did their PhDs in their earliest country, 39% in their latest country and 12% elsewhere.
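
A split like this comes from labelling each migrated researcher by where the PhD country falls. A minimal Python sketch, assuming hypothetical field names (phd_country, earliest_country, latest_country) and placeholder rows rather than the real data:

```python
from collections import Counter


def phd_location(row):
    """Label a migrated researcher by where the PhD was done."""
    if row["phd_country"] == row["earliest_country"]:
        return "earliest"
    if row["phd_country"] == row["latest_country"]:
        return "latest"
    return "elsewhere"


# Placeholder rows; earliest and latest countries always differ after cleaning.
phd_rows = [
    {"phd_country": "IT", "earliest_country": "IT", "latest_country": "GB"},
    {"phd_country": "US", "earliest_country": "IN", "latest_country": "US"},
    {"phd_country": "DE", "earliest_country": "DE", "latest_country": "US"},
    {"phd_country": "FR", "earliest_country": "CN", "latest_country": "CA"},
]

counts = Counter(phd_location(r) for r in phd_rows)

# Share of each label as a percentage of all migrated researchers.
shares = {label: 100 * n / len(phd_rows) for label, n in counts.items()}
```

Grouping the same labels by earliest continent would give the per-continent view described above.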

Today I used the circlize package to plot the chord diagrams and made the marimekko chart with a faceted ggplot. I also realized the distinct advantages of a circular diagram vs a Sankey diagram:

  • As a Sankey diagram shows sources and targets on different axes in sorted order, it’s easy to see the top outbound and top inbound nodes.
  • A circular diagram groups inbound and outbound together, making it more suitable for viewing the overall movement of each node.

This is #day69 of my #100dayprojects on data science and visual storytelling. The code is on my GitHub. Thanks for reading. If you like it, please share it. Feedback is always welcome.