Distributing access not data: monitoring who can access patient data

What does the future of access to sensitive health data for research look like?

Richard Welpton
The Health Foundation Data Analytics

Whether about a trip to the GP or a hospital admission, routinely collected patient data can be a rich and important resource for researchers looking to better understand our health and our health services. For many years, these data have been shared for research purposes. In this post, I’m going to talk about how access to these sensitive data is likely to look in the future.

Hospital Episode Statistics data are routinely used by organisations like The Health Foundation for research into how the health service performs and how this affects patient outcomes. The data are collected and made available by NHS Digital and are used, among other things, to assess effective delivery of care, monitor trends and patterns in NHS hospital activity and evaluate health care policy.

In the same way, when someone visits a GP practice that has agreed to share patient data with the Clinical Practice Research Datalink (CPRD), information about their symptoms, prescribed treatments or referrals is captured and then made available to researchers.

These data, while stripped of information that would directly identify patients, are made available to researchers to download and use in their own institution’s secure computing facilities. If researchers access data from NHS Digital, these facilities must comply with the information governance requirements laid out in the Data Security and Protection Toolkit (DSPT).

Even though many safeguards are stipulated, the data are ‘distributed’: they physically leave the organisation and head towards the researcher, so the supplier no longer has direct control. Instead, the supplier must rely on enforcing standards laid out in contracts.

The move towards Trusted Research Environments (TREs)

A TRE is a secure computing facility. Many are accredited to the international information security standard ISO27001 (and in health, the DSPT). Data are stored inside a TRE and researchers are granted access, so they can log in and use the data they have requested. Normally a suite of software applications is provided that allows them to analyse the data and write up their statistical results.

These results are released to researchers subject to a check to make sure that no one can be identified from them and that they do not contain any confidential information. This check is known as statistical disclosure control: see my blog post Maintaining data confidentiality in Trusted Research Environments, which discusses this in more detail.
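To give a flavour of what such a check can involve, here is a minimal sketch of one common kind of rule: suppressing small counts in a table before it is released. It is only an illustration and not how any particular TRE works; the threshold of 10, the function name and the records are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer than this many patients are suppressed.
MIN_CELL_COUNT = 10


def tabulate_with_suppression(records, key, threshold=MIN_CELL_COUNT):
    """Count records per category and replace counts below the threshold with None."""
    counts = Counter(record[key] for record in records)
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


# Made-up illustrative records, not real patient data.
records = [{"region": "North"}] * 12 + [{"region": "South"}] * 3

table = tabulate_with_suppression(records, "region")
print(table)
```

In practice, output checking also relies on human judgement about context, so a rule like this would be one input to the check rather than the whole of it.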

The key principle is that data stay inside the secure facility of the TRE. Access can be granted to a researcher, so they can log in from wherever they normally work, but the data remain with the organisation responsible for them: they haven’t been moved or copied anywhere else.

This method of distributing access to sensitive data rather than distributing the data itself has a number of advantages, as summed up by Tanvi Desai in Disseminate Access not Data. She explains how it allows data suppliers to:

· retain control as they know exactly where data are stored

· know exactly who has access to data, and why

· monitor the release of statistical analyses produced from the data
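As a rough illustration of the second point, a supplier that keeps the data in one place can hold a simple register of who has access and why. The sketch below is hypothetical: the `AccessGrant` record, its fields and the example entries are invented for illustration and do not describe any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessGrant:
    """One entry in a hypothetical access register."""
    researcher: str
    dataset: str
    purpose: str
    expires: date


def active_grants(grants, dataset, today):
    """Return the grants for a dataset that have not yet expired."""
    return [g for g in grants if g.dataset == dataset and g.expires >= today]


# Made-up entries for illustration only.
grants = [
    AccessGrant("researcher_a", "hospital_admissions",
                "evaluate a care policy", date(2030, 1, 1)),
    AccessGrant("researcher_b", "hospital_admissions",
                "monitor activity trends", date(2020, 1, 1)),
]

current = active_grants(grants, "hospital_admissions", today=date(2025, 1, 1))
for grant in current:
    print(grant.researcher, "-", grant.purpose)
```

Because every login happens inside the supplier’s own facility, a register like this can be checked against actual usage, which is much harder once copies of the data have left the building.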

It’s an approach that has served social science researchers well (the UK Data Service Secure Lab and Office for National Statistics’ Secure Research Service are good examples), but with one or two exceptions, it has not been widely used in medical research.

In the health research world, Connected Health Cities was funded across the north of England to facilitate access to patient health records for a variety of projects. NHS Digital has recently launched its own TRE and CPRD is also designing a similar facility. So the days when organisations like NHS Digital send sensitive data directly to researchers may be numbered.

Advantages and challenges

The benefits of TREs are clear from an information governance perspective: confidential patient data remain in a single facility, or a small number of facilities, where data suppliers have control and oversight. The number of electronic patient records that are held in different places by different researchers is dramatically reduced and so is the risk of data security events. Patients themselves can better understand where their data are kept and what is done with them. For researchers, there are benefits too: for example, they do not have to invest in complex secure technology to access data.

However, there are obvious challenges for data suppliers, who need to create facilities that innovative researchers can use productively and efficiently. These facilities have to provide the software and hardware capabilities that researchers need, and be able to evolve quickly as requirements change. They also need to be able to cope with increased demand and a large number of users.

The move will also require something of a shift in mindset, as the organisation running the facility will effectively be ‘looking after’ the researcher and their work, whereas previously the researcher would use their own institution’s platform.

Good communication and engagement are needed on both sides for these facilities to work efficiently, as Tanvi Desai and Felix Ritchie pointed out in their paper on Effective Researcher Management delivered to a statisticians’ conference back in 2009:

“If researchers are seen as an active part of the security model, as opposed to something for the data to be protected against, then both more efficient and more secure operating models can be devised.”

An element of good ‘customer service’ also ensures that researchers and TRE staff work well together. In my experience, the TRE model works well when:

· expectations of the service and researchers are clearly communicated at first interaction

· staff understand research and understand why researchers want to access data

· staff recognise that research evolves, both in methodology and in tools, and that good research always aims to push the boundaries of our understanding

· researchers understand that TREs often operate with finite resources

· researchers recognise that TRE staff may not always understand what they are trying to achieve, and that communication is key

Concluding remarks

If these challenges can be met, then distributed access is how confidential data will be used safely and effectively for research in the future.

The benefits are clear: good information governance, and assurance about how sensitive data are used and stored. Researchers can tap into these resources, rather than making their own investments in secure data access.

But TREs need to be set up and managed with researchers in mind. If not, then researchers will continue to demand direct access to the data. Working with researchers to build and manage TREs effectively is the key to success.

The TRE model has served the social sciences well; now it’s the turn of medical research.

Acknowledgments

I would like to thank Dr Hannah Knight, Senior Analytics Manager at The Health Foundation, and Christine Garrington, for their help and support in developing this article.

--

Head of Data Services Infrastructure, Economic and Social Research Council. Access to data for research, data confidentiality. Runner. @rwelpton