Life Sciences Regulations and Standards Data De-Identification & Anonymization

What is clinical trial de-identification?



Cathal Gallagher explains the growing importance of protecting the value of clinical trial findings - in addition to patient privacy - as clinical study reports are made public.


As part of a general trend to improve transparency and data sharing in life sciences, the European Medicines Agency (EMA) now requires that clinical study reports (CSRs) be published. This has placed new pressure on companies to protect patient privacy by ensuring that patients cannot be identified from reports entering the public domain. Under EMA Policy 0070, CSRs must be anonymised to prevent patients (and indeed professionals) who participated in clinical trials from being identified.


What is clinical trial de-identification?

Clinical trial de-identification is the process of removing any reasonable risk that individual patients could be identified from references made in externally published reports of clinical trial findings.


Why is it imperative to have your data de-identified?

If companies do not take sufficient measures to protect patient privacy as their clinical trial findings are published externally, they risk serious consequences. As well as undermining patient trust and damaging the brand, firms risk being fined for privacy breaches.

The requirement has been in place for over two years now, but to date most companies have taken the ‘easy’ route: redaction – blocking out any content deemed to be potentially sensitive. Yet this heavily manual process is fraught with problems. It is extremely time-consuming, and carries risk. If a single reference is left in by mistake, a trial participant’s privacy could be compromised. The other major problem is that heavily redacted reports leave little value for the reader. This high risk/low value ratio is at odds with the aims of clinical trial transparency. For all that effort, nobody wins.


What is risk mitigation – and how does it relate to de-identification?

Risk mitigation is about measuring, understanding and taking appropriate steps to guard against a perceived risk. In the case of clinical trial de-identification, this involves calculating the likely risk of a patient being identified from the information and references in any published report. The more specific and unusual the trial, the greater the risk of individual participants being identified (e.g. if they suffer from a rare condition and are linked to a trial that took place in a particular geographic area).


Mitigating risk in data sharing initiatives

As the whole point of data sharing initiatives is to grow collective knowledge, accelerate clinical progress and save repetition of research, it is important that the value and integrity of published findings are retained as far as possible. This aim needs to be balanced against the risk of patient re-identification.

To address this, EMA now recommends that companies use more sophisticated strategies for anonymising CSRs – i.e. not striking out anything sensitive via blanket redaction, but systematically de-identifying references to trial subjects so that the integrity and value of reports remain intact.

Intelligent software can help with this, assigning the rules and automating much of the de-identification process, which takes much of the heavy lifting away from busy teams. Using techniques such as date offsetting, companies can confidently disguise giveaway information while retaining the value of the reported findings. An added benefit is that an occasional missed reference is very unlikely to point the reader to the original data, i.e. any errors are effectively hidden in plain sight among the de-identified values. So there is much less risk with this approach to patient safeguarding.
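To make date offsetting concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: the function names, the secret key and the one-year maximum shift are all hypothetical. Each subject's dates are moved back by the same number of days, so the real calendar dates are disguised while the intervals between visits, which carry the clinical meaning, are preserved.

```python
import hashlib
import hmac
from datetime import date, timedelta


def subject_offset(subject_id: str, secret: bytes, max_days: int = 365) -> timedelta:
    """Derive a stable shift of 1..max_days days for one subject.

    Assumption: a keyed hash (HMAC-SHA256) of the subject ID is used so the
    same subject always gets the same shift, and the shift cannot be
    recomputed without the secret key.
    """
    digest = hmac.new(secret, subject_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    return timedelta(days=days)


def offset_dates(subject_id: str, dates: list[date], secret: bytes) -> list[date]:
    """Shift all of a subject's dates back by that subject's offset."""
    shift = subject_offset(subject_id, secret)
    return [d - shift for d in dates]


# Hypothetical visit dates for one hypothetical subject.
visits = [date(2016, 3, 1), date(2016, 3, 15), date(2016, 4, 12)]
shifted = offset_dates("SUBJ-001", visits, secret=b"demo-key")

# The gaps between visits (14 and 28 days) are unchanged by the shift.
gaps = [later - earlier for earlier, later in zip(shifted, shifted[1:])]
```

Because every date for a subject moves by the same amount, a reader can still see that the second visit came two weeks after the first, but cannot match the dates against real-world records.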


Calculating and managing risk

The ability to calculate risk can be powerful in preserving the utility of publicly reported clinical findings. EMA has defined the acceptable risk level for patient re-identification to be 0.09 – meaning that each subject’s defining characteristics (country of residence, race, etc.) must be shared with at least 11 other patients taking part in the trial, since a group of 12 gives a risk of 1/12, or about 0.083. If needed, companies can refer to larger groups or equivalence classes – e.g. using ‘European’ in place of ‘Irish’, or including subjects from other local trials in the same therapeutic area.
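The arithmetic behind the threshold can be sketched in a few lines of Python. This is a simplified illustration, assuming that risk is measured as 1 divided by the size of the smallest group of subjects sharing the same defining characteristics; the function names, field name and sample records are hypothetical, and a real assessment would consider more characteristics and more context.

```python
from collections import Counter

RISK_THRESHOLD = 0.09  # acceptable re-identification risk cited in the article


def max_reidentification_risk(records: list[dict], characteristics: list[str]) -> float:
    """Return 1 / (size of the smallest group sharing the same characteristics)."""
    if not records:
        raise ValueError("at least one record is required")
    groups = Counter(tuple(r[c] for c in characteristics) for r in records)
    return 1 / min(groups.values())


def generalise(records: list[dict], field: str, mapping: dict) -> list[dict]:
    """Replace specific values with broader ones, e.g. 'Irish' -> 'European'."""
    return [{**r, field: mapping.get(r[field], r[field])} for r in records]


# Hypothetical trial population: 5 Irish and 7 French subjects.
subjects = [{"nationality": "Irish"}] * 5 + [{"nationality": "French"}] * 7

# Smallest group has 5 subjects, so the risk is 1/5 = 0.2, above the threshold.
risk_before = max_reidentification_risk(subjects, ["nationality"])

# After generalising, all 12 subjects share one group: risk is 1/12, about 0.083.
broader = generalise(subjects, "nationality", {"Irish": "European", "French": "European"})
risk_after = max_reidentification_risk(broader, ["nationality"])
```

In this toy case, widening the category is enough to bring the risk under 0.09, at the cost of some detail. That trade-off between risk and utility is the balance described above.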

One way to arrive at a quantitative risk measure is to approach de-identification at the level of the underlying data (i.e. to go deeper than individual documents). As firms get to grips with the risk-utility ratio, they can adjust their measures to suit the context – a process that is relatively simple with the help of the right technology.


Systematic solutions to clinical data de-identification

None of this needs to be disruptive or expensive, either. De-identification systems and services can be provided via the cloud, to spread the costs and manage peak demand economically, while easing the burden on internal IT departments and medical writer/transparency teams.

There are additional good reasons to develop more sophisticated, systematic processes for clinical trial de-identification. For one thing, other geographical regions beyond Europe are embracing similar strategies to EMA’s, including North America and parts of Asia, so systematic de-identification is likely to become standard practice, warranting the investment.

The life sciences industry also has a chance to lead with its approach to data transparency. While other industries are going all out to collect data, very few are sharing it for the greater good, so there is a great opportunity for companies to show the way.