Life Sciences Transparency Services Data De-Identification & Anonymization

EMA Policy 0070: Data Utility - Striking the Balance

The democratization of clinical trial data: An unwanted gift?

For all the fuss about greater openness in life sciences, there have been fewer than 260 requests to see clinical study details since the EMA first enforced data sharing. "So is it worth all the aggro?" asks d-Wise’s Cathal Gallagher.

The EMA has been dabbling with data transparency for a while now, keen to improve the perception of the life sciences industry by encouraging manufacturers to show they have nothing to hide. A spirit of openness is of course also important to accelerate clinical breakthroughs: if researchers across the industry can interrogate and build on previous findings, they’ll be quicker to find effective new treatments.

So it’s surprising that the Agency’s early attempts at promoting clinical trial information sharing haven’t been embraced with the enthusiasm expected.

Even though Policy 0043 was made law, forcing firms to share their findings with other researchers on request, as few as 254 content requests have been made to the largest clinical study data request site since processes were established three years ago.

So what’s going on? Was the EMA wrong - is this something nobody wants after all?

Well, that’s probably over-simplifying things. For the wider industry to benefit from a company’s research findings, these have to be meaningful. But that has implications for the more sensitive aspects of a clinical trial – the identities of the people who agreed to take part, the competitive advantage the manufacturer derived from the study, and other insights into that company’s strategy and processes.

Maximum utility, minimum risk

Until now, the EMA has allowed life sciences brands to play safe – to submit certain extracts or tables from clinical trial documents if formally requested, but in a format that reliably removes any reference to anything remotely sensitive. The downside is that this has left very little of value. Which probably has a lot to do with why more researchers haven’t applied to gain access - that, plus the fact they have typically had to wait months to get the content.

The EMA had the right idea, but in practice it wasn’t delivering what anyone needed.

Policy 0070, now up and running, is an attempt to address those issues and strike a better balance between availability, utility and data safeguarding. The idea is to end up with something that offers maximum value, without excessive risk of over-disclosure. But this has put the cat among the pigeons – especially if there is a chance that it may all be for nothing.

Rejecting redaction is one measure designed to improve data utility. Although it remains important that patient A can’t be identified, data integrity demands that patient A’s outcomes are traceable to his or her characteristics, condition and treatment across the study. So consistent formulae need to be applied which allow an individual candidate, though not an individual person, to be followed across the study lifecycle.
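To make the idea of a "consistent formula" concrete, here is a minimal sketch of one way it could be done: a keyed hash (HMAC-SHA256) that always maps the same subject ID to the same pseudonym. This is an illustration only, not a method prescribed by Policy 0070; the function name, the key and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(subject_id: str, study_key: bytes, length: int = 12) -> str:
    """Map a subject ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(study_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "SUBJ-" + digest[:length].upper()


# Hypothetical secret held by the sponsor; without it the mapping cannot be recomputed.
key = b"hypothetical-study-secret"

# Made-up records: the same subject appears at two visits.
records = [
    {"subject": "IE-001-0042", "visit": 1, "outcome": "responder"},
    {"subject": "IE-001-0042", "visit": 2, "outcome": "responder"},
    {"subject": "IE-001-0077", "visit": 1, "outcome": "non-responder"},
]

# The released copy keeps every field except the real subject ID.
released = [{**r, "subject": pseudonymize(r["subject"], key)} for r in records]

for row in released:
    print(row)
```

Because the mapping is deterministic for a given key, both visits of the first subject carry the same pseudonym, so outcomes remain traceable across the study without the original ID appearing anywhere in the released copy.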

Doing the maths

One of the problems is disagreement about the chance of inadvertent disclosure, particularly in the case of rare conditions, where human samples are small and variants more easily identifiable. The EMA has attempted to fix its requirements accordingly, setting the bar at a 9 percent chance of identifying an individual. As samples get smaller, companies can adjust criteria so that someone becomes European rather than Irish, or is lumped into a broader age category, elevating the parameters until there are at least 10 comparable others. Date offsetting can be important too, so that patients can’t be identified from when they were known to be in hospital, for example.
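The arithmetic behind this can be sketched in a few lines. The example below assumes risk is measured as one divided by the size of the smallest group of patients who share the same identifying characteristics, which is one common reading and not the only possible one. The region lookup, the ten-year age bands, the 37-day offset and the patient data are all invented for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical lookup used to broaden nationality into a region.
REGION = {"Irish": "European", "French": "European", "German": "European"}


def max_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())


def age_band(age, width):
    """Replace an exact age with a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record, width=10):
    """Broaden nationality and age; other fields are carried over unchanged."""
    return {
        **record,
        "nationality": REGION.get(record["nationality"], record["nationality"]),
        "age": age_band(record["age"], width),
    }


def shift_dates(dates, offset_days):
    """Move every date for one patient by the same offset, keeping intervals intact."""
    return [d + timedelta(days=offset_days) for d in dates]


# Eleven made-up patients, each unique on (nationality, age).
nationalities = ["Irish", "French", "German"]
patients = [
    {"nationality": nationalities[i % 3], "age": 40 + i % 10} for i in range(11)
]
keys = ["nationality", "age"]

raw_risk = max_risk(patients, keys)
released = [generalize(p) for p in patients]
released_risk = max_risk(released, keys)

# Two made-up hospital dates for one patient, shifted by the same offset.
admissions = [date(2016, 3, 1), date(2016, 3, 15)]
shifted = shift_dates(admissions, offset_days=-37)

print(f"raw risk: {raw_risk:.3f}, released risk: {released_risk:.3f}")
```

In the raw data every patient is unique, so the worst-case risk is 1. Once all eleven become "European, 40-49", each has ten comparable others and the risk falls to 1/11, or roughly 9 percent. The shifted dates no longer match the real hospital stay, but the 14-day gap between them is preserved.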

The ideal balance the industry is aiming for is something we call the Goldilocks Zone, between 0.05 and 0.09, where data has maximum utility while maintaining an acceptably low risk of a patient being re-identified from their coordinates.
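As a rough sketch, the zone can be expressed as a simple check on a risk score. The 0.05 and 0.09 bounds come from the range above; treating risk as one divided by group size, and labelling anything below 0.05 as "over-generalized", are assumptions made for this illustration, and the function names are hypothetical.

```python
# Bounds of the zone as quoted in the text.
LOWER, UPPER = 0.05, 0.09


def risk_for_group_size(group_size: int) -> float:
    """Assumed risk model: one chance in `group_size` of picking out the individual."""
    return 1 / group_size


def classify_risk(risk: float) -> str:
    """Label a risk score relative to the 0.05 to 0.09 zone."""
    if risk > UPPER:
        return "too risky"
    if risk < LOWER:
        return "over-generalized"
    return "goldilocks"


# Smallest group sizes a company might see after adjusting its criteria.
for size in (5, 12, 50):
    risk = risk_for_group_size(size)
    print(f"group of {size}: risk {risk:.3f} -> {classify_risk(risk)}")
```

Under these assumptions a smallest group of five is too identifiable, a group of twelve sits inside the zone, and a group of fifty suggests the data has been broadened further than the threshold requires.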

The trouble is that each company currently has its own set of rules for making these kinds of adjustments, further adding to the complexity.

If the industry is able to reach consensus and companies can get this right, we may be getting somewhere and EMA’s vision might be achievable. The danger comes when the balance shifts too far in one direction (towards utility at the expense of candidate comfort, or towards subject reassurance at the expense of data’s usefulness), or when each company plays by a different set of rules.

We would advocate harmonisation of data manipulation methods, then a systematic way of anonymizing clinical trial data centrally - at a source level. By that we mean creating two sets of data (one original, and one EMA Policy 0070 compliant equivalent) that can be used to create safe, useable documents ad infinitum.

If the master data sets are robust, there is no reason why the public-facing versions of CSRs created from them wouldn't be watertight.