Privacy Preserving Record Linkage (PPRL): Coming Soon to the Virtual Pooled Registry


The current Virtual Pooled Registry (VPR) workflow requires registries to download an encrypted study data file and perform a clear-text linkage using Personally Identifiable Information (PII). To broaden the scope and enhance the security of the VPR, NCI has encouraged a transition to Privacy Preserving Record Linkage (PPRL).

What is PPRL?

PPRL is a technique that allows organizations to link data without releasing sensitive information. There are various PPRL techniques, including secure multi-party computation, Bloom filter encoding, and hashing. Hashing was the first technique developed and remains one of the most widely used today. It is the process employed by Match*Pro, the software used in linkages associated with the VPR.

During the hashing process, algorithms are applied to the data to irrevocably transform PII, resulting in the creation of one or more hash tokens. The hash tokens do not disclose any identifiers, and the original data cannot be derived from the hashed value. For added security, PPRL systems combine secret random data with the input values when they are hashed. This random data is referred to as the salt, or key. Because the salt is kept secret, the same input values produce different hash tokens under different salts, so a token cannot be reproduced or looked up by anyone who does not hold the salt.
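As a rough illustration of salted hashing, the sketch below builds a single hash token from a set of identifying fields. It is not the Match*Pro implementation: the use of HMAC-SHA-256 as the keyed hash, the normalization rules, and the names `normalize` and `make_token` are all assumptions made for this example.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Hypothetical cleanup step: drop accents, spaces, and punctuation,
    then uppercase, so formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def make_token(fields: list[str], salt: bytes) -> str:
    """Return one hash token for the given PII fields.

    The salt acts as the secret key of an HMAC (an assumption; the
    article does not name the algorithm). The token is a hex string
    that does not contain the original values.
    """
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    salt = b"example-shared-secret"  # made-up value for illustration only
    print(make_token(["Smith", "Jane", "1970-01-31"], salt))
```

Under the same salt, identical inputs always give the same token. Under a different salt, the same inputs give an unrelated token.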

When linking data files, both parties involved need to use the same hashing algorithm and the same salt when creating their hash tokens. Once one party is in possession of both sets of hash tokens, they can perform a record linkage to compare the hash tokens and identify the pairs of records that are believed to refer to the same patient.
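The comparison step can be sketched as an exact match on tokens. This is a simplified illustration that assumes one token per record. Production linkage software typically compares several tokens per record and applies its own matching rules, which are not described here. The function name `link_by_token` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def link_by_token(registry_records, study_records):
    """Pair records whose hash tokens are identical.

    Each input is an iterable of (record_id, token) tuples. Only tokens
    are compared, so no clear-text identifiers are needed at this step.
    Returns a list of (registry_id, study_id) pairs.
    """
    # Index the registry side by token so each study token is one lookup.
    index = defaultdict(list)
    for registry_id, token in registry_records:
        index[token].append(registry_id)

    pairs = []
    for study_id, token in study_records:
        for registry_id in index.get(token, []):
            pairs.append((registry_id, study_id))
    return pairs


if __name__ == "__main__":
    registry = [("R1", "aa11"), ("R2", "bb22"), ("R3", "cc33")]
    study = [("S7", "bb22"), ("S8", "zz99")]
    print(link_by_token(registry, study))
```

The match only works if both files were tokenized with the same algorithm and the same salt. Otherwise the tokens for the same patient would differ and no pairs would be found.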

Why use PPRL?

PPRL has many benefits, including the following:

  • Allows linkages to be performed without exchanging sensitive patient information (PII).
  • Allows linkages to be performed with a high degree of accuracy and with extremely few or no false positives.
  • Because false positives are exceptionally rare (on the order of 1 in tens or hundreds of millions), the linkage results do not need to be manually reviewed (nor is such a review possible, since no clear text was exchanged).
  • Expands the participation of central cancer registries in linkages with the NCCR.
  • Allows VPR linkages to be automated.

When and How Will PPRL Be Implemented for VPR Linkages?

Beginning in January 2026, all VPR linkages will be performed using PPRL. Registries will have the option of either performing the PPRL linkages in house or having Information Management Services, Inc. (IMS) perform automated PPRL linkages on their behalf. Registries that elect to have IMS perform their PPRL linkages will execute a data use agreement prior to submitting their hashed registry data file to IMS.

Once a study file has been uploaded, validated, and hashed, linkages performed by IMS will run automatically against available registry PPRL files. IMS will provide aggregate match counts to the study (via the VPR) and will provide the registry with a crosswalk of the unique registry patient ID to the study subject ID for all matched cases. Registries that perform the PPRL linkage in house will upload their resulting aggregate match count report to the VPR. The resulting number of matches, from both registry- and IMS-linked PPRL data, will be viewable in the VPR, allowing the researcher to select registries from which to request individual-level data on the matched cases. Once all necessary registry/IRB approvals are in place and all agreements have been signed, the registry will send the requested data for each patient identified in the linkage, just as it does now.
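The two outputs described above, an aggregate count for the study and a crosswalk for the registry, could be derived from a list of matched pairs roughly as follows. This is only a sketch. The function name `summarize_matches`, the CSV column names, and the choice to count distinct matched study subjects are assumptions, not the actual VPR or IMS report format.

```python
import csv
import io


def summarize_matches(pairs):
    """Split linkage results into the two outputs described in the text.

    `pairs` is a list of (registry_patient_id, study_subject_id) tuples.
    Returns (match_count, crosswalk_csv): the count is all the study
    would see, and the crosswalk text is for the registry.
    """
    # Assumption: the aggregate count is the number of distinct study
    # subjects that matched at least one registry patient.
    match_count = len({study_id for _, study_id in pairs})

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["registry_patient_id", "study_subject_id"])
    writer.writerows(sorted(pairs))
    return match_count, buffer.getvalue()


if __name__ == "__main__":
    count, crosswalk = summarize_matches([("R2", "S9"), ("R1", "S7")])
    print(count)
    print(crosswalk)
```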
