The 2025 NAACCR Annual Conference was held in Hartford, CT, from June 3-5, 2025. Our theme this year was “Honoring the Past, Embracing the Future,” as we celebrated the Connecticut Tumor Registry’s 90th anniversary.
Two hundred and eighty-eight people attended the conference, and of those, 53 were attending a NAACCR Conference for the first time.
The conference included sessions about advancements in cancer informatics, policies and partnerships on cancer surveillance data, research using cancer data, differences in cancer burden, and more. The conference also included a concurrent session specifically highlighting the National Childhood Cancer Registry (NCCR) initiative.
Here, we briefly share the highlights from the NCCR session. Slides from all the presentations are now available to NAACCR members.
Concurrent Session 1.C. National Childhood Cancer Registry
The NCCR panel featured four distinguished speakers who explored key aspects of the NCCR, from its integration within the Childhood Cancer Data Initiative (CCDI) to the preparation and analysis of de-identified data. Dr. Austin (Coy) Fitts presented an evaluation of data completeness and interoperability across pediatric populations using the CCDI Participant Index. Will Howe demonstrated the strong performance of Match*Pro’s Privacy-Preserving Record Linkage, showing high accuracy and zero false positives when linking NCCR and Virtual Pooled Registry (VPR) data. Rebecca Ottesen highlighted best practices for navigating and preparing data on the NCCR Data Platform. Finally, Dr. Fernanda Michels analyzed disparities in clinical trial participation among children, adolescents, and young adults using the newly launched platform. Anca Preda, a recognized expert on the NCCR from the National Cancer Institute, moderated the session. Below is an in-depth look at each of these engaging presentations.
Evaluation of Data Completeness of Multimodal Data for Longitudinal, Patient Outcomes Analyses Using the NCCR Data Platform and Interoperability Resources with the CCDI Data Ecosystem
Coy Fitts1, Johanna Goderre1, Subhashini Jagu1, Betsy Hsu1
1National Cancer Institute – NCI
Pediatric cancer research has long been challenged by fragmented data systems, with information spread across registries, research repositories, and clinical records. To address this fragmentation, the National Childhood Cancer Registry (NCCR) has taken a significant step forward through its integration with the Childhood Cancer Data Initiative (CCDI), enabled via the CCDI Hub and the CCDI Participant Index.
This integration is designed to support more comprehensive, patient-centered research by linking population-based cancer registry data with biospecimen, genomic, and clinical trial datasets curated across CCDI-supported studies. The cornerstone of this effort is the CCDI Participant Index, which acts as a unifying framework that connects data points for individual participants across otherwise siloed systems. The result is a longitudinal, multi-dimensional view of a child’s cancer experience, from diagnosis and treatment through survivorship or recurrence.
The process begins with structured data collected through standard registry mechanisms in the NCCR, capturing variables such as diagnosis codes, stage, and treatment details. The CCDI Participant Index matches study-specific patient identifiers between datasets so that data can be shared across data domains. This linkage is performed without releasing any personally identifiable information (PII) and with strict privacy controls and identity resolution protocols to ensure data integrity and participant confidentiality.
Once matched, data can flow bidirectionally. For example, registry data can enrich CCDI studies by adding context on real-world outcomes, while genomic or biospecimen data from CCDI repositories can be layered onto registry records to support advanced research questions. This level of interoperability enables researchers to query the system holistically, asking not just what happened to a patient, but why, how, and what came next.
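The idea of an index that connects study-specific identifiers can be illustrated with a minimal sketch. Everything here is hypothetical: the identifier formats, the source names, and the record fields are invented for illustration and do not describe how the CCDI Participant Index is actually implemented.

```python
# Hypothetical crosswalk: (data source, study-specific ID) -> shared participant ID.
# No PII appears anywhere; only opaque identifiers are exchanged.
participant_index = {
    ("registry", "R-0001"): "PARTICIPANT-A",
    ("genomics", "G-77"): "PARTICIPANT-A",
    ("registry", "R-0002"): "PARTICIPANT-B",
}

# Hypothetical records held by two separate systems, keyed by their own local IDs.
sources = {
    "registry": {
        "R-0001": {"diagnosis_year": 2015},
        "R-0002": {"diagnosis_year": 2018},
    },
    "genomics": {
        "G-77": {"assay": "whole-genome sequencing"},
    },
}


def build_participant_view(index, sources):
    """Group records from every source under the shared participant ID."""
    view = {}
    for (source, local_id), participant_id in index.items():
        record = sources.get(source, {}).get(local_id)
        if record is not None:
            view.setdefault(participant_id, {})[source] = record
    return view


view = build_participant_view(participant_index, sources)
for participant_id, domains in sorted(view.items()):
    print(participant_id, sorted(domains))
```

In this toy example, one participant has data in both domains and the other appears only in the registry, which is the kind of completeness gap an evaluation of linked data would surface.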
CCDI’s federated data ecosystem represents a fundamental shift in how pediatric cancer data is connected and used. Rather than building one central database, it aims to establish a network of systems that communicate through shared identifiers, standards, and data infrastructure. The integration is ongoing, with iterative improvements to the participant index, expanding linkages across studies, and continued input from the pediatric oncology community to refine use cases and support research priorities.
An Evaluation of Privacy-Preserving Record Linkage Results Using NCCR and VPR Data
William Howe1
1Information Management Services – IMS
As part of the annual NAACCR Call for Data, NCCR registries create and submit a standardized NCCR file, which contains diagnosis and treatment information for all cases belonging to patients diagnosed with at least one cancer between the ages of 0 and 39. The NCCR files also contain PII and a registry-specific patient identifier, which are leveraged in an intra-NCCR linkage process. At the end of the linkage process, each patient is assigned a unique NCCR ID that consolidates all the information reported by the registries, resulting in a comprehensive, unified view of the patient’s data.
Officially launched in 2022, the Virtual Pooled Registry Cancer Linkage System (VPR-CLS) is a secure, web-based service that streamlines the process of connecting researchers with multiple U.S. population-based cancer registries. The VPR-CLS enables researchers to submit a single cohort file for linkage with each of the VPR registries using standardized linkage software and uniform matching algorithms, and it simplifies how researchers apply for and monitor the release of detailed, individual-level data from the registries. The system is managed by NAACCR with support from the National Cancer Institute.
By offering a centralized point of entry and employing a standardized approach, the VPR-CLS reduces the level of effort registries and researchers need to dedicate to the linkage and approval processes. The platform was developed by IMS, which also serves as the system’s neutral third party, or honest broker, ensuring data integrity and privacy throughout the process.
As part of the annual NAACCR Call for Data, VPR registries create a standardized VPR-CLS registry linkage file. This file contains PII and diagnosis and treatment information for every reportable case owned by the registry.
On a biennial basis, patients in the NCCR cohort are linked to VPR registries through a series of linkages that are collectively referred to as the NCCR-VPR linkage. The NCCR-VPR linkage aims to identify previous and subsequent cancers diagnosed in other states among patients in the NCCR analytic dataset and to detect duplicate cases to supplement staging and treatment information.
Until recently, the standard protocol for performing this linkage required a registry to provide its VPR file, which contains PII, to IMS. IMS would then link the VPR file to the NCCR-Combined file, which also contains PII, to produce a set of potential matches that were manually reviewed to identify the likely matches.
Some registries are unable to send a file containing clear text PII to IMS for linkage with the NCCR-Combined file. Given the significant role that the NCCR-VPR linkage plays in enhancing the overall quality and completeness of information pertaining to the diagnosis, treatment, and outcome of each childhood cancer in the NCCR, NCI was eager to seek alternative approaches to performing the linkage that might allow them to circumvent that issue.
Privacy-Preserving Record Linkage (PPRL) techniques allow institutions and organizations to link data without releasing sensitive information through a broad range of processes, including secure multi-party computation, Bloom filter encoding, and hashing, among others.
During the hashing process, a series of cryptographic functions is applied to the input data to generate a set of hash tokens. After the input data have been hashed/tokenized, the linkage process compares the hash tokens in two or more files to identify sets of records that are believed to belong to the same entity. The hash tokens do not disclose any identifiers because a hash function is, by definition, a one-way function, meaning the original data cannot be derived from the hashed value.
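The hashing step can be sketched generically as follows. This is not Match*Pro’s token set or algorithm, which the presentation did not detail; the field combinations, the shared secret, and the normalization rules are all assumptions chosen for illustration, using a keyed SHA-256 hash (HMAC) from the Python standard library.

```python
import hashlib
import hmac

# Hypothetical secret shared by the linking parties (assumption, not a real key).
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Uppercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def make_token(*fields: str) -> str:
    """Return a keyed SHA-256 hash token for the normalized, concatenated fields."""
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def tokenize(record: dict) -> dict:
    """Build an illustrative token set; these field combinations are assumptions."""
    return {
        "name_dob": make_token(record["last"], record["first"], record["dob"]),
        "last_dob_sex": make_token(record["last"], record["dob"], record["sex"]),
    }


def shared_tokens(x: dict, y: dict) -> set:
    """Names of the tokens on which two tokenized records agree."""
    return {name for name in x if x[name] == y[name]}


# Invented example records: b differs from a only in formatting, c has a spelling variant.
record_a = {"first": "Maria", "last": "O'Neil", "dob": "2010-04-17", "sex": "F"}
record_b = {"first": "MARIA", "last": "ONeil", "dob": "20100417", "sex": "f"}
record_c = {"first": "Mariah", "last": "O'Neil", "dob": "2010-04-17", "sex": "F"}

tokens_a, tokens_b, tokens_c = (tokenize(r) for r in (record_a, record_b, record_c))
print(sorted(shared_tokens(tokens_a, tokens_b)))
print(sorted(shared_tokens(tokens_a, tokens_c)))
```

Formatting differences are absorbed by normalization, so the first pair agrees on both tokens. A one-letter spelling difference changes the hash completely, so the second pair agrees only on the token that excludes the first name. Using several overlapping tokens is one common way to keep some tolerance for such variation.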
To conduct the evaluation, IMS used Match*Pro to link the raw data from the NCCR combined file with the raw data from 20 registry VPR files. The raw data linkages were manually reviewed and the patients that matched were identified. The set of patient matches obtained from the raw data linkages constituted our “gold standard” for the evaluation.
Next, IMS tokenized/hashed the NCCR combined file and the VPR files using Match*Pro’s default token set and linked the tokenized datasets with Match*Pro. Finally, IMS compared the set of PPRL patient matches against the gold standard set of patient matches to see how well the privacy-preserving record linkage stacked up against the raw data linkage.
Match*Pro’s PPRL module performed well. It identified a total of 22,520 matches, 100% of which were true positives. We observed 3,573 false negatives, which aligns with PPRL’s known challenges in dealing with typos and misspellings.
Sensitivity, which measures how well Match*Pro’s PPRL module identifies true matches, was 0.86. Specificity, which measures how well the module classifies non-matches, was 1.0. In terms of workflow efficiency, the clear-text linkage process took over 9 hours, while the PPRL process took nearly 6 hours.
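The reported metrics can be reproduced from the counts above with a short calculation. The sketch assumes the 3,573 false negatives are gold-standard matches that the PPRL linkage missed. The number of true negatives was not reported, so the value used below is a placeholder.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of gold-standard matches that the linkage found."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of true non-matches that the linkage correctly left unlinked."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of reported matches that are true matches."""
    return tp / (tp + fp)


true_positives = 22_520   # PPRL matches confirmed by the gold standard
false_positives = 0       # 100% of PPRL matches were true positives
false_negatives = 3_573   # gold-standard matches missed by PPRL

gold_standard_matches = true_positives + false_negatives
sens = sensitivity(true_positives, false_negatives)
prec = precision(true_positives, false_positives)

# Placeholder true-negative count (assumption); with zero false positives,
# specificity is 1.0 for any positive number of true negatives.
spec = specificity(tn=1_000_000, fp=false_positives)

print(f"gold-standard matches: {gold_standard_matches}")
print(f"sensitivity: {sens:.2f}, specificity: {spec:.1f}, precision: {prec:.1f}")
```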
Match*Pro’s PPRL module is an effective tool for linking data without revealing sensitive information.
Best Practices for Managing Data Sets from the National Childhood Cancer Registry (NCCR) Data Platform
Rebecca Ottesen1, Austin Fitts2, Olalekan Adeyemi2, Betsy Hsu2, Johanna Goderre2
1 The Emmes Company LLC, 2 National Cancer Institute – NCI
The NCCR Data Platform consolidates public health surveillance data from 22 NCCR participating cancer registries into a central location linked to real-world data sources. This robust resource offers approved researchers access to pooled longitudinal data, including clinical, treatment, claims, demographic, and outcomes data. Researchers can interactively explore data, define customized cohorts, and submit formal requests to download analytic data sets through the NCCR Data Platform’s request process. To work effectively with this complex resource, consider the following five best practices:
- Cohort Development
When building cohorts using the interactive cohort discovery tool, it is recommended to:
- Utilize the index cancer (tumor/diagnosis) and ICCC (International Classification of Childhood Cancer) fields to refine the cohort to the relevant diagnoses and disease classifications of interest.
- Keep in mind that the definitions used in creating the customized cohort will be classified by the cohort definition record variable.
- Understanding Raw Data Sets
The downloaded data are structured into seven relational tables, each linked by unique identifiers and important fields such as:
- Data request patient ID (all tables)
- Tumor record number (select tables)
- Cohort definition record (present in all data sets)
- The main clinical data set (CTC) includes all tumor/diagnosis records associated with a patient.
- Data Preparation
Key considerations during data cleaning and preparation include:
- Consulting the behavior field for accurate tumor classification as benign or malignant.
- Noting that the index tumor/diagnosis may not always be malignant, which impacts the usage and potential conversion of date-related fields.
- Evaluating missing codes by consulting the data dictionaries. Decide how best to handle them in the context of the analysis.
- Aligning with Research Questions
Start with a clear research question:
- Develop a flow diagram outlining inclusion and exclusion criteria.
- Consider record selection criteria based on primary vs. secondary diagnoses, cohort definition records, and tumor/diagnosis behavior.
- Be mindful when joining clinical data with other tables that may reduce the sample size due to missing or unavailable data.
- Recognize the limitations of certain variables, such as treatment records, which may suffer from completeness and bias issues that can affect interpretability.
- Drawing Meaningful Conclusions
Interpretation of results must take data limitations into account:
- Assess the completeness and potential biases associated with the data that was used for analysis.
- Consider the timing of events to allow enough time to pass for receipt of care, survival outcomes, and the development of comorbid conditions.
- Where applicable, consider the use of surrogate variables or links to external data sources to enhance interpretation and strengthen findings.
The aggregated data from the NCCR Data Platform is a powerful resource for the research community that can be deployed to investigate diagnostic and treatment patterns, survival trends, healthcare utilization, and epidemiologic changes over time. A solid understanding of the platform’s structure, strengths, and limitations is essential to conduct sound, meaningful analyses.
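The data preparation and record selection advice above can be sketched as a small attrition count, the kind of numbers that feed a flow diagram. The table contents and column names below are hypothetical and will differ from the actual NCCR Data Platform tables; consult the data dictionaries for real field names. The only external fact used is that ICD-O-3 behavior code 3 denotes a malignant tumor.

```python
MALIGNANT = 3  # ICD-O-3 behavior code for malignant tumors

# Hypothetical clinical (tumor-level) table: one row per tumor/diagnosis record.
clinical = [
    {"patient_id": "P1", "tumor_record_number": 1, "behavior": 3, "index_tumor": True},
    {"patient_id": "P1", "tumor_record_number": 2, "behavior": 3, "index_tumor": False},
    {"patient_id": "P2", "tumor_record_number": 1, "behavior": 0, "index_tumor": True},
    {"patient_id": "P3", "tumor_record_number": 1, "behavior": 3, "index_tumor": True},
    {"patient_id": "P4", "tumor_record_number": 1, "behavior": 3, "index_tumor": True},
]

# Hypothetical second table that is available for only some patients.
claims = [
    {"patient_id": "P1", "claim_count": 12},
    {"patient_id": "P4", "claim_count": 3},
]

steps = []  # (description, number of patients remaining)

all_patients = {row["patient_id"] for row in clinical}
steps.append(("Patients in clinical table", len(all_patients)))

# Keep one row per patient: the index tumor, and only if it is malignant.
malignant_index = {
    row["patient_id"]: row
    for row in clinical
    if row["index_tumor"] and row["behavior"] == MALIGNANT
}
steps.append(("Index tumor is malignant", len(malignant_index)))

# Inner join on patient ID: patients without a claims row drop out.
claims_by_patient = {row["patient_id"]: row for row in claims}
analytic = [
    {**row, **claims_by_patient[pid]}
    for pid, row in malignant_index.items()
    if pid in claims_by_patient
]
steps.append(("Also present in claims table", len(analytic)))

for description, count in steps:
    print(f"{description}: {count}")
```

Recording the count after every filter and join makes the loss of sample size explicit, so the exclusions can be reported alongside the results.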
Socio-Demographic Characteristics of Children and AYA Cancer Patients Enrolled and Not Enrolled in COG Clinical Trials Using the NCCR Data Platform
Fernanda Silva Michels1, Gonçalo Forjaz2, Stephanie Hill1, Karen Knight1
1NAACCR 2Westat
Cancer death rates have decreased over the years for children (0-14 years) and adolescents and young adults (AYA, 15-39 years). A significant portion of this improvement in survival rates can be attributed to treatment advances from cooperative clinical trials in the US. However, according to a Children’s Oncology Group (COG) study, only 20% of US cancer patients aged 0 to 19 years enrolled in COG trials between 2004 and 2015. Many studies have reported under-representation of racial/ethnic minorities as well as AYA in clinical trials.
According to NCCR data from 2021, over 54,000 new cancer cases were diagnosed among children and AYA. The vast majority, more than 48,000 cases, occurred among those aged 15-39 years. Cancers in children and AYA account for about 5% of all cancer cases across all age groups. The age-standardized incidence rate for all cancer types in the 0–39 age group was about 532 cases per million people in that age group.
Last November, the National Cancer Institute (NCI) Childhood Cancer Data Initiative (CCDI) created the first national resource, the National Childhood Cancer Registry (NCCR) Data Platform, that links childhood and AYA records across population-based cancer registries and real-world data partners (including COG).
According to COG, more than 80% of children and adolescents diagnosed with cancer each year in the United States are cared for at COG member institutions. COG has more than 100 active clinical trials. There are about 12,000 patients registered on COG trials each year. COG collects and manages information about pediatric clinical trials through its network of affiliated institutions and centralized data systems.
Using the NCCR Data Platform, this study aims to describe the socio-demographic characteristics of children and AYA enrolled in COG clinical trials and compare them with those of patients who did not enroll. For the statistical analysis, we computed basic descriptive statistics, applied the Z-test for two proportions and the Chi-square test, and calculated confidence intervals.
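For readers less familiar with these methods, a two-proportion Z-test and a confidence interval for a single proportion can be sketched as below. The counts are invented for illustration and are not study data. The interval shown is the simple normal-approximation (Wald) interval, which is an assumption, since the presentation did not state which interval method was used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def wald_interval(x: int, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a single proportion."""
    p = x / n
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical counts: enrolled patients out of all patients in two groups.
enrolled_a, total_a = 300, 1000
enrolled_b, total_b = 240, 1000

z, p_value = two_proportion_z_test(enrolled_a, total_a, enrolled_b, total_b)
low, high = wald_interval(enrolled_a, total_a)

print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
print(f"group A enrollment: {enrolled_a / total_a:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

For a 2x2 table, the Chi-square statistic without continuity correction equals the square of this Z statistic, so the two tests give the same p-value in that case.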
While we didn’t have access to trial cancer types or eligibility criteria, we were still able to get a meaningful overview of several important aspects. For example, we checked the proportion of clinical trial enrollment by year and found that 2009 had the highest enrollment rate at 7.33%. A 2020 study analyzing COG trials from 2004 to 2015 observed a decline in enrollment rates over time. The authors suggested that this trend may be caused by many challenges, including difficulties in pediatric drug development, the complexity of designing feasible trials, and limited trial availability for solid tumors and CNS tumors, as opposed to hematologic cancers.
Our findings align with previous research showing that younger patients have higher rates of enrollment in COG trials, especially the 1-4 year age group, in which 53.35% were enrolled (95% CI: 52.8%-53.9%). One COG study reported that children under 15 were enrolled in clinical trials at a rate 3 times higher than those aged 15–19. In our data, 78% of patients under 15 were enrolled in a clinical trial, compared to just 19% among the 15–19 age group.
The ICCC groups with the highest enrollment rates were Neuroblastomas (61.6%), Leukemias (32.2%), Retinoblastoma (31.5%), and Malignant bone tumors (28.7%). However, we didn’t have information about the criteria of each trial, or even which trials were available during the study period (2007-2018). It is possible that no clinical trials were available for GCT or for very specific lymphomas.
Regarding race, White patients had a higher percentage of clinical trial enrollment than Black patients for two cancer groups:
- Leukemias: White 31.5% (95% CI: 30.9%-32.0%) versus Black 23.0% (95% CI: 22.0%-24.1%)
- Malignant bone tumors: White 29.2% (95% CI: 28.0%-30.5%) versus Black 25.4% (95% CI: 22.8%-27.9%)
Our results confirm that the data platform is a valuable resource that enables researchers to address a wide range of scientific questions, including many of those highlighted in this presentation.
Its accessibility and flexibility make it a unique tool for the children and AYA cancer community.
More research on clinical trial enrollment patterns is needed to better understand barriers and develop strategies to improve participation rates among pediatric and AYA populations.
About the NCCR
NAACCR serves as the coordinating center for the National Childhood Cancer Registry (NCCR), which is part of the National Cancer Institute’s (NCI) Childhood Cancer Data Initiative (CCDI) Data Ecosystem. To date, the NCCR includes data on nearly 1.8 million cancers diagnosed in patients under the age of 40 from 1995-2021 in 28 state cancer registries. Data from the NCCR is currently available in NCCR*Explorer, SEER*Stat, and the NCCR Data Platform. For more information on the NCCR and its data products.
Mark your Calendar
September 29 and 30, 2025: Data Jamboree: Enhancing Childhood Cancer Data Sharing and Utility.
September 30 and October 1, 2025: NCI Office of Data Sharing’s Annual Data Sharing Symposium 2025: How Data Advances the Impact of Cancer Research.