Spring 2022 NAACCR Narrative

David O’Brien, PhD, GISP
Data Analyst, Alaska Cancer Registry


How many times have you edited an incoming file of abstracts from a facility and happened to notice that the registrar accidentally miscoded the patient sex field? Or perhaps you were working on a GenEdits report in preparation for the Call for Data and discovered that the consolidated case had the wrong sex because one of the associated source abstracts had the sex miscoded. Wouldn’t it be great if there were an automated way to check your data so you could catch this type of miscoding error? Well, you’re in luck because there is!

If you take the time to look at the current NAACCR Edits Metafile, you will see quite a large collection of edit sets. Most registries are used to working with just a few of them on a routine basis, and for NPCR registries these are usually:

  • “Central: NPCR Required – Consol-All Edits” for consolidated cases, and
  • “Central: State Example – Incoming Abstracts” for incoming facility abstracts.

But these are just two of 28 different edit sets (!) that are available for your use. Buried in this long list of edit sets is this one:

  • “Sex, Name–First, Date of Birth”

This is the NAACCR Sex Edit Set. If you open it in EditWriter, you will see it consists of a single edit named “Sex, Name–First, Date of Birth (NAACCR)”. You can use this “NAACCR Sex Edit” to help you with identifying miscoded patient sex.


The New York State Cancer Registry developed a new edit routine that checked for sex miscoding and presented it at the 2010 NAACCR annual meeting [1]. The Florida Cancer Registry tested the edit on a subset of their data and presented their findings at the 2011 NAACCR annual meeting [2]. I attended that session and thought this was a great idea. After consulting with one of the original developers, I created an MS Access tool for the edit. After several years of successful use, and wanting to get the information presented at the two NAACCR meetings out to a wider audience, several NAACCR member registries led by Florida collaborated in 2014 to further test and promote the edit. The result was threefold: the publication of an article [3] in the Journal of Registry Management (JRM) on the assessment of this sex edit by Florida, Alabama, and Alaska; the posting of the MS Access tool to the NAACCR website; and the incorporation of the edit into the NAACCR Edits Metafile (starting with v15A).


The purpose of this edit is to identify likely errors in sex based on first name. The edit compares the patient’s first name against a list of known name/sex pairs and the birth decade for which they are most common. If a match on name and decade is found but the sex code differs, an error is generated. If upon review the coded sex and first name are found to be accurate and in conformance with coding rules, the fields may be left as coded and the Over-ride Name/Sex flag coded to 1.
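The logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the edit as implemented in the NAACCR Edits Metafile: the lookup table, its entries, and the function name `check_sex` are hypothetical, and the sex codes follow the usual convention of 1 for male and 2 for female.

```python
from typing import Optional

# Hypothetical reference table: (first name, birth decade) -> expected sex code.
# The entries are illustrative only and are not taken from the real edit.
NAME_SEX_BY_DECADE = {
    ("GEORGE", 1940): "1",
    ("MARY", 1940): "2",
    ("LESLIE", 1920): "1",
    ("LESLIE", 1980): "2",
}


def check_sex(first_name: str, birth_year: int, sex: str,
              override: str = "") -> Optional[str]:
    """Return a review message when name, decade, and sex disagree, else None."""
    # A reviewer has already confirmed this record, so the edit is skipped.
    if override == "1":
        return None
    decade = birth_year // 10 * 10
    expected = NAME_SEX_BY_DECADE.get((first_name.strip().upper(), decade))
    # Names that are not in the table cannot be evaluated.
    if expected is None:
        return None
    if expected != sex:
        return (f"Review: {first_name} (born {decade}s) is coded sex {sex}, "
                f"but the name list suggests {expected}")
    return None


if __name__ == "__main__":
    print(check_sex("George", 1945, "2"))                # flagged for review
    print(check_sex("George", 1945, "2", override="1"))  # None: over-ride set
    print(check_sex("Zed", 1945, "2"))                   # None: name not listed
```

A flagged record is a prompt for manual review; once the reviewer confirms the coding, setting the over-ride keeps the record from being flagged again.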


First and foremost, I need to stress that the “errors” this edit produces are not necessarily errors; they are a list of cases that REQUIRE manual review. The review might be as simple as checking the “Text-Dx Proc-PE” field of the associated abstract to see if the sex of the patient was mentioned. Many registries also have access to external databases, such as DMV records, that include sex as a data item and can be used to verify it.

If you have never done a verification like this before, it can take a long time, especially if your registry is large. The Alaska Cancer Registry is relatively small in comparison to most of the other states. Back in 2011, I ran this edit for the very first time against our entire database of 46,645 consolidated cases. The edit routine flagged 88 cases (0.2%) as potential errors. After review, I found that 17 of these cases had miscoded sex, and I corrected them. The JRM article describes the experience of two other registries besides Alaska that used the edit. For both Florida and Alabama, the edit flagged 0.5% of the analyzed cases for review. When New York originally developed the edit, it flagged 0.3% of their analyzed cases for review [1]. Depending on the size of your database, 0.2-0.5% can mean about a hundred cases to review or several thousand.
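To get a feel for the review workload before running the edit, the percentages above can be turned into rough case counts. The short Python sketch below uses the Alaska figures quoted in this article; the database sizes in the loop are hypothetical examples, and `expected_review_count` is simply an illustrative helper name.

```python
# Alaska figures quoted in the text.
flagged, total, corrected = 88, 46_645, 17

flag_rate = flagged / total            # share of all cases flagged for review
confirmed_rate = corrected / flagged   # share of flagged cases actually miscoded

print(f"Flag rate: {flag_rate:.2%}")
print(f"Flagged cases that were true miscodes: {confirmed_rate:.1%}")


def expected_review_count(n_cases: int, rate: float) -> int:
    """Rough number of cases a registry would need to review manually."""
    return round(n_cases * rate)


# Hypothetical database sizes, evaluated at the low and high flag rates reported.
for n_cases in (50_000, 500_000, 2_000_000):
    low = expected_review_count(n_cases, 0.002)
    high = expected_review_count(n_cases, 0.005)
    print(f"{n_cases:>9,} cases: about {low:,} to {high:,} to review")
```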

Large registries may want to submit only a subset of their data to the edit for analysis. For the JRM article, Florida analyzed data for 3 “sex-skewed” primary sites (that is, sites for which one sex has much higher rates than the other): breast, thyroid, and liver. As a comparison, they also analyzed one site, colorectal, that has more similar rates between the sexes. For breast cancer, it might be tempting to analyze only the male cases, which are a relatively small cohort compared to the female cases. However, doing so artificially depresses male breast cancer rates: it removes the female patients who were miscoded as male but does not add back the male patients who were miscoded as female. For this reason, it is important to analyze both sexes for a given site cohort for a given range of diagnosis years.
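A small worked example may make the point about one-sided review clearer. All of the counts below are made up for illustration and do not come from any registry.

```python
# Hypothetical breast cancer cohort (made-up numbers, for illustration only).
true_male = 100              # cases that are really male
male_coded_as_female = 5     # true males hiding in the female-coded cases
female_coded_as_male = 30    # true females sitting in the male-coded cases

# What the database shows before any review.
observed_male = true_male - male_coded_as_female + female_coded_as_male

# Reviewing only male-coded cases removes the miscoded females...
after_male_only_review = observed_male - female_coded_as_male

# ...while reviewing both sexes also restores the miscoded males.
after_both_review = after_male_only_review + male_coded_as_female

print(observed_male)           # 125: overcount before review
print(after_male_only_review)  # 95: undercount after a one-sided review
print(after_both_review)       # 100: matches the true count
```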

Note that this edit is based on a list of the 1000 most common gender-specific first names from the U.S. Census Bureau, so it cannot flag a patient whose first name is not on the list. Sometimes the sex is coded correctly but the name is misspelled, such as Francis vs. Frances or Jean vs. Gene. Some names, such as Andrea, Angel, Carmen, Jean, Michele, Marian, and Vivian, are commonly female in the U.S. but male in other countries, so these specific names are excluded from the edit for foreign-born patients.

As an added check after I run my data through the edit, I run a frequency cross-tabulation of first name by sex, excluding all records for which the sex edit over-ride is coded as 1. I sort the list by one sex in descending order, then look at the number of cases that appear in the other sex column for the most popular names. Then I sort the list in descending order by the other sex and do the same review. For example, for the name George, if there are 1000 males and 1 female, I would look up that one female to verify the patient sex.
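The cross-tabulation check can also be sketched in Python. This is a minimal illustration, not the MS Access tool mentioned earlier: the record layout (dictionaries with `first_name`, `sex`, and `override` keys), the function names, and the `min_common` and `max_rare` thresholds are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def name_by_sex_crosstab(records):
    """Count cases by first name and sex code, skipping over-ridden records."""
    table = defaultdict(Counter)
    for rec in records:
        if rec.get("override") == "1":
            continue  # already reviewed and confirmed
        table[rec["first_name"].strip().upper()][rec["sex"]] += 1
    return table


def rare_sex_candidates(table, min_common=100, max_rare=2):
    """List popular names that have only a few cases coded as the other sex."""
    candidates = []
    for name, counts in table.items():
        male, female = counts["1"], counts["2"]
        common, rare = max(male, female), min(male, female)
        if common >= min_common and 0 < rare <= max_rare:
            rare_sex = "1" if male < female else "2"
            candidates.append((name, rare_sex, rare, common))
    # Most popular names first, mirroring the descending sort described above.
    return sorted(candidates, key=lambda row: -row[3])


if __name__ == "__main__":
    records = [{"first_name": "George", "sex": "1"} for _ in range(1000)]
    records.append({"first_name": "George", "sex": "2"})
    print(rare_sex_candidates(name_by_sex_crosstab(records)))
    # [('GEORGE', '2', 1, 1000)]
```

Each tuple gives the name, the sex code held by the minority of cases, and the two counts, so the reviewer knows which handful of records to look up.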


This article is meant to inspire other central cancer registry analysts to try out this sex edit on their registry’s database. As the Alaska database is relatively small, this edit is run on our entire database annually just before the Call for Data submission deadline. For larger registries, it is recommended to start out with a data subset, such as the 24-month Call for Data dataset or sex-skewed primary sites. Whatever cohort is chosen, the data will benefit from the effort.


1. Soloway, L., F. Boscoe, A. Kahn, 2010. “A New Edit for Identifying Potential Gender Misclassification in Central Cancer Registry Databases (abstract).” In: Annual Conference and Workshops of the North American Association of Central Cancer Registries, Final Program and Abstracts, June 19-25, 2010, Quebec City, Quebec, Canada. North American Association of Central Cancer Registries, Springfield, Illinois, p. 78.

2. Sherman, R.L., J. Button, L. Soloway, F.P. Boscoe, 2011. “Sex Misclassification in Central Cancer Registries (abstract).” In: Annual Conference and Workshops of the North American Association of Central Cancer Registries, Final Program and Abstracts, June 18-24, 2011, Louisville, Kentucky. North American Association of Central Cancer Registries, Springfield, Illinois, p. 74.

3. Sherman, R.L., F.P. Boscoe, D.K. O’Brien, J.T. George, K.A. Henry, L.E. Soloway, and D.J. Lee, 2014. “Misclassification of Sex in Central Cancer Registries.” Journal of Registry Management, 41(3):120-124.




Copyright © 2016 NAACCR, Inc. All Rights Reserved