Motivation

The dissemination of aggregate health statistics derived from clinical and administrative data carries an inherent tension between analytic utility and patient confidentiality. When reported frequencies are sufficiently small, individuals within a subgroup may be vulnerable to re-identification, particularly in stratified or cross-tabulated outputs where demographic, geographic, or clinical covariates intersect. Such vulnerabilities have prompted federal agencies and clinical research networks alike to impose data use agreement (DUA) obligations.

Federal agencies, including CMS, AHRQ, NCI, and CDC, each maintain formal small-cell suppression requirements as a condition of data access and publication. Clinical research networks and large-scale real-world data platforms enforce similar standards: PCORnet® and PEDSnet require minimum cell-size thresholds of 11 and 5, respectively, across all distributed data queries under their data sharing agreements. The All of Us Research Program and the N3C Data Enclave prohibit the dissemination of any participant count between 1 and 20; Epic Cosmos and TriNetX each adopt the CMS standard, requiring that cells with counts of 10 or fewer be masked as <11. Furthermore, studies using the CPRD in the United Kingdom must mask cells with counts fewer than 5.

In any aggregated tabular output (whether produced for a formal research publication, an institutional report, a regulatory submission, or an operational dashboard), manually identifying and suppressing all qualifying primary and complementary cells across large tables is cumbersome and error-prone. countmaskr automates this workflow end-to-end, which strengthens participant privacy and helps end users meet their DUA obligations consistently and reproducibly across institutional data sharing pipelines.

Definitions of small and secondary cells in a one-dimensional frequency table

Original

Age        N
0 - 1      4
2 - 9      71
10 - 19    925
20 - 29    0
30 - 39    0

  • small cell (primary cell): A cell whose value falls below the defined threshold and therefore requires suppression.
  • secondary cell: A cell that must also be suppressed to prevent reverse-engineering of the primary cell through arithmetic operations (for example, subtracting the visible cells from a known total).
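
The need for a secondary cell can be illustrated with a few lines of base R. This is only a sketch: it assumes the overall total (1,000 here) is published alongside the table, which the table above does not show, and the variable names are made up for illustration.

```r
# Counts from the "Original" table above.
age <- c("0 - 1" = 4, "2 - 9" = 71, "10 - 19" = 925,
         "20 - 29" = 0, "30 - 39" = 0)

# Assumption: the grand total is known to the reader of the table.
total <- sum(age)

# If only the primary cell ("0 - 1") is masked, subtraction recovers it exactly.
recovered_primary <- total - sum(age[c("2 - 9", "10 - 19", "20 - 29", "30 - 39")])

# If the secondary cell ("2 - 9") is masked as well, subtraction only yields
# the combined count of the two masked cells, not either value on its own.
recovered_sum <- total - sum(age[c("10 - 19", "20 - 29", "30 - 39")])
```

Here `recovered_primary` equals the masked count of 4, while `recovered_sum` is 75, the sum of the two masked cells.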


Solution

Age        N
0 - 1      <11
2 - 9      <80
10 - 19    925
20 - 29    0
30 - 39    0
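
The masked table above can be reproduced with a short base R sketch. `mask_1d()` is a hypothetical helper written for this illustration, not a countmaskr function. It assumes that zero counts stay visible, that a secondary cell is added only when exactly one cell is primary, that the secondary cell is the smallest remaining nonzero count, and that secondary cells are reported as an upper bound rounded up to the next multiple of 10 (as 71 becomes <80 above).

```r
# Hypothetical helper for illustration only; not part of countmaskr.
mask_1d <- function(counts, threshold = 11) {
  labels <- setNames(as.character(counts), names(counts))

  # Primary cells: nonzero counts below the threshold.
  primary <- counts > 0 & counts < threshold
  labels[primary] <- paste0("<", threshold)

  # Assumption: a lone primary cell could be recovered from a known total,
  # so the smallest remaining nonzero cell is masked as a secondary cell.
  if (sum(primary) == 1) {
    candidates <- which(!primary & counts > 0)
    if (length(candidates) > 0) {
      secondary <- candidates[which.min(counts[candidates])]
      upper <- (counts[secondary] %/% 10 + 1) * 10  # next multiple of 10
      labels[secondary] <- paste0("<", upper)
    }
  }
  labels
}

age <- c("0 - 1" = 4, "2 - 9" = 71, "10 - 19" = 925,
         "20 - 29" = 0, "30 - 39" = 0)
masked <- mask_1d(age)
```

With the default threshold of 11, `masked` holds the same five labels as the Solution table.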



Definitions of small and secondary cells in a two-dimensional frequency table

Original (Sex × Ethnicity)

          Totals   Not Hispanic   Hispanic   Other
Totals    1,678    1,377          296        5
Male      931      923            8          0
Female    740      452            283        5
Other     7        2              5          0

Cell-type definitions

In two-dimensional tables, suppression propagates iteratively across rows and columns until no cell value can be recovered via arithmetic. The cascade follows this order:

  1. Primary Cell (PC) — Any cell whose value falls below the defined threshold. These are identified and suppressed first.

  2. Column-wise Secondary Cell (CSC) — For each PC, a cell in the same column is suppressed so that column arithmetic (e.g. col_total − other_cells = PC) cannot recover the PC.

  3. Row-wise Secondary Cell (RSC) — Suppressing a PC and CSC creates a new vulnerability: row arithmetic (e.g. row_total − other_cells = CSC) may now recover the masked CSC. A cell in the same row is therefore suppressed to break this path.

The newly suppressed RSC is itself a masked value that may be recoverable via column arithmetic, requiring a further CSC. Steps 2–3 repeat, alternating between column-wise and row-wise checks, until no remaining combination of visible values and suppressed upper bounds can recover any masked cell. At convergence, the table is safe to release.

          Totals   Not Hispanic   Hispanic   Other
Totals    1,678    1,377          RSC        PC
Male      931      RSC            PC         0
Female    CSC      CSC            283        PC
Other     PC       PC             PC         0

Masked version (threshold = 11)

          Totals   Not Hispanic   Hispanic   Other
Totals    1,678    1,377          <300       <11
Male      931      <930           <11        0
Female    <750     <460           283        <11
Other     <11      <11            <11        0
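
The cascade described above can be sketched in base R for the Sex × Ethnicity table. `mask_2d()` and its inner `fix_lines()` are hypothetical helpers written for this illustration, not countmaskr's implementation. The sketch uses a simplified stopping rule: it adds the smallest nonzero visible cell as a secondary cell to any row or column that contains exactly one masked cell, and stops when no such row or column remains. It also assumes zero counts stay visible and that secondary cells are reported as the next multiple of 10. A production check against all combinations of visible values and upper bounds would be stricter.

```r
# Hypothetical helper for illustration only; not part of countmaskr.
mask_2d <- function(m, threshold = 11) {
  primary <- m > 0 & m < threshold
  masked <- primary

  # Add one secondary cell to each row (margin = 1) or column (margin = 2)
  # that currently contains exactly one masked cell.
  fix_lines <- function(masked, margin) {
    for (i in seq_len(dim(m)[margin])) {
      line_mask <- if (margin == 1) masked[i, ] else masked[, i]
      line_vals <- if (margin == 1) m[i, ] else m[, i]
      if (sum(line_mask) != 1) next
      candidates <- which(!line_mask & line_vals > 0)
      if (length(candidates) == 0) next
      pick <- candidates[which.min(line_vals[candidates])]
      if (margin == 1) masked[i, pick] <- TRUE else masked[pick, i] <- TRUE
    }
    masked
  }

  # Alternate column-wise and row-wise passes until nothing changes.
  repeat {
    before <- masked
    masked <- fix_lines(masked, 2)  # column-wise secondary cells (CSC)
    masked <- fix_lines(masked, 1)  # row-wise secondary cells (RSC)
    if (identical(masked, before)) break
  }

  out <- matrix(format(as.vector(m), big.mark = ",", trim = TRUE),
                nrow = nrow(m), dimnames = dimnames(m))
  secondary <- masked & !primary
  out[secondary] <- paste0("<", (m[secondary] %/% 10 + 1) * 10)
  out[primary] <- paste0("<", threshold)
  out
}

tab <- matrix(
  c(1678, 1377, 296, 5,
     931,  923,   8, 0,
     740,  452, 283, 5,
       7,    2,   5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Totals", "Male", "Female", "Other"),
                  c("Totals", "Not Hispanic", "Hispanic", "Other"))
)
masked_tab <- mask_2d(tab)
```

For this particular table the sketch converges after one column pass and one row pass, and `masked_tab` matches the masked version shown above. Other tables may need more passes, or a stricter safety check than this one.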


Installation

Install countmaskr from CRAN with:

install.packages("countmaskr")

You can also install it using pak:

pak::pkg_install("countmaskr")

Development version

Install the development version from GitHub with:

devtools::install_github("Query-Fulfillment/countmaskr")

Or use pak:

pak::pkg_install("Query-Fulfillment/countmaskr")