Motivation
The dissemination of aggregate health statistics derived from clinical and administrative data carries an inherent tension between analytic utility and patient confidentiality. When reported frequencies are sufficiently small, individuals within a subgroup may be vulnerable to re-identification, particularly in stratified or cross-tabulated outputs where demographic, geographic, or clinical covariates intersect. Such vulnerabilities have prompted federal agencies and clinical research networks alike to impose small-cell suppression obligations through data use agreements (DUAs).
Federal agencies, including CMS, AHRQ, NCI, and CDC, each maintain formal small-cell suppression requirements as a condition of data access and publication. Clinical research networks and large-scale real-world data platforms enforce similar standards: PCORnet® and PEDSnet require minimum cell-size thresholds of 11 and 5, respectively, across all distributed data queries under their data sharing agreements. The All of Us Research Program and the N3C Data Enclave prohibit the dissemination of any participant count between 1 and 20. Epic Cosmos and TriNetX each adopt the CMS standard, requiring that cells with counts of 10 or fewer be masked as <11. Furthermore, studies using the CPRD in the United Kingdom require masking of cells with counts below 5.
In any aggregated tabular output, whether produced for a formal research publication, an institutional report, a regulatory submission, or an operational dashboard, manually identifying and suppressing all qualifying primary and complementary cells across large tables is cumbersome and error-prone. countmaskr automates this workflow end-to-end, strengthening participant privacy and helping end users meet their DUA obligations consistently and reproducibly across institutional data sharing pipelines.
Definitions of small and secondary cells in a one-dimensional frequency table
Original
| Age | N |
|---|---|
| 0 - 1 | 4 |
| 2 - 9 | 71 |
| 10 - 19 | 925 |
| 20 - 29 | 0 |
| 30 - 39 | 0 |
- small cell (primary cell): A cell with a value below the defined threshold, which requires suppression.
- secondary cell: A cell that must also be suppressed to prevent reverse-engineering of the primary cell through arithmetic operations.
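The two definitions can be illustrated on the age table above. The following is a minimal base R sketch, not countmaskr's implementation. `classify_cells_1d()` is a hypothetical helper, the threshold of 11 is an assumption (the CMS-style cutoff mentioned in the motivation), zero counts are assumed to stay visible, and picking the smallest visible non-zero cell as the secondary cell is one common convention rather than a documented rule. It also assumes the column total is published, which is what makes a lone primary cell recoverable.

```r
# Hypothetical helper for illustration only; not part of countmaskr's API.
# Labels each cell as "PC" (primary), "SC" (secondary), or "visible".
classify_cells_1d <- function(counts, threshold = 11) {
  role <- rep("visible", length(counts))
  names(role) <- names(counts)

  # Primary cells: non-zero counts below the threshold (assumption: zeros stay visible).
  primary <- counts > 0 & counts < threshold
  role[primary] <- "PC"

  # With exactly one primary cell and a published total, the primary cell equals
  # total - sum(visible cells), so one more cell has to be hidden.
  if (sum(primary) == 1) {
    candidates <- which(!primary & counts > 0)
    if (length(candidates) > 0) {
      # Assumed convention: hide the smallest remaining non-zero cell.
      secondary <- candidates[which.min(counts[candidates])]
      role[secondary] <- "SC"
    }
  }
  role
}

age_counts <- c("0 - 1" = 4, "2 - 9" = 71, "10 - 19" = 925,
                "20 - 29" = 0, "30 - 39" = 0)
age_roles <- classify_cells_1d(age_counts)
print(age_roles)
```

Under these assumptions the count of 4 is the primary cell, and 71 is flagged as secondary: if it stayed visible, subtracting 71 + 925 from the total of 1,000 would reveal the masked 4. The sketch covers only the single-primary-cell case.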
Definitions of small and secondary cells in a two-dimensional frequency table
Original (Sex × Ethnicity)
| | Totals | Not Hispanic | Hispanic | Other |
|---|---|---|---|---|
| Totals | 1,678 | 1,377 | 296 | 5 |
| Male | 931 | 923 | 8 | 0 |
| Female | 740 | 452 | 283 | 5 |
| Other | 7 | 2 | 5 | 0 |
Cell-type definitions
In two-dimensional tables, suppression propagates iteratively across rows and columns until no cell value can be recovered via arithmetic. The cascade follows this order:
1. Primary Cell (PC): Any cell whose value falls below the defined threshold. These are identified and suppressed first.
2. Column-wise Secondary Cell (CSC): For each PC, a cell in the same column is suppressed so that column arithmetic (e.g. `col_total − other_cells = PC`) cannot recover the PC.
3. Row-wise Secondary Cell (RSC): Suppressing a PC and CSC creates a new vulnerability: row arithmetic (e.g. `row_total − other_cells = CSC`) may now recover the masked CSC. A cell in the same row is therefore suppressed to break this path.
4. The newly suppressed RSC is itself now a masked value that may be recoverable via column arithmetic, requiring a further CSC. Steps 3–4 repeat, alternating between row-wise and column-wise checks, until no remaining combination of visible values and suppressed upper bounds can recover any masked cell. At convergence, the table is safe to release.
| | Totals | Not Hispanic | Hispanic | Other |
|---|---|---|---|---|
| Totals | 1,678 | 1,377 | RSC | PC |
| Male | 931 | RSC | PC | 0 |
| Female | CSC | CSC | 283 | PC |
| Other | PC | PC | PC | 0 |
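One way to see why the secondary cells are needed is to check each row and column for a lone masked cell, since a single masked cell in a line can be recovered by subtracting the visible cells from the published total. The base R sketch below applies that check to the Sex × Ethnicity table. It is an illustration under stated assumptions, not countmaskr's algorithm: `lines_recoverable()` is a hypothetical helper, the threshold of 11 is assumed, and the check is only a necessary condition (it does not test bounds-based attacks).

```r
# Counts from the original Sex x Ethnicity table, margins included.
counts <- matrix(
  c(1678, 1377, 296, 5,
     931,  923,   8, 0,
     740,  452, 283, 5,
       7,    2,   5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Totals", "Male", "Female", "Other"),
                  c("Totals", "Not Hispanic", "Hispanic", "Other"))
)

# Cell roles copied from the masked table; "" marks a visible cell.
roles <- matrix(
  c("",    "",    "RSC", "PC",
    "",    "RSC", "PC",  "",
    "CSC", "CSC", "",    "PC",
    "PC",  "PC",  "PC",  ""),
  nrow = 4, byrow = TRUE, dimnames = dimnames(counts)
)

# Hypothetical helper: rows and columns holding exactly one masked cell,
# i.e. lines where simple subtraction from the total recovers the value.
lines_recoverable <- function(masked) {
  list(rows = rownames(masked)[rowSums(masked) == 1],
       cols = colnames(masked)[colSums(masked) == 1])
}

# Assumed threshold of 11; zeros are left visible, as in the table.
primary <- counts > 0 & counts < 11

primary_only <- lines_recoverable(primary)      # masking primary cells alone
full_pattern <- lines_recoverable(roles != "")  # primary plus secondary cells

print(primary_only)
print(full_pattern)
```

With only the primary cells masked, the Totals, Male, and Female rows and the Totals and Not Hispanic columns each contain a single masked cell and are therefore exposed. With the CSC and RSC cells added, no row or column is left with a lone masked cell.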
Installation
Install countmaskr from CRAN with:
```r
install.packages("countmaskr")
```
You can also install it using pak:
```r
pak::pkg_install("countmaskr")
```
Development version
Install the development version from GitHub with:
```r
devtools::install_github("Query-Fulfillment/countmaskr")
```
Or use pak:
```r
pak::pkg_install("Query-Fulfillment/countmaskr")
```