Disclosure Avoidance

Case Study

Problem:

Supporting infrastructure was needed to build a system that produces privatized information from the 2020 Census for public disclosure.


Solution:

Developed a system that produces privatized Census data for public disclosure.

    • Built a tool to convert 2010 data into the 2020 Census format, producing a test dataset for rapid experimentation.
    • Built and managed a Jenkins system that integrates with all Disclosure Avoidance Census code to ensure smoother production and code iteration.
    • Leveraged AWS cloud infrastructure to create versioned testing tools that made tests reproducible on any current or historical version of the Disclosure Avoidance System (DAS).
    • Brought scientific research code up to production-ready standards, improving performance, readability, and flexibility.

Outcome:

    • The Census Bureau has published multiple privatized datasets using our tool.
    • Saved data scientists an average of 300 hours of labor per 2,000 runs (about 9 minutes per run) by automating the privacy tests run on the data.