PII detection using Private AI
Section 1: Use Case Identifiers
Use Case ID: HHS-CDC-00055
Agency: HHS
Op Div/Staff Div: CDC
Use Case Topic Area: Mission-Enabling (internal agency support)
Is the AI use case found in the below list of general commercial AI products and services?
None of the above.
What is the intended purpose and expected benefits of the AI?
Numerous data sources contain personally identifiable information (PII) and protected health information (PHI), which cannot be published or shared among federal partners due to confidentiality concerns. These sources include sensitive records such as death certificates, survey responses, and electronic health records. To enable broader use of this data and improve the insights drawn from it, this project explored a PII detection system. Specifically, it evaluated an off-the-shelf tool called Private AI for identifying PII. The license for this tool expired after one year, and there are no current plans to renew the contract.
At present, the presence of PII in data and the labor-intensive process of redacting and removing it have prevented broader sharing of data available within CDC. An automated system that performs as well as human reviewers, and faster, could significantly reduce the time required for human review, enabling faster dissemination of crucial public health information and data sources.
Describe the AI system's outputs.
CDC's National Center for Health Statistics (NCHS) assessed the NLP solution provided by Private AI, which is designed to detect, mask, and substitute PII within textual data. This collection of models aims to securely identify and remove PII from unstructured text datasets across various platforms within the CDC network.
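As a rough illustration of the detect-and-mask output described above, the following is a minimal sketch that uses regular expressions. It is not Private AI's models or API: the pattern set, the labels, and the function names `detect` and `mask` are all assumptions made for this example, and a production system would rely on trained NLP models rather than fixed patterns. Substitution with synthetic values is omitted.

```python
import re

# Hypothetical patterns for illustration only; these are NOT Private AI's
# models or API. Real PII detection covers far more entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def detect(text):
    """Return (label, start, end) spans for every pattern match, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((label, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


def mask(text):
    """Replace each detected span with a bracketed label such as [SSN]."""
    pieces = []
    cursor = 0
    for label, start, end in detect(text):
        if start < cursor:
            # Skip a span that overlaps one already masked.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.org or 404-555-0100; SSN 123-45-6789."
    print(mask(sample))
```

In this sketch the output is the original text with each detected span replaced by its label, which mirrors the masking behavior described for the tool; the sample values are fabricated placeholders, not real records.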
Stage of Development: Retired
Is the AI use case rights-impacting, safety-impacting, both, or neither?
Neither
Section 2: Use Case Summary
Date Initiated: N/A
Date when Acquisition and/or Development began: N/A
Date Implemented: N/A
Date Retired: 03/2024
Was the AI system involved in this use case developed (or is it to be developed) under contract(s) or in-house?
N/A
Provide the Procurement Instrument Identifier(s) (PIID) of the contract(s) used.
N/A
Is this AI use case supporting a High-Impact Service Provider (HISP) public-facing service?
N/A
Does this AI use case disseminate information to the public?
N/A
How is the agency ensuring compliance with Information Quality Act guidelines, if applicable?
N/A
Does this AI use case involve personally identifiable information (PII) that is maintained by the agency?
N/A
Has the Senior Agency Official for Privacy (SAOP) assessed the privacy risks associated with this AI use case?
Ongoing
Section 3: Data and Code
Do you have access to an enterprise data catalog or agency-wide data repository that enables you to identify whether or not the necessary datasets exist and are ready to develop your use case?
N/A
Describe any agency-owned data used to train, fine-tune, and/or evaluate performance of the model(s) used in this use case.
N/A
Is there available documentation for the model training and evaluation data that demonstrates the degree to which it is appropriate to be used in analysis or for making predictions?
N/A
Which, if any, demographic variables does the AI use case explicitly use as model features?
N/A
Does this project include custom-developed code?
N/A
Does the agency have access to the code associated with the AI use case?
N/A
If the code is open-source, provide the link for the publicly available source code.
N/A
Section 4: AI Enablement and Infrastructure
Does this AI use case have an associated Authority to Operate (ATO) for an AI system?
N/A
System Name: N/A
How long have you waited for the necessary developer tools to implement the AI use case?
N/A
For this AI use case, is the required IT infrastructure provisioned via a centralized intake form or process inside the agency?
N/A
Do you have a process in place to request access to computing resources for model training and development of the AI involved in this use case?
N/A
Has communication regarding the provisioning of your requested resources been timely?
N/A
How are existing data science tools, libraries, data products, and internally-developed AI infrastructure being re-used for the current AI use case?
N/A
Has information regarding the AI use case, including performance metrics and intended use of the model, been made available for review and feedback within the agency?
N/A