Improving Metadata Retrieval and Transformation for Metadata Management
Section 1: Use Case Identifiers
Use Case ID: HHS-NIH-00039
Agency: HHS
Op Div/Staff Div: NIH
Use Case Topic Area: Government Services (includes Benefits and Service Delivery)
Is the AI use case found in the below list of general commercial AI products and services?
None of the above.
What is the intended purpose and expected benefits of the AI?
The Dataset Catalog (catalog) is a catalog of biomedical datasets from various repositories that lets users search for, discover, retrieve, and connect with datasets to accelerate scientific research. Metadata describing each included repository and its datasets is needed to index them appropriately in the catalog. Collecting this metadata manually is labor-intensive, so the National Library of Medicine (NLM) is pilot testing a working interface that lets internal users retrieve metadata from external repositories' websites and transform it into the specific format used for internal metadata management of the Dataset Catalog. The interface is currently being developed behind NIH's firewall for a select internal user base. The artificial intelligence (AI) will automate the retrieval and transformation of metadata but will not replace human oversight in critical metadata curation tasks; additional safeguards would be appropriate before moving to a production environment. The AI draws on publicly available biomedical data repository websites and internal metadata management systems, and it will be deployed within data management and IT departments, particularly those involved in cataloging and metadata management.
Primary Objectives: Streamline metadata retrieval and transformation to improve efficiency and accuracy.
Anticipated Positive Outcomes: Resource savings, enhanced data quality, and improved management of the Dataset Catalog.
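The retrieval step is not specified in detail here. As a minimal sketch, assuming a repository's web page exposes descriptive metadata in standard HTML <meta> tags, Python's standard-library html.parser can collect those name/content pairs. The class MetaTagExtractor, the function extract_meta, and the sample page are hypothetical illustrations, not the pilot's actual retrieval code.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Hypothetical helper: collects name/content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Accept either name= or property= (Open Graph style) as the key.
        key = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if key and content is not None:
            # Normalize keys to lowercase and trim surrounding whitespace.
            self.metadata[key.lower()] = content.strip()


def extract_meta(html: str) -> dict[str, str]:
    """Parse an HTML string and return its <meta> name/content pairs."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


if __name__ == "__main__":
    # Hypothetical sample page; a real pilot would fetch pages from repositories.
    sample = """
    <html><head>
      <title>Example Repository</title>
      <meta name="description" content="  A hypothetical biomedical data repository. ">
      <meta name="keywords" content="genomics, imaging">
      <meta property="og:url" content="https://repo.example.org">
    </head><body></body></html>
    """
    print(extract_meta(sample))
```

Parsing a string rather than fetching a URL keeps the sketch testable offline; network access, rate limiting, and pages that embed metadata in other forms (for example, JSON-LD) are outside its scope.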
Describe the AI system's outputs.
Inputs: Metadata from various biomedical repositories, internal schemas.
Outputs: Transformed metadata, Python scripts, user-friendly tools for metadata retrieval.
Frequency: Continuous retrieval and transformation as new metadata is ingested.
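To illustrate the transformation step, the following minimal sketch maps retrieved key/value metadata onto an internal record and flags required fields that are missing so that a curator can review them, in line with retaining human oversight. The FIELD_MAP dictionary, the CatalogRecord type, and the REQUIRED_FIELDS list are hypothetical stand-ins; the internal metadata schema used for the Dataset Catalog is not described in this entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from source metadata keys to internal catalog fields.
FIELD_MAP: dict[str, str] = {
    "title": "title",
    "og:title": "title",
    "description": "description",
    "og:description": "description",
    "keywords": "keywords",
    "og:url": "url",
}

# Hypothetical set of fields the internal schema requires.
REQUIRED_FIELDS = ("title", "description", "url")


@dataclass
class CatalogRecord:
    title: str | None = None
    description: str | None = None
    url: str | None = None
    keywords: list[str] = field(default_factory=list)


def transform(source: dict[str, str]) -> tuple[CatalogRecord, list[str]]:
    """Map source metadata onto a CatalogRecord and list missing required fields."""
    record = CatalogRecord()
    for src_key, value in source.items():
        target = FIELD_MAP.get(src_key.lower())
        if target is None:
            continue  # skip keys with no mapping in the hypothetical schema
        if target == "keywords":
            # Split a comma-separated string into a clean list of keywords.
            record.keywords = [k.strip() for k in value.split(",") if k.strip()]
        elif getattr(record, target) is None:
            # First mapped source wins, so "title" and "og:title" do not overwrite each other.
            setattr(record, target, value.strip())
    # Empty or absent required fields are returned for curator review.
    missing = [name for name in REQUIRED_FIELDS if not getattr(record, name)]
    return record, missing


if __name__ == "__main__":
    src = {
        "og:title": "Example Repository",
        "keywords": "genomics, imaging, ",
        "og:url": "https://repo.example.org",
    }
    record, missing = transform(src)
    print(record)
    print("Needs curator review:", missing)
```

Keeping the mapping in a plain dictionary means support for a new repository's field names can be added by editing data rather than code, and returning the list of missing fields leaves the final curation decision with a person.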
Stage of Development: Acquisition and/or Development
Is the AI use case rights-impacting, safety-impacting, both, or neither?
Neither
Section 2: Use Case Summary
Date Initiated: 11/2023
Date when Acquisition and/or Development began: 05/2023
Date Implemented: N/A
Date Retired: N/A
Was the AI system involved in this use case developed (or is it to be developed) under contract(s) or in-house?
Developed with contracting resources.
Provide the Procurement Instrument Identifier(s) (PIID) of the contract(s) used.
75N97023A00004/75N97023F00007
Is this AI use case supporting a High-Impact Service Provider (HISP) public-facing service?
N/A
Does this AI use case disseminate information to the public?
N/A
How is the agency ensuring compliance with Information Quality Act guidelines, if applicable?
N/A
Does this AI use case involve personally identifiable information (PII) that is maintained by the agency?
N/A
Has the Senior Agency Official for Privacy (SAOP) assessed the privacy risks associated with this AI use case?
Ongoing
Section 3: Data and Code
Do you have access to an enterprise data catalog or agency-wide data repository that enables you to identify whether or not the necessary datasets exist and are ready to develop your use case?
Yes
Describe any agency-owned data used to train, fine-tune, and/or evaluate performance of the model(s) used in this use case.
Website data (HTML, CSS, JavaScript), accessibility standards. Measures taken to ensure data accuracy and reliability: grounding, testing, evaluation, NIST framework.
Is there available documentation for the model training and evaluation data that demonstrates the degree to which it is appropriate to be used in analysis or for making predictions?
Documentation is complete
Which, if any, demographic variables does the AI use case explicitly use as model features?
N/A
Does this project include custom-developed code?
N/A
Does the agency have access to the code associated with the AI use case?
N/A
If the code is open-source, provide the link for the publicly available source code.
N/A
Section 4: AI Enablement and Infrastructure
Does this AI use case have an associated Authority to Operate (ATO) for an AI system?
No
System Name: N/A
How long have you waited for the necessary developer tools to implement the AI use case?
Less than 6 months
For this AI use case, is the required IT infrastructure provisioned via a centralized intake form or process inside the agency?
Yes
Do you have a process in place to request access to computing resources for model training and development of the AI involved in this use case?
Yes
Has communication regarding the provisioning of your requested resources been timely?
Yes
How are existing data science tools, libraries, data products, and internally-developed AI infrastructure being re-used for the current AI use case?
Reuses production-level code from a different use case.
Has information regarding the AI use case, including performance metrics and intended use of the model, been made available for review and feedback within the agency?
Limited documentation for review