From 8c1acffdc236e13f8178362bcaac7bfb44457060 Mon Sep 17 00:00:00 2001
From: Jeremy Zilar
- Overview
-
+## Overview
Data.gov is primarily a federal open government data site. However, state, local, and tribal governments can also publish metadata describing their open data resources on Data.gov for greater discoverability. Data.gov does not host data directly (with a few exceptions), but rather aggregates metadata about open data resources in one centralized location. Once an open data source meets the necessary format and metadata requirements, the Data.gov team can harvest the metadata directly, synchronizing that source’s metadata on Data.gov as often as every 24 hours.
**Background**
-From 2009-2013, agency updates to the Data.gov catalog were not automated. Federal agencies submitted metadata for individual datasets to Data.gov through a central Dataset Management System (DMS). At present, pursuant to the Federal [Open Data Policy](https://project-open-data.cio.gov/policy-memo/) discussed in more detail below, all metadata is added to Data.gov through the federated “harvest” model.
- Dataset Updates -
- -- Additions, updates, and deletions occur through a Harvest Source rather than within Data.gov directly. Data.gov synchronizes those changes through a daily Harvest Job.
- -- Federated Metadata Harvest Architecture -
- - - -- Getting your data source ready for harvesting by the Data.gov catalog depends on your data source type: -
- -- The steps for all three types of data sources are described in detail below. -
- -- Project Open Data is the name of the implementation guidance and associated resources for the Federal Open Data Policy, OMB M-13-13. This policy requires all Federal CFO-Act agencies to publish a Public Data Listing, provided as a data.json file, using the standard Project Open Data metadata schema. Non-CFO-Act agencies are not covered by this policy, but the process for including their data on Data.gov is the same.
- -- Project Open Data requires agencies to list and describe all agency data in the Public Data Listing. If a dataset is not public or restricted access, the metadata for that dataset is still included in the Public Data Listing, but any portion of the metadata that cannot be made public is redacted. The full, unredacted version of the metadata is provided in the Enterprise Data Inventory which is submitted to OMB and not made public. -
- -- Agencies must provide a human readable Public Data Listing at agency.gov/data and a machine readable listing, as a standalone JSON file on the agency’s website at agency.gov/data.json. This data.json file is what gets harvested to the Data.gov catalog.
- -- Federal agencies that do not have a platform to inventory their metadata can make use of a free service hosted by Data.gov called inventory.data.gov (see the }}">separate guide). Contact the Data.gov team via email if you’re interested in using this service.
- -- You can find more information about what is required by the policy in the Data Catalog Requirements for Project Open Data, but the requirements relevant to Data.gov are outlined here:
- -- An update to the Project Open Data Metadata Schema, Version 1.1 was released on November 6, 2014. -
- -- The Data.gov catalog supports version 1.1 as of December 2, 2014. -
- -- When an agency is ready for Data.gov to harvest their data.json for the first time, they should notify Data.gov via email and the Data.gov team will create a new Data.gov harvest source for their data.json. The Data.gov team is available to assist agencies in generating the Public Data Listing data.json file and provide tools that may help agencies prepare their data listings.
- -
- Federal data only
There should be one single harvest source per agency. If a federal agency aggregates data from non-federal sources, they must ensure the agency’s data.json includes data produced by the agency only. Data.gov harvests all metadata directly from publishers including many non-federal sources and works to prevent dataset duplication through intermediaries. It is also important to remember that OMB assesses an agency’s data.json file under the assumption it is comprised of data exclusively from that agency.
- Transitioning to data.json
When an agency transitions to data.json harvesting for the first time, any existing data on the Data.gov catalog is archived. Data.gov will also provide an export of the existing metadata on Data.gov and note whether any of these datasets were associated with a Data.gov Topic.
-
- Replacing datasets
When replacing any dataset in your data.json file it is important to maintain same title and identifier associated with the dataset to ensure consistent discoverability of that dataset going forward. When replacing datasets in your data.json harvest source, using the same identifier will ensure that the URL for the dataset on Data.gov stays the same keeping cited links working and reinforcing the open data principle of permanence. It should be noted, however, that when replacing datasets on Data.gov with a brand new harvest source (see Transitioning to data.json above) that using the same identifier or title may not retain the same URL.
- Recovering deleted datasets
After a dataset has been deleted from Data.gov (i.e., the agency posted an updated data.json file that does not include a particular dataset), there is a grace period of up to 24 hours where it can be easily restored by working with the Data.gov team.
-
- Error log reports
Every time the data.json is harvested, an error log is generated that identifies any issues that occurred during the harvest process. If requested, an agency point of contact can receive a daily harvest report with this error log via email.
-
- Background -
- -- Several federal agencies maintain and manage geospatial data and geographic information systems (GIS). The documentation of geospatial data is subject to authorities pre-dating the Open Data Policy. are required to develop metadata as outlined in Executive Order 12906 and OMB Circular A-16, revised (2002) to support the National Spatial Data Infrastructure (NSDI). The Federal Geographic Data Committee (FGDC) is the interagency group responsible for facilitating these federal activities and collaboration with non-federal organizations on geospatial data efforts. The FGDC has endorsed several geospatial metadata standards, as directed by OMB Circular A-119,including the Content Standard for Digital Geospatial Metadata (CSDGM), ISO 19115:2003 Geographic Information – Metadata and several related ISO geospatial standards. Since ISO 19115 and the associated standards are voluntary consensus standards (vs. federally-authored) and endorsed by the FGDC, federal agencies are encouraged to transition to ISO metadata as their agencies are able to do so. While the selection of appropriate standards is dependent on the nature of your metadata collection and publication process, ISO metadata should be considered an option now. For more information, see the FGDC website.
- -- In the past, geospatial metadata records were maintained and discoverable via separate catalogs and tools, including Geospatial One Stop. In 2013, these sources were merged so that Data.gov (specifically catalog.data.gov) indexes both geospatial and non-geospatial metadata in one place. -
- -- Metadata for geospatial datasets in catalog.data.gov is also made available in GeoPlatform.gov. GeoPlatform.gov provides access and management of geospatial resources through common geospatial data, services, and applications contributed and administered by trusted sources and hosted on shared infrastructure for use by federal agencies, agency partners, and the public. Geospatial metadata is made available to GeoPlatform.gov from the metadata harvested by catalog.data.gov and is displayed on GeoPlatform.gov via an application programming interface (API) on catalog.data.gov. In other words, the datasets discoverable on GeoPlatform.gov are from the geospatial metadata collected by the catalog using the following API call:
- -- http://catalog.data.gov/api/3/action/package_search?q=metadata_type:geospatial -
- -- The majority of open government datasets have some relationship to spatial data (e.g. jurisdiction, address). For the purposes of this document and learning how data gets published in Data.gov, “geospatial data” here specifically refers to spatial data that has historically been included as part of the Federal Geographic Data Committee and GeoPlatform.gov and utilizes robust geospatial metadata standards such as the the suite of ISO standards or the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM).These geospatial metadata standards are needed to properly display data and utilize the spatial functionality on GeoPlatform.gov.
- -- Getting geospatial metadata into Data.gov -
- -- Federal agencies that manage geospatial data should make their geospatial metadata holdings available to Data.gov using a consolidated geospatial harvest source, preferably one single CSW endpoint for the entire agency. For example, all offices and bureaus within the Department of Interior would make their metadata available through one consolidated CSW covering all of the Department of the Interior. (Non-geospatial metadata should be provided separately. See section 3 below.) -
- -- While a CSW endpoint and traditional geospatial metadata standards are needed for GeoPlatform.gov and Data.gov to consume the data, the Project Open Data (M-13-13) policy still requires metadata for the agency’s geospatial datasets to be provided within the Enterprise Data Inventory data.json file submitted to OMB with the Project Open Data metadata. -
- -- In order to facilitate these requirements, the FGDC and Data.gov have developed a mapping of elements between the Project Open Data metadata schema v1.1 and the geospatial metadata standards including FGDC CSDGM, ISO 19115:2003, and ISO 19115-1:2014. This crosswalk enables federal agencies with geospatial data to more efficiently meet both metadata requirements. -
- -- For agencies that provide geospatial data to GeoPlatform.gov and Data.gov, the following harvest sources must be provided: -
- -- Datasets Displayed on GeoPlatform.gov -
- -- All datasets included in the CSW will be displayed on GeoPlatform.gov. Datasets included in data-nonspatial-harvest.json will only be displayed on Data.gov, but not GeoPlatform.gov unless the datasets are specially tagged for inclusion there. -
- -- If an agency has a geospatial dataset in the data-nonspatial-harvest.json that should be part of GeoPlatform.gov, but is not included in the CSW harvest source, or if an agency has geospatial holdings and is only able to provide a data.json file and not the CSW, it should denote the geospatial dataset using “geospatial” as a value within the “theme” field. For example: “theme”: [“geospatial”]
- -- Data.gov incorporates data sources from state, local, and tribal governments. Non-federal sources are not covered by the Federal Open Data Policy, but can be included in the Data.gov catalog voluntarily. Depending on your local government open data platform, you may already have a harvest source that is Data.gov-ready, or it could take a little more work. Either way, the Data.gov team is available to answer questions about these requirements. For non-federal data to be connected to Data.gov, the following items are required: -
- -- Once you have coordinated with Data.gov on these two items, automated nightly updates to Data.gov can be set up very quickly. Non-federal organizations can provide the necessary information through the form. -
- -- Contact the Data.gov team -
- -- Contact the Data.gov team via email to let them know you’d like to get started. Please include a link to your metadata in the data.json format (see Step 1: Organize your open data for the Data.gov Pipeline) or let us know if you have questions about how to create a data.json file from your current database along with any relevant links.
- -- Connecting the pipes -
- -- The Data.gov team will create a new Harvest Source that will automatically collect information about your datasets and update Data.gov whenever changes are made on your data catalog. Depending on your platform, creating this harvester might just be the push of a button or it could take a little more work, but the team will walk you through it either way.
- -- Creating Harvest Sources -
- -- For federal agencies with only a data.json and for non-federal entities without geospatial harvest sources, contacting the Data.gov team to create the new harvest source is recommended. -
- -- If you are geospatial data publisher and there is a need for you to directly create a harvest source instead of a consolidated CSW endpoint as indicated above, please follow the steps outlined in the section below: -
- -- Creating a Metadata Publishing Account and Harvesting your Metadata to catalog.data.gov -
- -- In order to create a harvest source, you will need to have a login account through the OMB MAX authentication service. -
- -- Go to catalog.data.gov and click on “Login” at the bottom of the page. You will be sent to OMB MAX. Log in to OMB MAX using your OMB MAX credentials. Two-factor authentication (2FA) is now required for Data.gov access, so enable 2FA through the instructions on the OMB MAX log in page. You can add a device by clicking on “Manage SMS 2-Factor Devices” under your profile settings. This is not necessary if you use your PIV/CAC card to log in to MAX; it is already considered 2FA.
- -- After you have logged in to OMB MAX, email us to let us know that you have completed the initial login. In the email provide the organization name and access permission (Admin or Editor) that is required. We will then associate requested permissions to your account.
- -- (If you manage a non-federal geospatial harvest source, follow the instructions (2.5 MB PDF) from GeoPlatform.gov to obtain access to OMB MAX.) -
- -- Once your Data.gov account is created, and your permissions are in place, you can log in and follow the steps below to create your harvest source: -
- -- Once your harvest source is created, the Harvest Source page will display. To test your new harvest source, click on the “Admin” button, then click on the “Reharvest” button when it displays. The harvesting process should take few minutes to hours based on the number of datasets on the source. Once the harvesting is complete, a job report will be provided that will display any errors that have occurred, if any, during the harvesting process. -
- -- Testing -
- -- The Data.gov team will test to ensure the harvester works properly. If anything seems wrong, the team will help you configure your data catalog so that Data.gov can collect your datasets without any errors. -
- -- For harvest sources that are directly setup by data publishers, the job report will display any errors that have occurred for review and resolution at the source, if you have any questions on the errors, please email the Data.gov team. -
- -- Live within 24 hours! -
- -- Once the harvester has been tested successfully, Data.gov will start automatically consuming information about your datasets and all the basic details of your datasets will be available on Data.gov with links to the source and your open data policy. -
- -- What is “open data” for purposes of Data.gov federation? -
- -- Open data for the purposes of Data.gov federation is US government data that is public, accessible, described, reusable, complete, timely, and managed post-release. Read more about these principles on Project Open Data.
- -- What entities are eligible for federation on Data.gov? -
- -- Non-federal governments (state, local, tribal) can federate to the Data.gov catalog by meeting the requirements discussed in the Non-federal format section. The Data.gov catalog does contain metadata for some datasets produced by non-government sources (some international organizations, academic institutions) under the authorities of the FGDC or when a federal agency has included the non-government data as its own dataset for Information Quality Act purposes. -
- -- What steps are required to have my government entity federated onto Data.gov? -
- -- To have your government’s open data federated onto Data.gov, you must: 1. Organize your open data for the Data.gov Pipeline by complying with format and standardized metadata requirements, and 2. Coordinate with Data.gov to establish a harvest point for automatic daily updates. -
- -- Does my government entity need to have an existing open data catalog to be added to Data.gov? -
- -- Data.gov is agnostic about whether you have to have an open data catalog of any type to be included on to Data.gov. You simply need to have the metadata of your open data sets organized in a centralized /data.json file in accordance with the required specifications. -
- -- If our site is a CKAN instance, do we still need data.json and/or data-nonspatial-harvest.json? -
- -- Yes — you need both. -
- -- What if you have a .Net Web service already? Does that work? -
- -- For the open data policy and harvesting on to Data.gov, it needs to be a JSON formatted file. -
- -- Are there checks in place/planned to verify that the referenced data sources/services are indeed available over time? -
- -- The Project Open Data dashboard at labs.data.gov/dashboard provides more detailed information on the agency datasets on Data.gov. For instance, if a dataset is no longer maintained and becomes a broken link, that will be noted on the dashboard. As a result, required updates will be more easily identified and addressed. -
- -- I saw two admirable example data.json files: http://www.gsa.gov/data.json and http://www.nist.gov/data.json. Are there more to survey? -
- -- Check out the Project Open Data Dashboard or your favorite agency’s website /data.json -
- -- What is Open Data listserv? -
- -- The Open Data listserv is a listserv with over 800 .gov email addresses of people working on open data issues in the federal government. It is a simple way to reach the broad audience of people in federal government working on open data. Anyone with a .gov email can email the listserv with “subscribe open-data” in the body of the message. -
- -- How do you find your Data.gov federal agency point of contact? -
- -- Find your agency on the Open Data Points of Contact list. If your agency isn’t listed, contact Data.gov and ask. -
- -- What does CSW stand for? (re: CSW endpoint) -
- -- CSW stands for Catalog Service for the Web — You can read more at http://geoplatform.gov/csw-resources -
- -- Is a CSW endpoint required if you have geospatial data, or is the data.json sufficient? And should data-nonspatial-harvest.json just contain a subset of data.json? -
- -- Good question! 1. You need both. 2. Yes, correct. -
- -- So geospatial data developers will have to produce two separate metadata records for the life of the program? -
- -- Geospatial data developers should manage their metadata as ISO. Using an XSLT and other tools the ISO metadata can be served in multiple formats including HTML and the Project Open Data JSON. There is no need to manage multiple copies of the metadata simply to publish in multiple formats. -
- -- Can geospatial data also be harvested through a WAF? -
- -- Yes, but using the new harvesting model, the preferred harvest source is a consolidated agency-wide CSW. -
- -- For geospatial data, can you delete a harvest source? -
- -- Yes, but agencies are expected to provide a consolidated harvest source for their entire agency. Individual upstream harvest sources should be managed within each agency. -
- -- What is the native metadata format in data.gov? Is it ISO? for Geo and non-geo? -
- -- Geospatial metadata on Data.gov is converted to ISO if it’s not already provided as ISO. If the metadata is provided to Data.gov in CSDGM format, the source CSDGM metadata record will be available to users, along with the converted ISO metadata record. The common metadata format across Data.gov is a data model specific to CKAN, but all metadata is also available as DCAT RDF and Schema.org microdata. Data.gov is looking into providing other common metadata schemas across the site including the current Project Open Data schema which is a profile of DCAT using JSON-LD. -
- -- Seems an easy approach for the system to check for the geographic extent metadata element to identify geospatial data. Can this be expedited? Better for one solution that works for all then the temp theme keyword solution that requires work from all. -
- -- This has been suggested in the past, but it was decided that simply providing geospatial extent would not serve as an accurate indicator for whether something should be included as part of geoplatform.gov. You can see the discussion at https://github.com/project-open-data/project-open-data.github.io/issues/303
- -- Do I understand correctly that you are going to automate CSDGM/ISO to JSON transform and add the auto-detect for geographic information? If so, why not wait for these solutions versus having all metadata producers to dedicate such effort to a temporary fixes (create two records and add ‘geospatial’ theme keyword)? -
- -- Data.gov does not and will not be providing any transformations from CSDGM/ISO to Project Open Data JSON nor will it be relying on those transformations for data.gov’s purposes. However, as part of the OMB policy, all metadata will need to be provided as JSON including metadata that is currently provided as CSDGM/ISO. Data.gov does help maintain the CKAN extension for generating Project Open Data JSON (ckanext-datajson) and this can be used to provide automated conversion from CSDGM/ISO metadata. We are working to coordinate with geospatial agencies interested in using this extension. -
- -- Can you tell us the source of the CSDGM validation schema that is currently being used to validate records being harvested by data.gov? We are having some problems understanding why certain records are ‘failing’ validation, and the data.gov harvest report error messages are very difficult to decipher in many cases, even by experienced metadata creators. -
- -- The metadata is not validated as CSDGM. It is converted to ISO 19115 then validated against the ISO 19115 schema. CSDGM metadata is transformed to ISO using the following XSLT: https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt
- -- What if I need help getting my geospatial metadata into Data.gov and GeoPlatform.gov? -
- -- The FGDC has assistance available if you email USGS.gov. They will also coordinate with the Data.gov team to help address your issue if needed. If you have specific metadata questions, contact FGDC via email. -
- -- What are the advantages of federating my local government’s open data to Data.gov? -
- -- Federating your local open government data on Data.gov makes it more discoverable by Data.gov visitors as well as folks using search engines powered by schema.org. It also makes it easier for people to discover similar datasets across different levels of government, e.g., a citizen can find data about her location from city, state, and federal sources in one place. -
- -- If non-federal data is being shared through Data.gov, is there someone that can help validate that the posted data policy/terms of use are in compliance? -
- -- We make sure that non-federal sources have terms of use/data policies so that we can make it clear to end users that those datasets are governed by the terms of use/data policies of the non-federal entity (state, city, county, etc.) and not the federal policies.
- -- Can you address data repositories that are federally funded but not hosted at the agency itself? Can they be harvested directly by data.gov? -
- -- Yes. -
- -- If non-federal metadata is shared with the NSGIC GIS Inventory, is it then harvested by data.gov? -
- -- Yes, the NSGIC GIS Inventory (GISI)is a Data.gov registered harvest sources. -
\ No newline at end of file +From 2009-2013, agency updates to the Data.gov catalog were not automated. Federal agencies submitted metadata for individual datasets to Data.gov through a central Dataset Management System (DMS). At present, pursuant to the Federal [Open Data Policy](https://project-open-data.cio.gov/policy-memo/) discussed in more detail below, all metadata is added to Data.gov through the federated “harvest” model. + +**Dataset Updates** +Additions, updates, and deletions occur through a _Harvest Source_ rather than within Data.gov directly. Data.gov synchronizes those changes through a daily _Harvest Job_. + +**Federated Metadata Harvest Architecture** + +{{< legacy-img src="/2014/12/harvest-diagram-departmentwide-gis-separate.png" alt="federated harvest architecture diagram" >}} + +## Step 1. Organize your open data for the Data.gov Pipeline + +Getting your data source ready for harvesting by the Data.gov catalog depends on your data source type: + +1. **Federal Data with Project Open Data (non-geospatial):** The most common source is the Public Data Listing as required by the Federal Open Data Policy. +2. **Federal Geospatial Data:** Federal maps, images, GIS products, and other location-based data resources. +3. **Non-federal Data:** Non-federal government sources are not covered by the Federal Open Data Policy, but can be included in the Data.gov catalog voluntarily. + +The steps for all three types of data sources are described in detail below. + +### Federal Data with Project Open Data + +[Project Open Data](https://project-open-data.cio.gov/) is the name of the implementation guidance and associated resources for the [Federal Open Data Policy, OMB M-13-13](https://project-open-data.cio.gov/policy-memo/). This policy requires all [Federal CFO-Act agencies](http://www.gpo.gov/fdsys/pkg/USCODE-2011-title31/html/USCODE-2011-title31-subtitleI-chap9-sec901.htm) to publish a [Public Data Listing](https://project-open-data.cio.gov/catalog/), provided as a [data.json file](https://project-open-data.cio.gov/catalog/#machine-readable-format), using the standard [Project Open Data metadata schema](https://project-open-data.cio.gov/v1.1/schema/). Non-CFO-Act agencies are not covered by this policy, but the process for including their data on Data.gov is the same. + +Project Open Data requires agencies to list and describe all agency data in the Public Data Listing. If a dataset is not public or restricted access, the metadata for that dataset is still included in the Public Data Listing, but any portion of the metadata that cannot be made public is redacted. The full, unredacted version of the metadata is provided in the Enterprise Data Inventory which is submitted to OMB and not made public. + +Agencies must provide a human readable Public Data Listing at _agency.gov/_data and a machine readable listing, as a standalone JSON file on the agency’s website at _agency.gov_/data.json. This data.json file is what gets harvested to the Data.gov catalog. + +Federal agencies that do not have a platform to inventory their metadata can make use of a free service hosted by Data.gov called inventory.data.gov (see the [}}">separate guide)]({{< link ). Contact the Data.gov team via email if you’re interested in using this service. + +You can find more information about what is required by the policy in the [Data Catalog Requirements](https://project-open-data.cio.gov/catalog/) for Project Open Data, but the requirements relevant to Data.gov are outlined here: + +1. **Open Data Policy Requirements** +2. All CFO-Act agencies must provide an Enterprise Data Inventory and Public Data Listing in accordance with the [Project Open Data metadata schema](https://project-open-data.cio.gov/v1.1/schema/) for the purposes of measuring compliance with OMB’s M-13-13 +3. **Required:** Enterprise Data Inventory provided to OMB MAX +4. **Required:** Public Data Listing at: _https://________.gov/data.json_ + +**_An update to the Project Open Data Metadata Schema,_** [**_Version 1.1_**](https://project-open-data.cio.gov/v1.1/schema/) **_was released on November 6, 2014._** + +The Data.gov catalog supports version 1.1 as of December 2, 2014. + +When an agency is ready for Data.gov to harvest their data.json for the first time, they should notify Data.gov via [email](mailto:datagovhelp@gsa.gov) and the Data.gov team will create a new Data.gov harvest source for their data.json. The Data.gov team is available to assist agencies in generating the Public Data Listing data.json file and provide tools that may help agencies prepare their data listings. + +**Federal data only** +There should be one single harvest source per agency. If a federal agency aggregates data from non-federal sources, they must ensure the agency’s data.json includes data produced by the agency only. Data.gov harvests all metadata directly from publishers including many non-federal sources and works to prevent dataset duplication through intermediaries. It is also important to remember that OMB assesses an agency’s data.json file under the assumption it is comprised of data exclusively from that agency. + +**Transitioning to data.json** +When an agency transitions to data.json harvesting for the first time, any existing data on the Data.gov catalog is archived. Data.gov will also provide an export of the existing metadata on Data.gov and note whether any of these datasets were associated with a Data.gov Topic. + +**Replacing datasets** +When replacing any dataset in your data.json file it is important to maintain same title and identifier associated with the dataset to ensure consistent discoverability of that dataset going forward. When replacing datasets in your data.json harvest source, using the same identifier will ensure that the URL for the dataset on Data.gov stays the same keeping cited links working and reinforcing the open data principle of [permanence](https://opengovdata.io/2014/permanence-trust-provenance/). It should be noted, however, that when replacing datasets on Data.gov with a brand new harvest source (see _Transitioning to data.json_ above) that using the same identifier or title may not retain the same URL. + +**Recovering deleted datasets** +After a dataset has been deleted from Data.gov (i.e., the agency posted an updated data.json file that does not include a particular dataset), there is a grace period of up to 24 hours where it can be easily restored by working with the Data.gov team. + +**Error log reports** +Every time the data.json is harvested, an error log is generated that identifies any issues that occurred during the harvest process. If requested, an agency point of contact can receive a daily harvest report with this error log via email. + +### Federal Geospatial Data + +**Background** + +Several federal agencies maintain and manage geospatial data and geographic information systems (GIS). The documentation of geospatial data is subject to authorities pre-dating the Open Data Policy. are required to develop metadata as outlined in [Executive Order 12906](http://www.archives.gov/federal-register/executive-orders/pdf/12906.pdf) and [OMB Circular A-16, revised (2002)](https://obamawhitehouse.archives.gov/omb/circulars_a016_rev/) to support the National Spatial Data Infrastructure (NSDI). The Federal Geographic Data Committee (FGDC) is the interagency group responsible for facilitating these federal activities and collaboration with non-federal organizations on geospatial data efforts. The FGDC has endorsed several geospatial metadata standards, as directed by [OMB Circular A-119](https://obamawhitehouse.archives.gov/omb/circulars_a119/),including the Content Standard for Digital Geospatial Metadata ([CSDGM](http://www.fgdc.gov/metadata/csdgm)), [ISO 19115:2003 Geographic Information – Metadata](http://www.fgdc.gov/metadata/geospatial-metadata-standards) and several related ISO geospatial standards. Since ISO 19115 and the associated standards are voluntary consensus standards (vs. federally-authored) and endorsed by the FGDC, federal agencies are encouraged to transition to ISO metadata as their agencies are able to do so. While the selection of appropriate standards is dependent on the nature of your metadata collection and publication process, ISO metadata should be considered an option now. For more information, [see the FGDC website](https://www.fgdc.gov/metadata). + +In the past, geospatial metadata records were maintained and discoverable via separate catalogs and tools, including Geospatial One Stop. In 2013, these sources were merged so that Data.gov (specifically catalog.data.gov) indexes both geospatial and non-geospatial metadata in one place. + +Metadata for geospatial datasets in [catalog.data.gov](http://catalog.data.gov/dataset) is also made available in [GeoPlatform.gov](http://www.geoplatform.gov/). GeoPlatform.gov provides access and management of geospatial resources through common geospatial data, services, and applications contributed and administered by trusted sources and hosted on shared infrastructure for use by federal agencies, agency partners, and the public. Geospatial metadata is made available to GeoPlatform.gov from the metadata harvested by catalog.data.gov and is displayed on GeoPlatform.gov via an application programming interface (API) on catalog.data.gov. In other words, the datasets discoverable on GeoPlatform.gov are from the geospatial metadata collected by the catalog using the following API call: + +http://catalog.data.gov/api/3/action/package_search?q=metadata_type:geospatial + +The majority of open government datasets have some relationship to spatial data (e.g. jurisdiction, address). For the purposes of this document and learning how data gets published in Data.gov, “geospatial data” here specifically refers to spatial data that has historically been included as part of the Federal Geographic Data Committee and GeoPlatform.gov and utilizes robust geospatial metadata standards such as the the suite of ISO standards or the FGDC’s Content Standard for Digital Geospatial Metadata ([CSDGM](http://www.fgdc.gov/metadata/csdgm)).These geospatial metadata standards are needed to properly display data and utilize the spatial functionality on GeoPlatform.gov. + +**Getting geospatial metadata into Data.gov** + +Federal agencies that manage geospatial data should make their geospatial metadata holdings available to Data.gov using a consolidated geospatial harvest source, preferably one single CSW endpoint for the entire agency. For example, all offices and bureaus within the Department of Interior would make their metadata available through one consolidated CSW covering all of the Department of the Interior. (Non-geospatial metadata should be provided separately. See section 3 below.) + +While a CSW endpoint and traditional geospatial metadata standards are needed for GeoPlatform.gov and Data.gov to consume the data, the Project Open Data (M-13-13) policy still requires metadata for the agency’s geospatial datasets to be provided within the Enterprise Data Inventory data.json file submitted to OMB with the Project Open Data metadata. + +In order to facilitate these requirements, the FGDC and Data.gov have developed a [mapping of elements](https://project-open-data.cio.gov/v1.1/metadata-resources/#crosswalks-for-geospatial-metadata) between the Project Open Data metadata schema v1.1 and the geospatial metadata standards including FGDC CSDGM, ISO 19115:2003, and ISO 19115-1:2014. This crosswalk enables federal agencies with geospatial data to more efficiently meet both metadata requirements. + +For agencies that provide geospatial data to GeoPlatform.gov and Data.gov, the following harvest sources must be provided: + +1. **Open Data Policy Requirements** + All CFO-Act agencies must provide an Enterprise Data Inventory in accordance with the Project Open Data metadata schema (see _Federal Data with Project Open Data above_). This includes geospatial **_and_** non-spatial data. + **Required: Enterprise Data Inventory provided to OMB MAX** + +1. **Geospatial Harvest Source — Public Data Listing Requirements** _(for GeoPlatform.gov and Data.gov)_To be successfully harvested by GeoPlatform.gov and Data.gov, all geospatial data should be provided via one [Catalog Service for the Web (CSW)](https://www.geoplatform.gov/csw-resources) endpoint.Required: A CSW endpoint, e.g.: [_https://data.doi.gov/csw?service=CSW&version=2.0.2&request=GetCapabilities_](https://data.doi.gov/csw?service=CSW&version=2.0.2&request=GetCapabilities) + +1. **Data without a Geospatial Harvest Source — Public Data Listing Requirements** _(for GeoPlatform.gov and Data.gov)_ Lastly, to prevent duplication on Data.gov, all agencies that provide a CSW geospatial harvest source to GeoPlatform.gov and Data.gov should create an additional JSON file (called /_data-nonspatial-harvest.json) _to include all datasets that are **not** available via the consolidated Geospatial Harvest Source.**Required:** Datasets without a Geospatial Harvest Source for the Public Data Listing at: _https://________.gov/data-nonspatial-harvest.json_ + +**Datasets Displayed on GeoPlatform.gov** + +All datasets included in the CSW will be displayed on GeoPlatform.gov. Datasets included in data-nonspatial-harvest.json will only be displayed on Data.gov, but not GeoPlatform.gov unless the datasets are specially tagged for inclusion there. + +If an agency has a geospatial dataset in the data-nonspatial-harvest.json that should be part of GeoPlatform.gov, but is not included in the CSW harvest source, or if an agency has geospatial holdings and is only able to provide a data.json file and not the CSW, it should denote the geospatial dataset using “geospatial” as a value within the “theme” field. For example:_ “theme”: [“geospatial”]_ + +### Non-Federal Data + +Data.gov incorporates data sources from state, local, and tribal governments. Non-federal sources are not covered by the Federal Open Data Policy, but can be included in the Data.gov catalog voluntarily. Depending on your local government open data platform, you may already have a harvest source that is Data.gov-ready, or it could take a little more work. Either way, the Data.gov team is available to answer questions about these requirements. For non-federal data to be connected to Data.gov, the following items are required: + +1. **A Data Harvest Source** + Some open data catalog platforms already have a Data.gov harvest source built in (see these examples from [Socrata](https://nycopendata.socrata.com/data.json) and [ArcGIS Open Data](http://opendata.dc.gov/data.json)), but it is possible to set up a harvest source with any data management system (see [this CKAN example](https://www.opendataphilly.org/data.json)). The metadata required from non-federal sources does not include the [USG noted fields](https://project-open-data.cio.gov/v1.1/schema/#USG-note) and additional fields can be left out on a case-by-case basis. To learn more about metadata best practices and validators, check out the [}}>Resources and Tools]({{<) below.**Required:** A Harvest Source at: _https://________.gov/data.json, e.g. __https://data.brla.gov/data.json_ + +1. **A Terms of Use URL** + A publicly accessible Terms of Use (or Data Policy) URL or similar information in order to make it clear to Data.gov users when they are viewing datasets that are not covered by federal statutory and regulatory requirements. + **Required:** A Terms of Use URL, e.g._ https://data.brla.gov/terms-of-service_ + +Once you have coordinated with Data.gov on these two items, automated nightly updates to Data.gov can be set up very quickly. Non-federal organizations can provide the necessary information through [the form](http://www.data.gov/local/add). + +## Step 2. Coordinate with Data.gov + +**Contact the Data.gov team** + +Contact the Data.gov team via email to let them know you’d like to get started. Please include a link to your metadata in the data.json format (see _Step 1: Organize your open data for the Data.gov Pipeline_) or let us know if you have questions about how to create a data.json file from your current database along with any relevant links. + +**Connecting the pipes** + +The Data.gov team will create a new [Harvest Source](https://www.data.gov/developers/harvesting) that will automatically collect information about your datasets and update Data.gov whenever changes are made on your data catalog. Depending on your platform, creating this harvester might just be the push of a button or it could take a little more work, but the team will walk you through it either way. + +**Creating Harvest Sources** + +For federal agencies with only a data.json and for non-federal entities without geospatial harvest sources, contacting the Data.gov team to create the new harvest source is recommended. + +If you are geospatial data publisher and there is a need for you to directly create a harvest source instead of a consolidated CSW endpoint as indicated above, please follow the steps outlined in the section below: + +**Creating a Metadata Publishing Account and Harvesting your Metadata to catalog.data.gov** + +In order to create a harvest source, you will need to have a login account through the OMB MAX authentication service. + +Go to catalog.data.gov and click on “Login” at the bottom of the page. You will be sent to OMB MAX. [Log in to OMB MAX](https://catalog.data.gov/user/login) using your OMB MAX credentials. Two-factor authentication (2FA) is now required for Data.gov access, so enable 2FA through the instructions on the OMB MAX log in page. You can add a device by clicking on “Manage SMS 2-Factor Devices” under your profile settings. This is not necessary if you use your PIV/CAC card to log in to MAX; it is already considered 2FA. + +After you have logged in to OMB MAX, [email us](mailto:datagov@gsa.gov) to let us know that you have completed the initial login. In the email provide the organization name and access permission (Admin or Editor) that is required. We will then associate requested permissions to your account. + +_**(If you manage a non-federal geospatial harvest source, follow the [instructions](https://cms.geoplatform.gov/sites/default/files/document_library/Non-Federal_Geospatial_OMB_MAX_Registration_and_Catalog_Data_gov_Account.pdf) (2.5 MB PDF) from GeoPlatform.gov to obtain access to OMB MAX.)**_ + +Once your Data.gov account is created, and your permissions are in place, you can log in and follow the steps below to create your harvest source: + +1. Navigate to [http://catalog.data.gov](http://catalog.data.gov/) +2. Scroll to the bottom of the page and click the “Login” link +3. [Log in using OMB MAX](https://catalog.data.gov/user/login). +4. Navigate to [http://catalog.data.gov/harvest](http://catalog.data.gov/harvest) +5. Click the “Add Harvest Source” button +6. Enter information into the fields provided on the page. They include the following fields: + * CKAN + * Data.json + * CSW Server + * Web Accessible Folder (WAF) + * Single spatial metadata document + * Geoportal Server + * Web Accessible Folder (WAF) Homogeneous Collection + * Z39.50 + * ArcGIS Rest API + * Autodetect + * ISO 19115 Metadata (ISO 19139 XSD) + * FGDC Minimal Validation + * FGDC CSDGM Version 2.0, 1998 (FGDC-STD-001-1998) + * FGDC CSDGM Biological Data Profile (FGDC-STD-001.1-1999) + * FGDC CSDGM Metadata Profile for Shoreline Data (FGDC-STD-001.2-2001) + * FGDC Extensions for Remote Sensing (FGDC-STD-012-2002) + 1. URL – The harvest source URL that contains your data + 2. Title – The title for your harvest source. This title is the name that will be displayed as the access link for the harvest source on Catalog.data.gov + 3. Description – A summary of the data provided by your harvest source + 4. Source Type – The format of the data provided by your harvest source (Note: Some data formats will undergo validation. You will be able to select the validation schema for your data after selecting the Source Type.) Data formats that are supported include: + 5. Update Frequency – How often you want your data to be harvested + 6. Validation: For Geospatial data below validation formats are available, please choose the validation that is applicable for the metadata you are harvesting. + 7. Dataset Visibility – You can display the dataset to the public or have it remain only visible to your organization members + 8. Organization – Select your organization in this field to associate your harvest source to it. +7. Once you have entered all of your information, click the “Save” button to create your harvest source. + +Once your harvest source is created, the Harvest Source page will display. To test your new harvest source, click on the “Admin” button, then click on the “Reharvest” button when it displays. The harvesting process should take few minutes to hours based on the number of datasets on the source. Once the harvesting is complete, a job report will be provided that will display any errors that have occurred, if any, during the harvesting process. + +**Testing** + +The Data.gov team will test to ensure the harvester works properly. If anything seems wrong, the team will help you configure your data catalog so that Data.gov can collect your datasets without any errors. + +For harvest sources that are directly setup by data publishers, the job report will display any errors that have occurred for review and resolution at the source, if you have any questions on the errors, please [email the Data.gov team](mailto:datagovhelp@gsa.gov). + +**Live within 24 hours!** + +Once the harvester has been tested successfully, Data.gov will start automatically consuming information about your datasets and all the basic details of your datasets will be available on Data.gov with links to the source and your open data policy. + +## Resources & Tools + +### Resources + +* Continually updated Federal Open Data Policy guidance and technical standards, commonly referred to as [Project Open Data](http://project-open-data.github.io/) +* Federal Open Data Policy Memo, [OMB M-13-13](https://www.whitehouse.gov/sites/whitehouse.gov/files/omb/memoranda/2013/m-13-13.pdf) +* [FGDC geospatial metadata information](http://www.fgdc.gov/metadata) +* [GeoPlatform.gov initiative](http://www.fgdc.gov/initiatives/geospatial-platform) +* [Catalog.Data.gov CKAN Publisher Tutorial to facilitate GeoPlatform Geospatial Data Discovery](http://www.geoplatform.gov/sites/default/files/document_library/catalog_data_gov_publishing_tutorial.pdf) +* Metadata Recommendations Supporting Data Discovery and Use in Data.gov and GeoPlatform.gov (_add link when documented is updated at https://cms.geoplatform.gov/sites/default/files/document_library_) + +### Testing Tools + +* [Project Op](https://www.data.gov/labs/)[en Data Dashboard](http://labs.data.gov/dashboard/offices): A dashboard to track the progress of agencies implementing Project Open Data on a quarterly basis +* [Project Open Data data.json Validator](http://labs.data.gov/dashboard/validate): A validator for v1.0 and v1.1 of the Project Open Data Metadata Schema +* Project Open Data Converters: A [converter](http://labs.data.gov/dashboard/datagov/csv_to_json) to transform CSV files into JSON as defined by the Project Open Data Metadata Schema and an [export API](http://labs.data.gov/dashboard/export) to export from the CKAN API and transform the metadata into JSON as defined by the Project Open Data Metadata Schema +* [Project Open Data Changeset Preview](http://labs.data.gov/dashboard/changeset) — Agencies can use the Project Open Data Dashboard [changeset tool](http://labs.data.gov/dashboard/changeset) to compare their data.json file to data that currently exists on Data.gov to see what will be replaced, added, or removed. +* Agencies can test the process of harvesting their data.json by [requesting access to Data.gov’s test harvester](https://github.com/GSA/data.gov/wiki/Guide-to-creating-an-account-for-and-using-the-test-harvester). + +## Frequently Asked Questions + +* **[General](#faq-general)** +* **[Federal Data with Project Open Data](#faq-data)** +* **[Federal Geospatial](#faq-geospatial)** +* **[Non-federal](#faq-non-federal)** + +### General + +**What is “open data” for purposes of Data.gov federation?** + +Open data for the purposes of Data.gov federation is US government data that is_ public, accessible, described, reusable, complete, timely_, and_ managed post-release_. Read more about these principles on [Project Open Data](http://project-open-data.github.io/principles/). + +**What entities are eligible for federation on Data.gov?** + +Non-federal governments (state, local, tribal) can federate to the Data.gov catalog by meeting the requirements discussed in the Non-federal format section. The Data.gov catalog does contain metadata for some datasets produced by non-government sources (some international organizations, academic institutions) under the authorities of the FGDC or when a federal agency has included the non-government data as its own dataset for Information Quality Act purposes. + +**What steps are required to have my government entity federated onto Data.gov?** + +To have your government’s open data federated onto Data.gov, you must: 1. Organize your open data for the Data.gov Pipeline by complying with format and standardized metadata requirements, and 2. Coordinate with Data.gov to establish a harvest point for automatic daily updates. + +**Does my government entity need to have an existing open data catalog to be added to Data.gov?** + +Data.gov is agnostic about whether you have to have an open data catalog of any type to be included on to Data.gov. You simply need to have the metadata of your open data sets organized in a centralized /data.json file in accordance with the required specifications. + +**If our site is a CKAN instance, do we still need data.json and/or data-nonspatial-harvest.json?** + +Yes — you need both. + +**What if you have a .Net Web service already? Does that work?** + +For the open data policy and harvesting on to Data.gov, it needs to be a JSON formatted file. + +### Federal Data with Project Open Data + +**Are there checks in place/planned to verify that the referenced data sources/services are indeed available over time?** + +The Project Open Data dashboard at labs.data.gov/dashboard provides more detailed information on the agency datasets on Data.gov. For instance, if a dataset is no longer maintained and becomes a broken link, that will be noted on the dashboard. As a result, required updates will be more easily identified and addressed. + +**I saw two admirable example data.json files: http://www.gsa.gov/data.json and http://www.nist.gov/data.json. Are there more to survey?** + +Check out the [Project Open Data Dashboard](http://labs.data.gov/dashboard/offices) or your favorite agency’s website /data.json + +**What is Open Data listserv?** + +The Open Data listserv is a listserv with over 800 .gov email addresses of people working on open data issues in the federal government. It is a simple way to reach the broad audience of people in federal government working on open data. Anyone with a .gov email can [email the listserv](mailto:listserv@listserv.gsa.gov) with “subscribe open-data” in the body of the message. + +**How do you find your Data.gov federal agency point of contact?** + +Find your agency on the [Open Data Points of Contact list](https://project-open-data.cio.gov/points-of-contact/). If your agency isn’t listed, [contact Data.gov](http://data.gov/contact) and ask. + +### Federal Geospatial Data + +**What does CSW stand for? (re: CSW endpoint)** + +CSW stands for Catalog Service for the Web — You can read more at [http://geoplatform.gov/csw-resources](http://geoplatform.gov/csw-resources) + +**Is a CSW endpoint required if you have geospatial data, or is the data.json sufficient? And should data-nonspatial-harvest.json just contain a subset of data.json?** + +Good question! 1. You need both. 2. Yes, correct. + +**So geospatial data developers will have to produce two separate metadata records for the life of the program?** + +Geospatial data developers should manage their metadata as ISO. Using an XSLT and other tools the ISO metadata can be served in multiple formats including HTML and the Project Open Data JSON. There is no need to manage multiple copies of the metadata simply to publish in multiple formats. + +**Can geospatial data also be harvested through a WAF?** + +Yes, but using the new harvesting model, the preferred harvest source is a consolidated agency-wide CSW. + +**For geospatial data, can you delete a harvest source?** + +Yes, but agencies are expected to provide a consolidated harvest source for their entire agency. Individual upstream harvest sources should be managed within each agency. + +**What is the native metadata format in data.gov? Is it ISO? for Geo and non-geo?** + +Geospatial metadata on Data.gov is converted to ISO if it’s not already provided as ISO. If the metadata is provided to Data.gov in CSDGM format, the source CSDGM metadata record will be available to users, along with the converted ISO metadata record. The common metadata format across Data.gov is a data model specific to CKAN, but all metadata is also available as DCAT RDF and Schema.org microdata. Data.gov is looking into providing other common metadata schemas across the site including the current Project Open Data schema which is a profile of DCAT using JSON-LD. + +**Seems an easy approach for the system to check for the geographic extent metadata element to identify geospatial data. Can this be expedited? Better for one solution that works for all then the temp theme keyword solution that requires work from all.** + +This has been suggested in the past, but it was decided that simply providing geospatial extent would not serve as an accurate indicator for whether something should be included as part of geoplatform.gov. You can see the discussion at [https://github.com/project-open-data/project-open-data.github.io/issues/303](https://github.com/project-open-data/project-open-data.github.io/issues/303) + +**Do I understand correctly that you are going to automate CSDGM/ISO to JSON transform and add the auto-detect for geographic information? If so, why not wait for these solutions versus having all metadata producers to dedicate such effort to a temporary fixes (create two records and add ‘geospatial’ theme keyword)?** + +Data.gov does not and will not be providing any transformations from CSDGM/ISO to Project Open Data JSON nor will it be relying on those transformations for data.gov’s purposes. However, as part of the OMB policy, all metadata will need to be provided as JSON including metadata that is currently provided as CSDGM/ISO. Data.gov does help maintain the CKAN extension for generating Project Open Data JSON (ckanext-datajson) and this can be used to provide automated conversion from CSDGM/ISO metadata. We are working to coordinate with geospatial agencies interested in using this extension. + +**Can you tell us the source of the CSDGM validation schema that is currently being used to validate records being harvested by data.gov? We are having some problems understanding why certain records are ‘failing’ validation, and the data.gov harvest report error messages are very difficult to decipher in many cases, even by experienced metadata creators.** + +The metadata is not validated as CSDGM. It is converted to ISO 19115 then validated against the ISO 19115 schema. CSDGM metadata is transformed to ISO using the following XSLT: [https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt](https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt) + +**What if I need help getting my geospatial metadata into Data.gov and GeoPlatform.gov?** + +The FGDC has assistance available if you [email USGS.gov](mailto:geodata@usgs.gov). They will also coordinate with the Data.gov team to help address your issue if needed. If you have specific metadata questions, [contact FGDC via email](mailto:fgdc@fgdc.gov). + +### Non-federal Data + +**What are the advantages of federating my local government’s open data to Data.gov?** + +Federating your local open government data on Data.gov makes it more discoverable by Data.gov visitors as well as folks using search engines powered by schema.org. It also makes it easier for people to discover similar datasets across different levels of government, e.g., a citizen can find data about her location from city, state, and federal sources in one place. + +**If non-federal data is being shared through Data.gov, is there someone that can help validate that the posted data policy/terms of use are in compliance?** + +We make sure that non-federal sources have terms of use/data policies so that we can make it clear to end users that those datasets are governed by the terms of use/data policies of the non-federal entity (state, city, county, etc.) and not the federal policies. + +**Can you address data repositories that are federally funded but not hosted at the agency itself? Can they be harvested directly by data.gov?** + +Yes. + +**If non-federal metadata is shared with the NSGIC GIS Inventory, is it then harvested by data.gov?** + +Yes, the NSGIC GIS Inventory (GISI)is a Data.gov registered harvest sources. From 911724cfe59c3e87b4cd370e0fe5afed3884c8a9 Mon Sep 17 00:00:00 2001 From: Jeremy Zilar- The platform is primarily used to generate two different data.json files for each agency, one representing the Unredacted Inventory / Enterprise Data Inventory (EDI) that contains unredacted datasets and another representing the Redacted Inventory / Public Data Listing (PDL) that contains datasets with redactions as agency indicated. The Unredacted Inventory/EDI file is a comprehensive inventory for an agency. This file is sent directly to omb.max.gov and not publicly accessed. The Redacted Inventory/PDL file (often referred to as data.json) is made public and is posted on agency.gov/data.json and harvested by the Data.gov catalog. -
- - is based on CKAN (Comprehensive Knowledge Archive Network), an open source technology that powers many government open data sites. Inventory.data.gov is a separate instance of CKAN hosted at GSA on the same infrastructure as the Data.gov catalog. To avoid confusion, neither site should simply be referred to as the “CKAN site”, but instead by the full URL. - -- This guide describes the features available through inventory.data.gov and provides instructions for agencies interested in using inventory.data.gov for dataset management. At the current level of usage, Data.gov/GSA has been able to provide inventory.data.gov to interested agencies at no cost. This could change in the future
with expanded usage and services. - -- Inventory.data.gov as a dataset management system requires a user account and is not publicly accessible (with the exception of a few datasets that are hosted on inventory.data.gov, discussed in more detail below). Inventory.data.gov allows the registered users to manage the datasets for their agency only. -
- -- Inventory.data.gov provides the following features for dataset management/Open Data Policy compliance purposes: -
- -- Once an agency has consulted with the Data.gov PMO and has decided to use inventory.data.gov for dataset management, the next steps are: -
- -- Logins to use inventory.data.gov are integrated with
OMB MAX login. You should log in to OMB MAX using your OMB MAX credentials on your first use of inventory.data.gov. - -- Two-factor authentication (2FA) is now required, so enable 2FA through the instructions on the
OMB MAX login page. You can add a device by clicking on “Manage SMS 2-Factor Devices” under your profile settings. This is not necessary if you use your PIV/CAC card to log in to MAX; it is already considered 2FA. - -- {{< legacy-img src="/2015/01/600-Data.gov-page-MAX-Login.png" alt="Screen shot of MAX login page" >}} -
- -- After you have logged in to OMB MAX, email us
to let us know you have completed the initial login, in the email provide the organization name and access permission (Admin or Editor) required. We will then associate requested permissions to your account. On subsequent logins to inventory.data.gov (through OMB MAX), you should be able to use inventory.data.gov to manage metadata without further assistance. - -- Once you have an account as an Editor or Administrator for your agency (Organization), you can use inventory.data.gov to add or edit datasets for your Organization. -
- -- Once you log in, you’ll see an activity page showing your recent activity on inventory.data.gov. -
- -- {{< legacy-img src="/2015/01/600-x-376-recent-activity.jpg" alt="Screen capture of recent activity list" >}} -
- -- The main tabs you will be using are “Datasets” and “Organizations.” -
- -- Datasets: -
- -- {{< legacy-img src="/2015/01/600-x-336-datasets.jpg" alt="Screen capture of Datasets tab" >}} -
- -- Organizations: -
- -- {{< legacy-img src="/2015/01/600-x-347-organizations.jpg" alt="Screen capture of Organizations tab" >}} -
- -- When you click on the “Organizations” tab, you’ll go to the main Organizations page that lists all the Organizations in inventory.data.gov. To find your Organization, you can search for it in the search bar on the page. -
- -- When you click on your Organization, all the datasets for the Organization will be displayed. -
- -- There’s an Admin button on the upper right for Administrators. By clicking on it, you get the main information about the Organization, including a tab for “Members.” -
- -- Clicking on “Members” brings up all the Members of the Organization who can add and edit datasets. Clicking on the Add Members button gives you the option of adding an existing user of inventory.data.gov to your Organization.
Before you can add Members to your organization, the new Member has to complete initial log in on inventory.data.gov using the OMB MAX, 2FA or PIV authentication. - -- {{< legacy-img src="/2015/01/600-x-376-three-levels-of-access.jpg" alt="Screen capture showing access levels" >}} -
- -- There are three levels of access. Member is read only access. Editor allows you to add and edit datasets. Admin can perform the same actions as an Editor and also add members to the Organization. -
- -- Click on the Dataset tab. -
- -- Then click on Add a Dataset. That takes you to the Create Dataset page. -
- -- The Create Dataset page features the metadata following the Project Open Data metadata schema version 1.1. -
- -- {{< legacy-img src="/2015/01/600-x-376-Adding-a-Dataset.jpg" alt="Screen capture for Adding a Dataset" >}} -
- -- There are examples displayed in each field, and there is also pop-up help information for each field. Full details are on the
Project Open Data page. Fields marked by the red asterisk are required. - -- There is validation for each field, so if you try to skip a required field or enter improper formats for a particular field, you’ll get an error message when you try to add the dataset. -
- -- Entering Metadata -
- -- Enter the
Title for the dataset, as you would like the Title to be displayed. - -- The URL for the dataset will be generated automatically based on the Title. -
- -- Enter the
Description for the dataset as a summary about the dataset. - -- Enter the
Keywords for the dataset. Your dataset will be included when a Data.gov user searches for those keywords. - -- Modified/Last Update – most recent date on which the dataset was changed, updated or modified -
- -- For
Publisher, enter the name of your agency. Use the Add sub-agency button to enter a sub-agency to more specifically identify the source of the dataset. These additional sub-agency fields are optional. - -- Access Level: Choose public, restricted public, or non-public. All type of “Access Level” datasets are now included in both Redacted Inventory (PDL) and Unredacted Inventory (EDI),
The redaction feature is not available for public datasets, but is available for restricted public and non-public datasets. In editing metadata for restricted public and non-public datasets, the applicable individual fields can be fully or partially redacted. - -- See the
Redaction and Partial Redaction feature for more information. - -- Bureau Code/Program Code. These are required fields following specific formats. Your agency POC should have this information. See the
Project Open Data page for more information about these fields. - -- Contact Name/Email. Enter the name and email information for the contact for this dataset. Questions about this dataset will be referred to the contact listed. -
- -- There are several Required If Applicable fields that are optional. Example text is provided in each field. -
- -- {{< legacy-img src="/2015/10/579-x-276-common-core-metadata.jpg" alt="Screen capture of Required-if-applicable common core metadata" >}} -
- -- Rights. You have the option to enter a short explanation for why the value in the
Access Level field was selected here. - -- Meets Agency Data Quality. This field shows if the dataset meets your agency’s data quality standards if any apply. -
- -- Please note that in the
License field, if applicable, the URL for the license should be provided. - -- Spatial. If the dataset is about a specific location, enter that information in this field. -
- -- Temporal. If the dataset covers a specific date range, enter that information following the format provided. -
- -- Theme. Enter words for themes or categories. This supports search and faceted viewing of datasets. -
- -- Data Dictionary. If there is a data dictionary or schema for the dataset, provide the URL here. -
- -- Described by Type. Choose the data dictionary type from the formats listed. -
- -- Accrual Periodicity (Frequency). Choose the frequency from the options listed. -
- -- Conforms to (Data Standard). If there is an applicable data standard, enter the URL here. -
- -- Landing Page (Homepage URL). If there is a landing page/home page associated with the dataset, enter the URL here. -
- -- Language. The default language for datasets is English. If the dataset is presented in another language, select it here. -
- -- Primary IT Investment UII. If there is IT Unique Investment Identifier associated with the dataset, enter it here. -
- -- References (Related Documents). URLs for additional related resources can be entered here. Multiple URLs can be entered separated by commas. -
- -- Issued (Release Date). Enter the release date in the date format provided. -
- -- System of Records. If there is a Privacy Act System of Records Notice (SORN) associated with the dataset, enter the URL for the published SORN here. -
- -- The last two fields are to indicate if this dataset is part of a collection or if you want to designate this dataset as the “parent” of a collection. -
- -- Is parent. Yes or No. If you want to make this dataset the “parent” of a collection, choose yes. Otherwise choose no. -
- -- {{< legacy-img src="/2015/01/600-x-130-parent-options.jpg" alt="Screen capture of parent fields" >}} -
- -- Note that once you make a dataset a “parent” dataset, it cannot be a “child” dataset, or part of another collection of datasets. After a dataset is selected as parent, this dataset will be available for the subsequent datasets in this organization to choose a parent dataset (for isPartOf) -
- -- If you chose No and made the dataset a non-parent dataset, you would then be able to choose in the next field (
Parent dataset) – the datasets that you want to make this dataset as part of a collection, note that this field is optional and can be left empty if you would like the dataset to be an independent dataset. - -- Once you are finished entering the metadata, hit
Next: Add Data on the bottom right. - -- {{< legacy-img src="/2015/01/251-x-83-Next-Add-Data-button.jpg" alt="Screen capture of Next: Add Data button" >}} -
- -- You are then taken to another screen with more fields about accessing the dataset. -
- -- For most inventory.data.gov users, the dataset is actually hosted on the agency.gov site, so you will hit the “Link to a file” radio button and provide the URL. If you would like to provide a link to API, choose the “Link to an API” radio button, provide the URL, and enter the value “API” in the format field, to provide a web page URL where dataset information is available user the radio button option “Access URL.” -
- -- {{< legacy-img src="/2015/10/600-x-512-add-a-new-resource-dataset.jpg" alt="Screen cap of Add a New Resource screen." >}} -
- -- Provide a
description of the dataset. - -- Media type. Choose the relevant format. -
- -- Format. Type in the format in this field. (API value for “Link to API” option) -
- -- While creating datasets in inventory.data.gov, users can set the “Publishing Status” value as “Draft,” which indicates that the dataset is still being worked on or reviewed. Any dataset saved with “Draft” status will not be included in the generated Redacted/Unredacted data.json files. The draft feature allows users to enter preliminary data and finalize the dataset at a later point by changing the “Publishing Status” value to “Published.” -
- -- {{< legacy-img src="/2015/01/unnamed-1.jpg" alt="unnamed-1" >}} -
- -- Users can create a clone of a particular dataset by clicking on the “Clone” button displayed on the top right of the dataset page. The Clone feature is useful if you are entering lots of datasets manually, and most of the metadata like Bureau Code, Program Code, Publisher Name, Contact Name etc. is the same for all of them. -
- -- After you click on the Clone button, make sure to correct the Dataset title and Unique Identifier at a minimum on the cloned dataset, and review all the other field values to make sure they are correct before updating the dataset. -
- -- Please note that resources like the CSV file uploaded or linked to the original dataset will not be cloned as it is expected that resources will be different for the cloned datasets. Resources need to be entered manually for the cloned datasets. -
- -- {{< legacy-img src="/2015/01/unnamed-2.jpg" alt="unnamed-2" >}} -
- -- Once you have finished entering and editing the metadata for your organization, you are ready to export the datasets to generate the data.json files which comply with Project Open Data Schema 1.1 version. This is done from the main Organization page, by clicking on your agency Organization icon on the left or by clicking on your agency Organization name on the top of the page. -
- -- There are three choices at the top. -
- -- {{< legacy-img src="/2015/01/600-x-162-exporting-data-json.jpg" alt="Screen capture of button choices" >}} -
- -- The Redacted Inventory button (formerly labeled Public Data Listing) will give you the redacted inventory of all of the datasets from your organization. This is the listing that you will post on your agency.gov/data.json and will be harvested by the Data.gov catalog to populate your organization’s listing in Data.gov catalog. -
- -- The Unredacted Inventory (formerly labeled Enterprise Data Inventory) button will give you a file of all of your organization’s datasets without any redactions. This is the file that is submitted to OMB and is not publicly shared unless an agency has made the decision to share it. -
- -- The Export Drafts button will export all datasets that are currently in “draft” status to a file. -
- -- The process for each is the same. Depending on your browser, clicking the button will either give you a prompt to open or save the file or it will be downloaded automatically. -
- -- {{< legacy-img src="/2015/10/600-x-453-GSA-153-datasets-found.jpg" alt="Screen capture of 153 datasets found for GSA" >}} -
- -- The file that is delivered is a zip file. It contains both the data.json and (if errors were found) an errorlog.txt and errors.json file. -
- -
- It is very important to make sure that all datasets were successfully exported when downloading the Unredacted (EDI) or Redacted (PDL) data.json file. This can be done by checking to make sure that an errorlog.txt file was not included in the zip file. If there are errors in the errorlog.txt file, those need to be reviewed and resolved before posting the Redacted (PDL) data.json file in the agency.gov/data.json location. This is critical. Ignoring this step can potentially remove datasets from data.gov since they will no longer be present in the data.json file. Metadata that is directly entered into inventory.data.gov will be validated when the information is first entered, however some invalid metadata may have been imported directly into inventory.data.gov and will not be checked until this final export. The data.gov team is aware that this is not very user friendly or streamlined way of indicating which datasets need to be fixed and is in the process of improving inventory.data.gov so that these errors can be viewed directly on the inventory.data.gov website rather than as a separate log file. However, for now it is very important to review this file as explained earlier. - |
-
- The errorlog.txt will describe the datasets that did not validate against the Project Open Data Schema as needed to be included in the data.json file. If there were no errors encountered in creating the data.json, then errorlog.txt file will not appear in the zip file. -
- -- {{< legacy-img src="/2015/01/600-x-326-error-log-location.jpg" alt="Screen capture of error log location" >}} -
- -- {{< legacy-img src="/2015/01/600-x-391-error-log.jpg" alt="Screen capture of error log" >}} -
- -- The error log indicates the type of problem, the dataset ID, and the dataset title, so that you can find the problematic dataset in inventory.data.gov and fix the problem. Search for the dataset, choose Edit, and make the required changes. -
- -- Once you fix the errors, go back to the Organization page and re-download the Redacted Inventory (PDL) or Unredacted Inventory (EDI) datasets, and you should get a zip file with just the data.json and no error log. -
- -- For the Unredacted Inventory (EDI), the data.json file can be submitted to OMB Max. For the Redacted Inventory (PDL), the data.json will need to be posted on the agency’s website at agency.gov/data.json in order to be harvested by the Data.gov catalog. If the data.json is being posted on the agency.gov website for the first time, please contact the Data.gov team via email
to set the harvest source on catalog.data.gov. - -- Rather than log in to inventory.data.gov and manually click the buttons to download the Redacted Inventory (PDL) and Unredacted Inventory (EDI) files, it is possible to use the API and your API key in order to automate the process of adding the data.json file to the agency website. -
- -- Here’s what the request would look like using the curl-like tool
httpie - -- Redacted Inventory (PDL) -
- -- http -v POST https://inventory.data.gov/organization/[organization-id]/redacted.json Authorization:xxx-your-api-key-xxx Content-Type:application/x-www-form-urlencoded Cookie:auth_tkt=foo -
- -- Unredacted Inventory (EDI) -
- -- http -v POST https://inventory.data.gov/organization/[organization-id]/unredacted.json Authorization:xxx-your-api-key-xxx Content-Type:application/x-www-form-urlencoded Cookie:auth_tkt=foo -
- -- You’d need to replace xxx-your-api-key-xxx with your actual API key which you can get in the bottom of the left sidebar of your user account, e.g. https://inventory.data.gov/user/[username] -
- -- The above process downloads the Redacted.zip file containing the data.json and errorlog.txt. If there are any errors, you should review and correct them. When you reach a point where there is no errorlog.txt file, you can upload the data.json file to the agency.gov/data.json public data listing for data.gov catalog harvesting purposes. It is recommended that you manually review and upload the data.json file a few times, before completely automating the entire process to ensure there are no errors that might be excluding datasets from the data.json file (potentially removing existing datasets from data.gov). -
- -- {{< legacy-img src="/2015/01/600-x-376-Dataset-Hosting.jpg" alt="Screen capture of dataset hosting" >}} -
- -- The primary use of inventory.data.gov is to provide a dataset management system for agencies that need a tool to meet the requirements of the Open Data Policy. It is a separate instance of CKAN and requires a user account and password.
- -- Data.gov uses inventory.data.gov for a second purpose. From 2011-2014, Data.gov featured data hosting using the Socrata platform at explore.data.gov. About a dozen agencies used explore.data.gov to host their datasets directly on Data.gov. At the conclusion of the Socrata task order in 2014, several agencies indicated a continuing need for hosting capability for these datasets (approximately 100). To accommodate these agencies, the Data.gov PMO migrated the datasets previously hosted at explore.data.gov to a hosting capability using the CKAN DataStore extension, on the CKAN instance at inventory.data.gov. As a result, the access URLs for this small number of datasets begin with inventory.data.gov and are publicly accessible without needing a user account.
- -- For the time being, the Data.gov PMO will continue to provide this hosting capability on inventory.data.gov while it remains small scale and does not require significant additional Data.gov resources.
- -- Data.gov, working with DigitalGov University, presented a webinar for agencies interested in using inventory on December 16, 2014. For more information, consult the recording of the webinar.
- -- [youtube=http://www.youtube.com/watch?v=ciLYjDsF4lo&w=600] -
- -- Here are some questions and answers raised during the webinar: -
- -- If someone at your agency already has an admin account on inventory.data.gov, that person can grant you an admin account. There are no limits on how many admin accounts an agency can have. -
- -- Contact the DataGov team via email.
- -- When you contact the Data.gov team (see contact info above) about using inventory.data.gov, we can create the organization for you. -
- -- Yes. That designation also ensures that record is included in geoplatform.gov. -
- -- For importing existing data.json, please contact the Data.gov team and we will work with you to migrate into inventory.data.gov. -
- -- For Data.gov/Open Data Policy purposes, if you don’t know your agency’s POC, contact the Data.gov team and we will connect you. You can also find your agency POC at https://project-open-data.cio.gov/points-of-contact/ -
- -- Inventory.data.gov is a separate system from the Data.gov catalog, using the same technology as the catalog (CKAN) to provide a service that agencies can use to create and maintain data inventories that are ultimately harvested by the Data.gov catalog from agency.gov/data.json -
- -- Yes, but not through the user interface on inventory.data.gov currently. Please contact the Data.gov team if you have this situation. -
- -- Search does not cover the actual content. -
- -- See the “access level” field in Project Open Data: The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public).
- -- “Central CKAN” was a term used in 2014 to refer to what we call inventory.data.gov – it is one and the same. This webinar focused on the primary purpose of inventory.data.gov – to provide a dataset management system to help agencies create and maintain their data inventories and arrive at a data.json. We do use inventory.data.gov for a second purpose – to host a small number of datasets for agencies that needed a hosting capability at the conclusion of the Socrata platform contract that Data.gov had until July 2014. -
- -- Inventory.data.gov in its current form is not designed to handle geospatial metadata. As discussed in our harvesting }}">documentation, Data.gov continues to harvest geospatial datasets directly from geospatial harvest sources. Agencies should have the remainder of their agencies in a “non-geospatial” data.json that the Data.gov catalog can harvest.
- -- When you are ready to export your agency’s datasets to data.json and hit the button on inventory.data.gov, you will be prompted to download a zip file containing the data.json. If there are errors, the zip file will also contain an error log containing information about the datasets with errors. We are working on improving validation within inventory.data.gov so that more errors are identified as you are adding and editing datasets, before you reach the export stage. -
\ No newline at end of file +## Introduction + +[Inventory.data.gov](http://inventory.data.gov/) is a data management tool established in 2013 by the Data.gov Program Management Office (PMO) in what is now the Technology Transformation Service of the U.S. General Services Administration (GSA). Inventory.data.gov supports the implementation of the 2013 Federal [Open Data Policy](http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf) (M-13-13) by providing all of the relevant metadata fields and export formats required for agencies to have their data harvested by Data.gov and to comply with the Federal Open Data Policy. Inventory.data.gov is not required to be used by all agencies, but rather offered in order to assist agencies that do not have another solution for creating and maintaining data inventories. + +The platform is primarily used to generate two different data.json files for each agency, one representing the Unredacted Inventory / Enterprise Data Inventory (EDI) that contains unredacted datasets and another representing the Redacted Inventory / Public Data Listing (PDL) that contains datasets with redactions as agency indicated. The Unredacted Inventory/EDI file is a comprehensive inventory for an agency. This file is sent directly to omb.max.gov and not publicly accessed. The Redacted Inventory/PDL file (often referred to as data.json) is made public and is posted on agency.gov/data.json and harvested by the Data.gov catalog. + +[The Data.gov catalog](http://catalog.data.gov/) is based on [CKAN](http://ckan.org/) (Comprehensive Knowledge Archive Network), an open source technology that powers many government open data sites. [Inventory.data.gov](http://inventory.data.gov/) is a separate instance of CKAN hosted at GSA on the same infrastructure as the Data.gov catalog. To avoid confusion, neither site should simply be referred to as the “CKAN site”, but instead by the full URL. + +This guide describes the features available through inventory.data.gov and provides instructions for agencies interested in using inventory.data.gov for dataset management. At the current level of usage, Data.gov/GSA has been able to provide inventory.data.gov to interested agencies at no cost. This could change in the future with expanded usage and services. + +Inventory.data.gov as a dataset management system requires a user account and is not publicly accessible (with the exception of a few datasets that are hosted on inventory.data.gov, discussed in more detail below). Inventory.data.gov allows the registered users to manage the datasets for their agency only. + +## Features + +Inventory.data.gov provides the following features for dataset management/Open Data Policy compliance purposes: + +* Existing data.json and enterprise data inventory metadata can be imported directly into inventory.data.gov. +* User group organizations can be created either for an entire agency or for individual bureaus so that users and permissions can be managed and delegated as needed by the agency +* A platform to manage public/non-public datasets: create new entries, modify existing ones, and delete any datasets as needed +* Unredacted Inventory / Enterprise Data Inventory (EDI) and Redacted Inventory / Public Data Listing (PDL) functionality exports the successfully validated datasets in the form of a data.json file complying to the POD Schema 1.1 version. Error records are provided in a log file for review and resolution by agency users. + +## Using inventory.data.gov + +Once an agency has consulted with the Data.gov PMO and has decided to use inventory.data.gov for dataset management, the next steps are: + +1. Setting up an “organization” on inventory.data.govFirst, the agency should decide how they want to present their structure within inventory.data.gov. An agency could make the “organization” the entire agency (for example, make the entire Department of Veterans Affairs one “organization). In that case, everyone with a user account associated to VA on inventory.data.gov would be able to view, add, edit or delete datasets for all of VA within inventory.data.gov. An agency could create an “organization” for sub-agencies, for instance creating an organization for the Farm Service Agency and all the other units of the Department of Agriculture. In that scenario, a user account for Farm Service Agency could view, add, edit or delete datasets for the Farm Service Agency only.The Data.gov PMO will set up organizations in the manner requested by the agency. + +1. Migrating existing datasets to inventory.data.govMost agencies have already created data inventories pursuant to the Open Data Policy. To begin using inventory.data.gov to manage the datasets, the Data.gov team will migrate the existing data listings to inventory.data.gov so that agencies can start editing from the current versions. For agencies that do not have data inventories under the Open Data Policy, but do have datasets listed in the current Data.gov catalog from the old Dataset Management System, the Data.gov PMO can migrate these old listings into inventory.data.gov for the agencies to update. + +1. Creating user accounts for the agency representatives who will be using inventory.data.gov to manage the agency datasets.Inventory.data.gov, for dataset management purposes, is not publicly accessible. Users must have accounts on the system. There are three levels of accounts for any “organization” on inventory.data.gov: Administrator, Editor, and Member. For dataset management, agency users will either have an Administrator or Editor account. Agencies should appoint at least one Administrator. Administrators can add additional members from that agency (organization), including additional Administrators.Agencies should nominate the Administrator(s). The Data.gov PMO will create these accounts and ensure that the Administrators are familiar with the process of adding additional Administrators and Members for their agency (organization). + +### Creating a User Account + +Logins to use inventory.data.gov are integrated with [OMB MAX login](https://login.max.gov/cas/login?service=https%3A%2F%2Fmax.gov%2Fmaxportal%2Fsa%2Flogin.action%3Bjsessionid%3D994E6FE4CC968B89D599672C9F9DDAAE). You should log in to OMB MAX using your OMB MAX credentials on your first use of inventory.data.gov. + +Two-factor authentication (2FA) is now required, so enable 2FA through the instructions on the [OMB MAX login](https://login.max.gov/cas/login?service=https%3A%2F%2Fmax.gov%2Fmaxportal%2Fsa%2Flogin.action%3Bjsessionid%3D994E6FE4CC968B89D599672C9F9DDAAE) page. You can add a device by clicking on “Manage SMS 2-Factor Devices” under your profile settings. This is not necessary if you use your PIV/CAC card to log in to MAX; it is already considered 2FA. + +{{< legacy-img src="/2015/01/600-Data.gov-page-MAX-Login.png" alt="Screen shot of MAX login page" >}} + +After you have logged in to OMB MAX, [email us](mailto:datagov@gsa.gov) to let us know you have completed the initial login, in the email provide the organization name and access permission (Admin or Editor) required. We will then associate requested permissions to your account. On subsequent logins to inventory.data.gov (through OMB MAX), you should be able to use inventory.data.gov to manage metadata without further assistance. + +### Using inventory.data.gov to manage datasets + +Once you have an account as an Editor or Administrator for your agency (Organization), you can use inventory.data.gov to add or edit datasets for your Organization. + +Once you log in, you’ll see an activity page showing your recent activity on inventory.data.gov. + +{{< legacy-img src="/2015/01/600-x-376-recent-activity.jpg" alt="Screen capture of recent activity list" >}} + +The main tabs you will be using are “Datasets” and “Organizations.” + +Datasets: + +{{< legacy-img src="/2015/01/600-x-336-datasets.jpg" alt="Screen capture of Datasets tab" >}} + +Organizations: + +{{< legacy-img src="/2015/01/600-x-347-organizations.jpg" alt="Screen capture of Organizations tab" >}} + +When you click on the “Organizations” tab, you’ll go to the main Organizations page that lists all the Organizations in inventory.data.gov. To find your Organization, you can search for it in the search bar on the page. + +When you click on your Organization, all the datasets for the Organization will be displayed. + +There’s an Admin button on the upper right for Administrators. By clicking on it, you get the main information about the Organization, including a tab for “Members.” + +Clicking on “Members” brings up all the Members of the Organization who can add and edit datasets. Clicking on the Add Members button gives you the option of adding an existing user of inventory.data.gov to your Organization. Before you can add Members to your organization, the new Member has to complete initial log in on inventory.data.gov using the OMB MAX, 2FA or PIV authentication. + +{{< legacy-img src="/2015/01/600-x-376-three-levels-of-access.jpg" alt="Screen capture showing access levels" >}} + +There are three levels of access. Member is read only access. Editor allows you to add and edit datasets. Admin can perform the same actions as an Editor and also add members to the Organization. + +### Adding a Dataset + +Click on the Dataset tab. + +Then click on Add a Dataset. That takes you to the Create Dataset page. + +The Create Dataset page features the metadata following the [Project Open Data](https://project-open-data.cio.gov/v1.1/schema/) metadata schema version 1.1. + +{{< legacy-img src="/2015/01/600-x-376-Adding-a-Dataset.jpg" alt="Screen capture for Adding a Dataset" >}} + +There are examples displayed in each field, and there is also pop-up help information for each field. Full details are on the [Project Open Data](https://project-open-data.cio.gov/v1.1/schema/) page. Fields marked by the red asterisk are required. + +There is validation for each field, so if you try to skip a required field or enter improper formats for a particular field, you’ll get an error message when you try to add the dataset. + +**Entering Metadata** + +Enter the **Title** for the dataset, as you would like the Title to be displayed. + +The URL for the dataset will be generated automatically based on the Title. + +Enter the **Description** for the dataset as a summary about the dataset. + +Enter the **Keywords** for the dataset. Your dataset will be included when a Data.gov user searches for those keywords. + +**Modified/Last Update** – most recent date on which the dataset was changed, updated or modified + +For **Publisher**, enter the name of your agency. Use the Add sub-agency button to enter a sub-agency to more specifically identify the source of the dataset. These additional sub-agency fields are optional. + +**Access Level**: Choose public, restricted public, or non-public. All type of “Access Level” datasets are now included in both Redacted Inventory (PDL) and Unredacted Inventory (EDI), The redaction feature is not available for public datasets, but is available for restricted public and non-public datasets. In editing metadata for restricted public and non-public datasets, the applicable individual fields can be fully or partially redacted. + +See the [Redaction and Partial Redaction feature](https://github.com/GSA/data.gov/wiki/Intentory.data.gov-User-Guide-for-Redaction-and-Partial-Redaction-Feature) for more information. + +**Bureau Code/Program Code**. These are required fields following specific formats. Your agency POC should have this information. See the [Project Open Data](https://project-open-data.cio.gov/v1.1/schema/) page for more information about these fields. + +**Contact Name/Email**. Enter the name and email information for the contact for this dataset. Questions about this dataset will be referred to the contact listed. + +There are several Required If Applicable fields that are optional. Example text is provided in each field. + +{{< legacy-img src="/2015/10/579-x-276-common-core-metadata.jpg" alt="Screen capture of Required-if-applicable common core metadata" >}} + +**Rights**. You have the option to enter a short explanation for why the value in the **Access Level** field was selected here. + +**Meets Agency Data Quality**. This field shows if the dataset meets your agency’s data quality standards if any apply. + +Please note that in the **License** field, if applicable, the URL for the license should be provided. + +**Spatial**. If the dataset is about a specific location, enter that information in this field. + +**Temporal**. If the dataset covers a specific date range, enter that information following the format provided. + +**Theme**. Enter words for themes or categories. This supports search and faceted viewing of datasets. + +**Data Dictionary**. If there is a data dictionary or schema for the dataset, provide the URL here. + +**Described by Type**. Choose the data dictionary type from the formats listed. + +**Accrual Periodicity (Frequency)**. Choose the frequency from the options listed. + +**Conforms to (Data Standard)**. If there is an applicable data standard, enter the URL here. + +**Landing Page (Homepage URL)**. If there is a landing page/home page associated with the dataset, enter the URL here. + +**Language**. The default language for datasets is English. If the dataset is presented in another language, select it here. + +**Primary IT Investment UII.** If there is IT Unique Investment Identifier associated with the dataset, enter it here. + +**References (Related Documents)**. URLs for additional related resources can be entered here. Multiple URLs can be entered separated by commas. + +**Issued (Release Date)**. Enter the release date in the date format provided. + +**System of Records.** If there is a Privacy Act System of Records Notice (SORN) associated with the dataset, enter the URL for the published SORN here. + +The last two fields are to indicate if this dataset is part of a collection or if you want to designate this dataset as the “parent” of a collection. + +**Is parent**. Yes or No. If you want to make this dataset the “parent” of a collection, choose yes. Otherwise choose no. + +{{< legacy-img src="/2015/01/600-x-130-parent-options.jpg" alt="Screen capture of parent fields" >}} + +Note that once you make a dataset a “parent” dataset, it cannot be a “child” dataset, or part of another collection of datasets. After a dataset is selected as parent, this dataset will be available for the subsequent datasets in this organization to choose a parent dataset (for isPartOf) + +If you chose No and made the dataset a non-parent dataset, you would then be able to choose in the next field (**Parent dataset**) – the datasets that you want to make this dataset as part of a collection, note that this field is optional and can be left empty if you would like the dataset to be an independent dataset. + +Once you are finished entering the metadata, hit **Next: Add Data** on the bottom right. + +{{< legacy-img src="/2015/01/251-x-83-Next-Add-Data-button.jpg" alt="Screen capture of Next: Add Data button" >}} + +You are then taken to another screen with more fields about accessing the dataset. + +For most inventory.data.gov users, the dataset is actually hosted on the agency.gov site, so you will hit the “Link to a file” radio button and provide the URL. If you would like to provide a link to API, choose the “Link to an API” radio button, provide the URL, and enter the value “API” in the format field, to provide a web page URL where dataset information is available user the radio button option “Access URL.” + +{{< legacy-img src="/2015/10/600-x-512-add-a-new-resource-dataset.jpg" alt="Screen cap of Add a New Resource screen." >}} + +Provide a **description** of the dataset. + +**Media type**. Choose the relevant format. + +**Format**. Type in the format in this field. (API value for “Link to API” option) + +### Draft Feature + +While creating datasets in [inventory.data.gov](http://inventory.data.gov/), users can set the “Publishing Status” value as “Draft,” which indicates that the dataset is still being worked on or reviewed. Any dataset saved with “Draft” status will not be included in the generated Redacted/Unredacted data.json files. The draft feature allows users to enter preliminary data and finalize the dataset at a later point by changing the “Publishing Status” value to “Published.” + +{{< legacy-img src="/2015/01/unnamed-1.jpg" alt="unnamed-1" >}} + +### Clone Feature + +Users can create a clone of a particular dataset by clicking on the “Clone” button displayed on the top right of the dataset page. The Clone feature is useful if you are entering lots of datasets manually, and most of the metadata like Bureau Code, Program Code, Publisher Name, Contact Name etc. is the same for all of them. + +After you click on the Clone button, make sure to correct the Dataset title and Unique Identifier at a minimum on the cloned dataset, and review all the other field values to make sure they are correct before updating the dataset. + +Please note that resources like the CSV file uploaded or linked to the original dataset will not be cloned as it is expected that resources will be different for the cloned datasets. Resources need to be entered manually for the cloned datasets. + +{{< legacy-img src="/2015/01/unnamed-2.jpg" alt="unnamed-2" >}} + +### Exporting Data.json + +Once you have finished entering and editing the metadata for your organization, you are ready to export the datasets to generate the data.json files which comply with Project Open Data Schema 1.1 version. This is done from the main Organization page, by clicking on your agency Organization icon on the left or by clicking on your agency Organization name on the top of the page. + +There are three choices at the top. + +{{< legacy-img src="/2015/01/600-x-162-exporting-data-json.jpg" alt="Screen capture of button choices" >}} + +The Redacted Inventory button (formerly labeled Public Data Listing) will give you the redacted inventory of all of the datasets from your organization. This is the listing that you will post on your agency.gov/data.json and will be harvested by the Data.gov catalog to populate your organization’s listing in Data.gov catalog. + +The Unredacted Inventory (formerly labeled Enterprise Data Inventory) button will give you a file of all of your organization’s datasets without any redactions. This is the file that is submitted to OMB and is not publicly shared unless an agency has made the decision to share it. + +The Export Drafts button will export all datasets that are currently in “draft” status to a file. + +The process for each is the same. Depending on your browser, clicking the button will either give you a prompt to open or save the file or it will be downloaded automatically. + +{{< legacy-img src="/2015/10/600-x-453-GSA-153-datasets-found.jpg" alt="Screen capture of 153 datasets found for GSA" >}} + +The file that is delivered is a zip file. It contains both the data.json and (if errors were found) an errorlog.txt and errors.json file. + +It is very important to make sure that all datasets were successfully exported when downloading the Unredacted (EDI) or Redacted (PDL) data.json file. This can be done by checking to make sure that an errorlog.txt file was not included in the zip file. If there are errors in the errorlog.txt file, those need to be reviewed and resolved before posting the Redacted (PDL) data.json file in the agency.gov/data.json location. This is critical. Ignoring this step can potentially remove datasets from data.gov since they will no longer be present in the data.json file. +Metadata that is directly entered into inventory.data.gov will be validated when the information is first entered, however some invalid metadata may have been imported directly into inventory.data.gov and will not be checked until this final export. The data.gov team is aware that this is not very user friendly or streamlined way of indicating which datasets need to be fixed and is in the process of improving inventory.data.gov so that these errors can be viewed directly on the inventory.data.gov website rather than as a separate log file. However, for now it is very important to review this file as explained earlier. | + +