The NASA Socioeconomic Data and Applications Center (SEDAC), managed by the Center for International Earth Science Information Network (CIESIN) of Columbia University, archives and disseminates scientific data for applied and research uses. Here we describe the data acquisition criteria, data quality standards, and data policies for the benefit of those wishing to disseminate their data via SEDAC.
There are many benefits to publishing your data in an open data repository. By publishing through a reputable domain-specific repository such as SEDAC, you will likely increase the citations of your published work and your data. SEDAC’s process of quality assurance (QA) and documentation preparation adds value and may catch errors in your data or documentation. Users can more readily discover, understand, and use your data both in the near future and in the long term. Sponsors maximize their investment in data and knowledge generation and increase scientific transparency and replicability. In addition, scientists outside your project or discipline will be able to find, understand, and use your data to address additional questions, potentially in new or interdisciplinary research areas.
As part of its commitment to support community efforts to understand human interactions in the environment, SEDAC preserves the data and related information resources that it disseminates to ensure their continuing access and use. It does so through active archiving, curation, and ongoing stewardship, so that current and future users can discover, explore, access, use, and cite the data and information resources that SEDAC manages.
Submission of candidate data sets is a two-step process. In the first step we request basic information on the data (e.g., the nature of the data set and the purpose for creating it) that will help us to evaluate suitability for SEDAC archiving and dissemination. If the data are deemed appropriate by SEDAC staff, in a second step we will request a copy of the data and may require additional information for review by the SEDAC User Working Group (UWG), a group that provides ongoing advice and guidance regarding SEDAC activities and plans. Should the data set be approved for dissemination by the UWG, and if data and documentation are provided in a timely manner, the data set should be released within 3-4 months. Authors may request expedited service and the setup of a Digital Object Identifier (DOI) if they are working with a journal and need release to coincide with publication. Note that if the data set has already been published with a DOI we are unlikely to be able to consider it for SEDAC dissemination, though we welcome requests to archive and disseminate updated versions.
SEDAC Data Acquisition Criteria
SEDAC specializes in spatial data in support of human-environment research, in the context of NASA’s Earth science mission and the overall U.S. Global Change Research Program. SEDAC prioritizes and strongly encourages data submissions that meet one or more of the following criteria:
- Data sets with global coverage
- Time series data
- Spatial (geo-referenced) data, gridded or vector with subnational resolution, including:
  - Demographic data (e.g. population counts, demographic characteristics)
  - Population dynamics (e.g. mortality, fertility, migration)
  - Economic development data (e.g. poverty, GDP grids, income, infant mortality rates)
  - Infrastructure and human settlements data
  - Administrative boundaries or other reference layers
  - Remote sensing-derived environmental indicators
  - Environmental health data
  - Land use / land cover / human impacts data
  - Sustainable development / sustainability science data
- Country-level environmental indicators
SEDAC welcomes, but will give lower priority to, submissions of data with non-global coverage (e.g. data covering major world regions or demographically significant countries) and data representing a single time slice.
Data Quality Review
As a NASA repository and a member of the International Science Council World Data System (WDS), SEDAC strives to maintain high quality standards across all data sets in its collections. Data submissions need to be accompanied by data documentation that describes in detail the methods and inputs used to develop the data set, as well as data quality information (e.g., known errors and limitations, sampling biases, validation results, statistical reliability and validity, accuracy and precision).
SEDAC prefers data that have been peer-reviewed, in the following order of preference: 1. A data set for which a journal article in a reputable journal describing the data and methods used to develop the data has been published, submitted, or is in preparation; 2. A data set produced for a report or book chapter that engaged expert external reviewers; or 3. A data set for which there is a detailed technical working paper produced by a reputable organization or team of researchers with demonstrable expertise.
SEDAC will review other data submissions on a case-by-case basis, including data that have already been published in a peer-reviewed data journal and are archived elsewhere but could benefit from dissemination and support through SEDAC. Valuable data that do not meet the above criteria may be considered after internal review of the data and associated documentation. SEDAC’s decision to archive and disseminate any data set is subject to review by SEDAC’s external User Working Group and by NASA, and is conditional on the availability of resources.
SEDAC supports open scientific data, per the WDS Data Sharing Principles. We request that data providers accept the dissemination of their data under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) unless there are extenuating circumstances such as data restrictions inherited from input data. Providers need to clearly indicate if the data are in the public domain, have a CC-BY license or equivalent fully open license, or have some other license applied to them.
SEDAC can work with data providers if, under certain circumstances, they are unable to provide the data under the CC-BY license, but under no circumstances can SEDAC publish the data in the absence of a signed statement granting SEDAC permission to disseminate the data. If your data are accepted for dissemination by SEDAC, you will need to download CIESIN’s Open Data and Information Agreement, fill out the information required, and submit a signed copy to firstname.lastname@example.org.
When you complete the agreement, SEDAC recommends composing a concise data set title (125 characters or fewer) that is memorable and includes the scientific context, geographical coverage (e.g. global), temporal extent (e.g. 1920-2020), and, if the version is greater than 1, a version number (e.g. Version 2.01). Examples of data set titles can be viewed on the SEDAC website (https://sedac.ciesin.columbia.edu/) under Data.
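The title guidance above can be checked informally before you fill out the agreement. The following Python sketch is only an illustration and is not a SEDAC tool: the function name `check_title`, its `version` parameter, and the patterns used to detect a year and a version label are all assumptions made for this example. It does not attempt to check scientific context or geographical coverage, which need human judgment.

```python
import re

MAX_TITLE_LENGTH = 125  # character limit stated in the title guidance


def check_title(title: str, version: float = 1.0) -> list[str]:
    """Return warnings for a candidate data set title (empty list if none).

    Hypothetical helper: the heuristics below are assumptions for
    illustration, not rules published by SEDAC.
    """
    warnings = []

    if len(title) > MAX_TITLE_LENGTH:
        warnings.append(
            f"title is {len(title)} characters; limit is {MAX_TITLE_LENGTH}"
        )

    # Assumption: a temporal extent contains at least one four-digit year,
    # either alone (2000) or in a range (1920-2020).
    if not re.search(r"\b\d{4}\b", title):
        warnings.append("no temporal extent (e.g. 1920-2020) found")

    # A version label is only expected when the version is greater than 1.
    # Assumption: it is written as "Version N", as in "Version 2.01".
    if version > 1 and not re.search(r"\bversion\s+\d", title, re.IGNORECASE):
        warnings.append("version is greater than 1 but title has no version label")

    return warnings


if __name__ == "__main__":
    # Invented title, used only to exercise the checks.
    example = "Hypothetical Global Population Grid, 1920-2020, Version 2.01"
    for warning in check_title(example, version=2.01):
        print(warning)
```

An empty result means only that none of these simple checks fired; it does not mean the title meets SEDAC's expectations.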
Data Authorship Policy
We consider a data set to be a first-class research product. As such, the author list should include anyone who contributed substantially to the data collection, processing, and analysis. The data author list may not necessarily be the same as that of a related journal publication. A person who made a minimal contribution to the data, or who contributed only to a paper that used or analyzed the data, should not be listed as an author. Gathering funds for the project, paying salaries, providing a conducive environment, or being the spokesperson are not activities that warrant authorship without a significant contribution to the scientific content of the data. SEDAC can provide an acknowledgement in the data set documentation in these cases.
SEDAC aims to publish each data set within four (4) months of receipt of the complete data package. This allows one month for quality review, one month for documentation, one month for internal review, and one month for provider review. If there are problems with the data set format or questions about data provenance, methodology, or other issues, we will contact you by email. A delayed response to these questions will delay the release of your data. If you intend for your data to be released in conjunction with the publication of a paper, please submit your data to SEDAC when you submit your paper and keep us informed via email as the paper passes through review at the journal. See additional details on SEDAC data curation.
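If you are coordinating a release with a journal, it may help to lay the four one-month allowances against a calendar. The Python sketch below is a rough planning aid under stated assumptions: it treats the four stages as strictly sequential in the order listed, counts each as exactly one calendar month, and starts from a receipt date you supply. The function names `add_months` and `estimate_schedule` are hypothetical, and actual review times will vary.

```python
import calendar
from datetime import date

# Stages and their one-month allowances, in the order given in the text.
# Assumption: the stages run one after another with no overlap.
STAGES = ["quality review", "documentation", "internal review", "provider review"]


def add_months(start: date, months: int) -> date:
    """Add whole calendar months, clamping the day for shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def estimate_schedule(received: date) -> list[tuple[str, date]]:
    """Return (stage, estimated end date) pairs for a complete data package."""
    return [
        (stage, add_months(received, i + 1))
        for i, stage in enumerate(STAGES)
    ]


if __name__ == "__main__":
    # Invented receipt date, used only for illustration.
    for stage, end in estimate_schedule(date(2024, 1, 31)):
        print(f"{stage}: by {end.isoformat()}")
```

The last date returned is the nominal four-month target; delays in answering questions about format, provenance, or methodology would push every later date back.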
As described above, there are two steps to the data submission process. In the first step we request basic information on the data (e.g., the nature of the data set, the purpose for creating it, spatial and temporal resolution, and file sizes) that will help us to evaluate suitability for SEDAC archiving and dissemination. If the data are deemed appropriate, in a second step we will request data files and may require additional information for review by the SEDAC User Working Group, a group that provides ongoing advice and guidance regarding SEDAC activities and plans. If you wish to submit a candidate data set, please click on the link below.
Files expected in a dataset submission:
- Data files
- Supplemental files (including metadata and reports)
- Code (if applicable)
- Published paper or manuscript draft (if applicable)