February 2023

Annual Mean PM2.5 Components Trace Elements (TEs) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., v1 (2000-2019)

PURPOSE

To provide annual PM2.5 component trace element concentration data for the contiguous U.S. at resolutions of 50m in urban areas and 1km in non-urban areas, for public health research estimating effects on human health and for other related research.

DESCRIPTION

The Annual Mean PM2.5 Components Trace Elements (TEs) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019, v1 data set contains annual predictions of trace element concentrations at a hyper resolution (50m x 50m grid cells) in urban areas and a high resolution (1km x 1km grid cells) in non-urban areas, for the years 2000 to 2019.

Particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5) is a silent killer of millions of people worldwide and contains many trace elements (TEs). Understanding of the relative toxicity of these elements is largely limited by the lack of data. In this project, ensembles of machine learning models were used to generate approximately 163 billion predictions estimating annual mean PM2.5 TEs, namely Bromine (Br), Calcium (Ca), Copper (Cu), Iron (Fe), Potassium (K), Nickel (Ni), Lead (Pb), Silicon (Si), Vanadium (V), and Zinc (Zn), across 3,535 urban areas at 50m spatial resolution, and in non-urban areas at 1km spatial resolution, for 2000 to 2019. The results highlight substantial intra-urban and inter-urban variations; the shrinkage, stagnation, or expansion of hotspots; and trends across the 20-year period.

The monitored data from approximately 600 locations were integrated with more than 160 predictors, such as time and location, satellite observations, composite predictors, meteorological covariates, and many novel land use variables, using several machine learning algorithms and ensemble methods. The monitored data were divided into training (70%) and test (30%) sets.
Multiple machine-learning models, including Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGB), Cubist, and K-Nearest Neighbors (KNN), were developed at 50m spatial resolution for 3,535 urban areas, where the majority (approximately 80%) of the U.S. population lives, and at 1km spatial resolution for non-urban areas. Their predictions were then ensembled using either a Generalized Additive Model (GAM) Ensemble Geographically-Weighted-Averaging (GAM-ENWA), or Super-Learners (SLs: RF-SL, GBM-SL, XGB-SL, Cubist-SL, KNN-SL, or Support Vector Machines [SVM-SL]). In non-urban areas, the overall best model R-squared values for the test sets ranged from 0.79 for Copper using RF-SL to 0.88 for Zinc using GAM-ENWA. In urban areas, the R-squared values ranged from 0.80 for Copper using SVM-SL to 0.88 for Zinc using GAM-ENWA. The Coordinate Reference System (CRS) used in the predictions is the World Geodetic System 1984 (WGS84), and the units for the PM2.5 Components TEs are ng/m^3.

ACCESSING THE DATA

The data may be downloaded at https://sedac.ciesin.columbia.edu/data/set/aqdh-pm2-5-component-trace-elements-50m-1km-contiguous-us-2000-2019/data-download

DATA FORMAT

This archive contains data in RDS (tabular) format, a file format native to the R programming language that can also be opened in other languages such as Python. The data files are compressed as zip files. Downloaded files need to be uncompressed into a single folder using WinZip (a Windows file compression utility) or a similar application. Users should expect the data to take up more space after decompression.

DATA UNITS

The unit for PM2.5 Components TEs is nanograms (one-billionth of a gram) per cubic meter of air (ng/m^3).

SPATIAL EXTENT

Contiguous United States, 50m in urban areas and 1km in non-urban areas.

DISCLAIMER

CIESIN follows procedures designed to ensure that data disseminated by CIESIN are of reasonable quality.
If, despite these procedures, users encounter apparent errors or misstatements in the data, they should contact SEDAC User Services at ciesin.info@ciesin.columbia.edu. Neither CIESIN nor NASA verifies or guarantees the accuracy, reliability, or completeness of any data provided. CIESIN provides this data without warranty of any kind whatsoever, either expressed or implied. CIESIN shall not be liable for incidental, consequential, or special damages arising out of the use of any data provided by CIESIN.

USE CONSTRAINTS

This work is licensed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0). Users are free to use, copy, distribute, transmit, and adapt the work for commercial and non-commercial purposes, without restriction, as long as clear attribution of the source is provided.

CITATION(S)

Data Set:

Amini, H.1, 2*, M. Danesh-Yazdi1,3, Q. Di4, W. Requia5, Y. Wei1, Y. AbuAwad6, L. Shi7, M. Franklin8, C.-M. Kang1, J. M. Wolfson1, P. James9,1, R. Habre10, Q. Zhu7, J. S. Apte11,12, Z. J. Andersen2, X. Xing13, C. Hultquist13,14, I. Kloog15, F. Dominici1,16, P. Koutrakis1, and J. Schwartz1. 2023. Annual Mean PM2.5 Components Trace Elements (TEs) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019, v1. Palisades, New York: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/1x94-mv38. Accessed DAY MONTH YEAR.

1 Harvard T.H. Chan School of Public Health, Boston, MA, United States
2 Department of Public Health, University of Copenhagen, Copenhagen, Denmark
3 Stony Brook University, Stony Brook, NY, United States
4 Vanke School of Public Health, Tsinghua University, Beijing, China
5 Fundação Getúlio Vargas, Brasilia, Brazil
6 PERFORM Centre, Concordia University, Montreal, Quebec, Canada
7 Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, United States
8 Department of Statistical Sciences, University of Toronto, Toronto, Canada
9 Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States
10 Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
11 Department of Civil and Environmental Engineering, University of California, Berkeley, CA, United States
12 School of Public Health, University of California, Berkeley, CA, United States
13 Center for International Earth Science Information Network (CIESIN), Columbia University, Palisades, NY, United States
14 School of Earth and Environment, University of Canterbury, Christchurch, New Zealand
15 Icahn School of Medicine at Mount Sinai, NY, United States
16 Harvard Data Science Initiative, Cambridge, MA, United States

Scientific Publication:

Amini, H., M. Danesh-Yazdi, Q. Di, W. Requia, Y. Wei, Y. AbuAwad, L. Shi, M. Franklin, C.-M. Kang, J. M. Wolfson, P. James, R. Habre, Q. Zhu, J. S. Apte, Z. J. Andersen, X. Xing, C. Hultquist, I. Kloog, F. Dominici, P. Koutrakis, and J. Schwartz. 2022. Hyperlocal US PM2.5 Trace Elements Super-learned. Research Square. https://doi.org/10.21203/rs.3.rs-2052258/v1.

ACKNOWLEDGEMENTS

This work was supported by the Cyprus Harvard Endowment Program for the Environment and Public Health, Novo Nordisk Foundation Challenge Programme grant NNF17OC0027812, U.S. Environmental Protection Agency (EPA) grants RD-8358720 and RD-835872, National Institutes of Health (NIH) grants R01AG074357, R01 HL150119, R01MD012769, R01 ES028033, 1R01AG060232-01A1, 1R01ES030616, 1R01AG066793-01R01, R01ES028033-S1, P30 ES000002, and R01ES032418-01, and the Fernholz Foundation. The contents are solely the responsibility of the grantees and do not necessarily represent the official views of the U.S. EPA. Further, the U.S. EPA does not endorse the purchase of any commercial products or services mentioned in the data or documents. The authors also thank Gregory Yetman (CIESIN) for his help with the data conversion process.
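
EXAMPLE: READING THE DATA IN PYTHON

As noted under DATA FORMAT, the RDS files can be opened from Python. The following is a minimal sketch, not an official procedure. It assumes the third-party pyreadr package is installed and that each uncompressed RDS file holds a single data frame. The function names load_rds and ng_to_ug and the file path are hypothetical, and this document does not specify the column names, so inspect them after loading. The conversion helper is included because total PM2.5 mass is commonly reported in µg/m^3, whereas these data are in ng/m^3.

```python
NG_PER_UG = 1000.0  # 1 microgram = 1000 nanograms


def load_rds(path):
    """Read one uncompressed RDS file into a pandas DataFrame.

    Assumption: the third-party `pyreadr` package is installed and the
    file stores a single data frame.
    """
    import pyreadr  # imported lazily so the helper below works without it

    result = pyreadr.read_r(path)  # dict-like; an RDS object is stored under key None
    return result[None]


def ng_to_ug(value_ng_m3):
    """Convert a concentration from ng/m^3 (this data set) to ug/m^3."""
    return value_ng_m3 / NG_PER_UG


# Hypothetical usage; replace the path with an actual uncompressed file:
# df = load_rds("path/to/uncompressed_file.rds")
# print(df.columns)  # column names are not documented here, so check them first
```

Because the 50m urban grids are large, it may be more practical to load one file at a time than to hold several in memory at once.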