Socioeconomic Data and Applications Center (SEDAC)
A Data Center in NASA's Earth Observing System Data and Information System (EOSDIS) — Hosted by CIESIN at Columbia University

Archive of Census Related Products (ACRP)


Public Use Microdata Sample Areas (PUMA) Boundary Files, v1 (1990)


Generation of 5% and 1% PUMA Boundaries 

Introduction

Since the early 1990s, the absence of an authoritative boundary layer for the geographies associated with the Public Use Microdata Sample (PUMS) data files has been a problem. The PUMS data files are based on the decennial census "long-form" questionnaire and are frequently used for national research. However, mapping these data, or comparing them with other data sources, has been tedious to say the least. From the outset of designing a geographic correspondence engine (Geocorr), we explored the option of generating PUMA boundary layers. This engine can query how geographic layers, or combinations of layers, relate to one another. It does so primarily by accessing the Master Area Block Level Equivalency (MABLE) database, which is essentially a collection of census block records identifying the geographic relationship of each census block to other geographic layers. From this database, equivalency (or correspondence) files can be created from any selected source geographic layer to any selected target layer. Generating the Public Use Microdata Sample Areas (PUMAs) began with adding the 5% and 1% PUMA geographies to the MABLE database. As a result, equivalency files relating either PUMA type to every geography listed in Geocorr's source/target selection windows are now available online. Simply load the Geocorr URL and follow the "Samples" and "PUMA Matrixes" links, or create your own.
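
To illustrate how block-level equivalency records can be rolled up into a source-to-target correspondence file, here is a minimal Python sketch. The record layout (block ID, source code, target code, weight) and the names BlockRecord and build_correspondence are hypothetical; they are not the actual MABLE format or Geocorr code. The weight could be population or land area, and the allocation factor is the share of each source area's total weight that falls in each target area.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class BlockRecord(NamedTuple):
    """Hypothetical block-level record: one census block and the areas it falls in."""
    block_id: str
    source: str    # block's area in the source geography (e.g. a county)
    target: str    # block's area in the target geography (e.g. a PUMA)
    weight: float  # weighting variable, e.g. population or land area


def build_correspondence(blocks: Iterable[BlockRecord]) -> list[tuple[str, str, float, float]]:
    """Aggregate blocks into (source, target, weight, allocation_factor) rows.

    The allocation factor is the fraction of the source area's total weight
    lying in the target area; the factors for one source area sum to 1.
    """
    pair_weight: dict[tuple[str, str], float] = defaultdict(float)
    source_total: dict[str, float] = defaultdict(float)
    for b in blocks:
        # Blocks are atomic units: each adds its whole weight to one pair.
        pair_weight[(b.source, b.target)] += b.weight
        source_total[b.source] += b.weight

    rows = []
    for (src, tgt), w in sorted(pair_weight.items()):
        total = source_total[src]
        rows.append((src, tgt, w, w / total if total else 0.0))
    return rows


if __name__ == "__main__":
    sample = [
        BlockRecord("b1", "countyA", "puma1", 600.0),
        BlockRecord("b2", "countyA", "puma1", 150.0),
        BlockRecord("b3", "countyA", "puma2", 250.0),
        BlockRecord("b4", "countyB", "puma2", 400.0),
    ]
    for row in build_correspondence(sample):
        print(row)
```

Because the blocks are the smallest units, any source/target pairing can be produced from the same block table simply by choosing different source and target columns.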

The goal of this project, then, is to create these PUMA boundaries and post them on the Archive of Census Related Products (ACRP). Once they are available, we will attempt to aggregate census summary data and create standard extract files pre-linked to these boundaries. These files contain about 225 frequently requested variables that describe an area in demographic and economic terms. For more information, consult the supporting documentation on the ACRP and the "Basic Tables" reports at the Urban Information Center (UIC). Cross-tabulation engines accessible via the Internet (see "Information Access" below) can also generate custom tabulations of the PUMS microdata. Any state-level tabulations for PUMA areas can likewise be linked to these boundaries for mapping.

Building The Equivalency Files

The only geographic areas identified on the 1990 PUMS files are states and PUMAs (and some metropolitan areas). PUMS data files are released as three samples, named for their size as a share of the total population: 5%, 1%, and 3%. On the 5% sample (the "A" sample; we refer to its geography as APUMA), every effort was made to keep meaningful socioeconomic or planning areas together. On the 1% sample (the "B" sample; we refer to its geography as BPUMA), every effort was made to separate metropolitan areas from non-metropolitan areas. In both geographies, PUMAs may contain noncontiguous parts in order to meet the minimum population requirement. In addition, BPUMAs may span state lines, whereas APUMAs are always bounded by state lines. Data records for BPUMAs that span state lines carry the state code "99". Note also that the "elderly" PUMS, the 3% sample (the "E" sample; we refer to its geography as EPUMA), has the same geography as the 5% sample.

PUMA boundaries were proposed by state or local officials within each state, with final approval by the Census Bureau. PUMA boundaries had to be defined in terms of counties, places, county subdivisions, or census tracts. In the large majority of cases, a PUMA consists of one or more counties; in larger metropolitan counties, PUMAs are frequently delineated along the boundaries of these smaller geographic areas. A strict guideline for defining a PUMA was a minimum population of 100,000 persons (75,000 in New England) in the 1990 census. The Census Bureau has distributed several products that attempt to define the boundaries of these entities; none of them is complete, and many obscure the fairly simple nature of the PUMA assignment, especially in metropolitan areas. Using the "third generation" equivalency file known as "pumgef", which we received from Carmen Campbell at the Census Bureau, we attempted to assign every census block to a PUMA. Establishing these relationships allows any equivalency file to be created using census blocks as atomic units. Approximately 1,300 census blocks remained unassigned (down from 70,000 in the "second generation" file), and some detective work was required to fill in the holes. Once the census block to APUMA/BPUMA relationships were established in the MABLE database, Geocorr was used to generate a national equivalency file. Merging the geography of census blocks into APUMA and BPUMA coverages was deemed too large a project for the resources available, so a coarser geography was selected: census tracts, specifically summary level "140" in the Summary Tape Files (STF). Level "140" tracts may be split by PUMA geography. An analysis of a Geocorr-generated national equivalency file, weighted by land area, revealed:

APUMA

    Of 60,897 tracts, 2,736 (4.5%) were split by PUMAs.
        The mean overlap for all tracts was 99.37%.
    Of the 2,736 split tracts, the mean overlap was 86.08%.
        The overlap ranged from 37% to 99.9%.
    Of 51 states, 20 contained no split tracts at all.

BPUMA

    Of 60,897 tracts, 3,521 (5.8%) were split by PUMAs.
        The mean overlap for all tracts was 99.16%.
    Of the 3,521 split tracts, the mean overlap was 85.53%.
        The overlap ranged from 37% to 99.9%.
    Of 51 states, 14 contained no split tracts at all.
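
Statistics like these can be recomputed from any tract-to-PUMA equivalency file that carries a land-area share per tract/PUMA pair. The Python sketch below assumes that a tract's "overlap" means the share of its land area lying in the PUMA it overlaps most (100% for an unsplit tract); that reading is consistent with the reported means but is an assumption, as are the input layout and the function names split_tract_stats and implied_overall_mean. The second function checks that each reported overall mean follows from the split-tract mean.

```python
from collections import defaultdict


def split_tract_stats(rows):
    """Summarize how census tracts are split by PUMAs.

    rows: non-empty iterable of (tract_id, puma_id, land_share) tuples, where
    land_share is the fraction (0-1) of the tract's land area in that PUMA.
    Overlaps are reported in percent.
    """
    best = defaultdict(float)  # tract -> largest single-PUMA land share
    parts = defaultdict(int)   # tract -> number of PUMAs it touches
    for tract, _puma, share in rows:
        parts[tract] += 1
        best[tract] = max(best[tract], share)
    if not best:
        raise ValueError("no rows supplied")

    overlaps = {t: 100.0 * s for t, s in best.items()}
    split = [overlaps[t] for t in overlaps if parts[t] > 1]
    return {
        "tracts": len(overlaps),
        "split": len(split),
        "pct_split": 100.0 * len(split) / len(overlaps),
        "mean_overlap_all": sum(overlaps.values()) / len(overlaps),
        "mean_overlap_split": sum(split) / len(split) if split else None,
        "min_overlap_split": min(split) if split else None,
    }


def implied_overall_mean(n_tracts, n_split, mean_split):
    """Overall mean overlap (percent) if every unsplit tract counts as 100%."""
    return ((n_tracts - n_split) * 100.0 + n_split * mean_split) / n_tracts


if __name__ == "__main__":
    # Consistency check against the reported APUMA and BPUMA figures.
    print(round(implied_overall_mean(60_897, 2_736, 86.08), 2))  # prints 99.37
    print(round(implied_overall_mean(60_897, 3_521, 85.53), 2))  # prints 99.16
```

Both reported overall means are reproduced to two decimals, which supports the interpretation of "overlap" used here.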

Given the high land-area overlap between census tract and PUMA geography, these statistics were judged robust enough to proceed with merging some 60,000 census tracts into the coarser PUMA geographies. In effect, the PUMA boundaries were "rounded off" to census tract boundaries. For the 1990 PUMS files, there are a total of 1,726 APUMAs and 1,760 BPUMAs (including the unique "99" areas).

Merging the Geographies

The census tract boundaries were retrieved from the publicly accessible ACRP data archive. The equivalency files obtained through Geocorr went through one additional processing step to resolve the assignment of split tracts. The rule followed, admittedly arbitrary, was to assign each split census tract to the PUMA with the largest land-area overlap. This ensured that every census tract was assigned to exactly one PUMA of each type. Atlas*GIS desktop software was used to manipulate the geographies and the equivalency files. Several problems were resolved manually using Atlas*GIS, visual verification, and the set of PUMA maps provided as Appendix G of the Public Use Microdata Sample technical documentation. These maps, although helpful, show no detail for PUMAs in urban areas; in essence they are "3-digit PUMA" maps. The PUMA identifier we adopted is a 5-digit character code (with leading and trailing zeros) prefixed by the 2-digit state FIPS code. Generally, when the last two digits are not "00", the PUMA represents part of a county that has been split into subareas.
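
The largest-overlap rule and the identifier scheme described above can be sketched in Python as follows. The input layout, the function names, and the tie-break for exactly equal overlaps (lower PUMA code wins) are assumptions for illustration; the original processing was done with Geocorr output and Atlas*GIS, not with this code.

```python
def assign_split_tracts(rows):
    """Assign each tract to the PUMA with the largest land-area overlap.

    rows: iterable of (tract_id, puma_code, land_share) tuples from an
    equivalency file for ONE PUMA type. Returns {tract_id: puma_code},
    so every tract ends up in exactly one PUMA of that type.
    """
    best = {}
    for tract, puma, share in rows:
        current = best.get(tract)
        # Keep the larger overlap; on an exact tie keep the lower code
        # (an assumed, deterministic tie-break).
        if current is None or share > current[1] or (share == current[1] and puma < current[0]):
            best[tract] = (puma, share)
    return {tract: puma for tract, (puma, _share) in best.items()}


def make_puma_id(state_fips, puma_code):
    """Build the 7-character polygon ID: 2-digit state FIPS + 5-digit PUMA code."""
    sid, pid = str(state_fips).zfill(2), str(puma_code).zfill(5)
    if len(sid) != 2 or len(pid) != 5 or not (sid + pid).isdigit():
        raise ValueError(f"bad state/PUMA code: {state_fips!r}, {puma_code!r}")
    return sid + pid


def is_county_subarea(puma_id):
    """Rule of thumb from the text: last two digits other than '00' usually mark a split county."""
    return puma_id[-2:] != "00"
```

For example, make_puma_id(33, 601) yields "3300601", the New Hampshire APUMA discussed under "Missing PUMAs", and is_county_subarea flags it as a county subarea.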

"Holes" in PUMA boundary

These are usually fairly large holes formed by census tracts consisting only of water, which did not get a PUMA code assigned. In all cases, the census tracts forming these holes were queried and the islands deleted. Certain holes had no census tract boundaries underneath; this is the result of a failure in the chaining process used to create the ACRP boundary files. In such cases, the tract failures (within each county) were verified (see the ACRP documentation), and the geography was corrected using the equivalency file and the hardcopy maps.
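
Both kinds of hole can be flagged as candidates by comparing the tract IDs in the boundary file with those in the equivalency file. The Python sketch below assumes the IDs have already been read into sets (how they are read depends on the file format) and uses hypothetical tract identifiers; it only lists candidates, which still need the manual checks described above.

```python
def find_hole_candidates(boundary_tracts, equiv_tracts):
    """Compare tract IDs in a boundary file against a tract-to-PUMA equivalency file.

    Returns two sorted lists:
      no_puma    - tracts with a polygon but no PUMA code (often water-only tracts)
      no_polygon - tracts with a PUMA code but no polygon (e.g. a chaining failure)
    """
    boundary, equiv = set(boundary_tracts), set(equiv_tracts)
    no_puma = sorted(boundary - equiv)
    no_polygon = sorted(equiv - boundary)
    return no_puma, no_polygon


if __name__ == "__main__":
    # Hypothetical tract IDs, for illustration only.
    polygons = {"17031010100", "17031010200", "17031990000"}
    equivalency = {"17031010100", "17031010200", "17031010300"}
    no_puma, no_polygon = find_hole_candidates(polygons, equivalency)
    print("Tracts without a PUMA code:", no_puma)
    print("PUMA-coded tracts without a polygon:", no_polygon)
```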

"Holes" on PUMA boundary

Holes on PUMA boundaries occur fairly infrequently and were resolved using the approach described above. Noteworthy in these cases is that tracts along water areas are often coded differently. For example, the Carolinas show both inclusion and exclusion of the barrier islands, and in the "mitten" of Michigan there are tracts that contain small areas of land and large areas of water. Overall, though, the coastlines and shorelines are fairly well defined.

Number of vertices per PUMA

The objective was to make these boundary layers compatible with Atlas*GIS for DOS. For very large PUMAs, its maximum number of vertices per polygon (about 4,000 points) can be exceeded. In such cases, the tracts that make up the target PUMA were first thinned (points that were not significant within 0.1 mile were removed) and then unioned. This means that, viewed at high resolution, the target PUMA boundary does not exactly match the surrounding PUMAs. We chose to create these polygons this way rather than lose information. Users who want to bypass this restriction can rely on the boundaries of the surrounding polygons, which define the exact outline of the target PUMA. For an example, view the large PUMA in the city of Denver, Colorado.
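
The thinning step can be illustrated with the Ramer-Douglas-Peucker algorithm, a common line-simplification method. This is a stand-in: the algorithm Atlas*GIS actually used is not documented here. The sketch assumes projected coordinates measured in miles, so a tolerance of 0.1 drops vertices lying within 0.1 mile of the simplified line; the MAX_VERTICES constant mirrors the approximate limit given above, and all names are hypothetical.

```python
import math

MAX_VERTICES = 4000     # approximate Atlas*GIS for DOS limit cited in the text
TOLERANCE_MILES = 0.1   # assumes coordinates are projected and measured in miles


def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring's endpoints
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance=TOLERANCE_MILES):
    """Ramer-Douglas-Peucker, iterative to avoid deep recursion on long boundaries.

    The first and last vertices are always kept.
    """
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        first, last = stack.pop()
        index, dmax = -1, -1.0
        # Find the interior vertex farthest from the chord first-last.
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > dmax:
                index, dmax = i, d
        if index != -1 and dmax > tolerance:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, k in zip(points, keep) if k]


def needs_thinning(points):
    """True when a boundary has more vertices than the assumed limit."""
    return len(points) > MAX_VERTICES
```

Because each PUMA is thinned on its own, shared edges can be simplified differently on either side, which is exactly the mismatch with neighboring PUMAs described above.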

Discontinuous areas

PUMAs are not necessarily compact, contiguous areas. They are frequently "gerrymandered" to fall within the Bureau's guidelines for size and homogeneity of population. As a result, it is not uncommon to find PUMAs that are made up of multiple parts ("islands") and/or contain embedded PUMAs ("lakes"). Such oddities could also be caused by errors in the geographic equivalency file. We are confident that we have researched those equivalencies and eliminated such errors, but we are very interested in feedback from any users who think they have spotted an "island" or a "lake" that was not part of a PUMA's definition.
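
One way to check a PUMA for "islands" is to count the connected groups of its member tracts, using a table of which tracts share a boundary. The Python sketch below assumes such an adjacency list and a tract-to-PUMA assignment are already available (both are hypothetical inputs; the boundary files do not carry them directly) and uses a union-find structure. A PUMA with more than one group has noncontiguous parts; detecting "lakes" would additionally require the polygon geometry.

```python
from collections import defaultdict


def count_parts(tract_puma, adjacent_pairs):
    """Count contiguous parts per PUMA.

    tract_puma: {tract_id: puma_id}
    adjacent_pairs: iterable of (tract_a, tract_b) pairs that share a boundary.
    Returns {puma_id: number_of_parts}.
    """
    parent = {t: t for t in tract_puma}

    def find(t):
        # Path halving keeps the union-find trees shallow.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in adjacent_pairs:
        # Only join neighboring tracts that belong to the same PUMA.
        if a in tract_puma and b in tract_puma and tract_puma[a] == tract_puma[b]:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    roots = defaultdict(set)
    for t, puma in tract_puma.items():
        roots[puma].add(find(t))
    return {puma: len(r) for puma, r in roots.items()}
```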

Missing PUMAs

The number of BPUMAs extracted from the PUMS data matches the total number of unique BPUMAs in the geographic file. However, data-file BPUMAs prefixed with "99" must be carefully mapped to every state involved, matching on the 5-digit BPUMA code only. These BPUMAs are listed in the appendix at the end of this file; for each one, the listing gives both the contributing states and the counties involved. For the APUMAs, one appears to be missing from the geographic files: APUMA 3300601 (in NH) did not appear in our pumgef equivalency file. Consult the detailed listing at the end of this file.

Note: The missing APUMA was fixed and added to all boundary files posted on 1.17.97. The 00602 tracts carried the 00601 codes.
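
Matching "99" data records to polygons can be sketched as follows: a PUMS record with state code "99" is matched on its 5-digit BPUMA code alone and linked to every state polygon carrying that code, while any other record is matched on the full 7-character ID. The polygon IDs in the example are taken from the appendix listing; the function names and data layout are hypothetical.

```python
from collections import defaultdict


def index_polygons(polygon_ids):
    """Index 7-character polygon IDs (state FIPS + 5-digit BPUMA) by BPUMA code."""
    by_bpuma = defaultdict(list)
    for pid in polygon_ids:
        by_bpuma[pid[2:]].append(pid)
    return by_bpuma


def polygons_for_record(state, bpuma, polygon_ids, by_bpuma):
    """Return the polygon IDs a PUMS record should be linked to.

    Records with state code "99" span state lines, so they are matched on
    the 5-digit BPUMA code alone; all other records match one polygon exactly.
    """
    if state == "99":
        return sorted(by_bpuma.get(bpuma, []))
    pid = state + bpuma
    return [pid] if pid in polygon_ids else []


if __name__ == "__main__":
    # Polygon IDs for two cross-state BPUMAs, as listed in the appendix.
    ids = {"1085800", "2485800", "3485800", "2787900", "5587900"}
    index = index_polygons(ids)
    print(polygons_for_record("99", "85800", ids, index))  # ['1085800', '2485800', '3485800']
```

How a "99" record's values should then be shared among the linked state polygons is a separate question that this sketch does not address.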

Supporting Information

Readme files are provided to guide you. The PUMA information in the agf/, shp/, and bna/ directories is identical, just presented in different formats. The "master file", containing all APUMAs, BPUMAs, and state outlines, is "agf/agfpumas.zip". A brief outline (using URLs):

PUMS Data Dictionaries and PUMS-(P)MSA Equivalency files (Source: Bureau of the Census).

Information Access

The geographic correspondence engine, known as "Geocorr", is publicly available at:

     http://mcdc2.missouri.edu/applications/uexplore.shtml#geography

Supporting URLs:

       Bureau of the Census

       http://www.census.gov

       Office of Socioeconomic Data and Analysis (OSEDA)

       http://www.oseda.missouri.edu/uic/

       Socioeconomic Data and Applications Center

       http://sedac.ciesin.columbia.edu

Help

For all general questions and problems, contact SEDAC User Services at http://sedac.uservoice.com/knowledgebase

----------------------------------------------------------------------------

Atlas*GIS and ArcInfo are registered trademarks of Environmental Systems Research Institute (ESRI), Redlands, CA.

----------------------------------------------------------------------------

                               APPENDIX

-----------------------------------------------------------------------------

Explanation: This section shows any discrepancies between the data files, the 'pumgef' equivalency file, and the polygon identifications we ended up with after merging. It also details the areas for the "99" BPUMAs, which can span state lines.

extract = "Explore/Extract" programs, written by Al Anderson, that access the 5% and 1% PUMS files.

equiv = "Geocorr equivalency file" programs, written by John Blodgett, based on the 'pumgef' equivalency file obtained via Carmen Campbell from the Census Bureau.

polid = "polygon ids" from programs, written by Henk Meij, that merge (round off) census tract geography to PUMA geography (bna files with polid identification).

5% APUMA RESULTS

Summary: Found one APUMA that exists in the PUMS data files but not in the 'pumgef' equivalency file (and thus not in the bna files).

APUMA ssAPUMA ss PERSONS COUNTIES
----- ------- -- ------- --------
00601 3300601 NH 142,537 unknown
      Only exists in extract (not in equiv and not in polid).
      Note: Fixed and new files posted on 1.17.97.

1% BPUMA RESULTS

Summary: All polygons matched. Because the data files use "99" as the state code for BPUMAs that span state lines, care must be taken when matching data records to geographic records. In the geographic records, the polygon identification carries the actual state FIPS code, not the "99" designation. These BPUMAs are listed below. We also found two phantom BPUMAs whose existence we could not verify; they are labeled "unknown" below.

BPUMA ssBPUMA ssPUMS  ss PERSONS COUNTIES
----- ------- ------- -- ------- --------
00600 ??00600 9900600 ??  11,528 unknown
00700 ??00700 9900700 ??  40,810 unknown
85800 1085800 9985800 DE 549,777 10003
      2485800         MD         24015
      3485800         NJ         34033
87900 2787900 9987900 MN 105,339 27025 27059
      5587900         WI         55109
89500 1889500 9989500 IN 209,728 18029
      3989500         OH         39061
92500 0592500 9992500 AR 150,726 05035
      2892500         MS         28033
      4792500         TN         47167
93500 3493500 9993500 NJ 144,684 34041
      4293500         PA         42025
94000 1994000 9994000 IA  95,214 19155
      3194000         NE         31055 31177
94500 4794500 9994500 TN 127,635 47073
      5194500         VA         51169 51191 51520
95900 1795900 9995900 IL 165,303 17073 17161
      1995900         IA         19163
96000 1796000 9996000 IL 175,677 17161
      1996000         IA         19163
96300 2196300 9996300 KY 305,022 21019 21043 21089
      3996300         OH         39087
      5496300         WV         54011 54099
96600 1896600 9996600 IN 113,354 18129 18173
      2196600         KY         21101
97200 0197200 9997200 AL 225,530 01113
      1397200         GA         13053 13215
97500 2797500 9997500 MN 227,667 27137
      5597500         WI         55031
97800 2397800 9997800 ME 207,507 23031
      3397800         NH         33015
98100 0598100 9998100 AR 171,652 05033 05131
      4098100         OK         40135
98300 2198300 9998300 KY 161,865 21047
      4798300         TN         47125
98500 3998500 9998500 OH 147,134 39013
      5498500         WV         54051 54069
98700 2798700 9998700 MN 142,890 27027
      3898700         ND         38017
98900 3998900 9998900 OH 141,306 39167
      5498900         WV         54107
99100 3999100 9999100 OH 136,712 39081
      5499100         WV         54009 54029
99300 0599300 9999300 AR 114,174 05091
      4899300         TX         48037
99500 1999500 9999500 IA 112,266 19193
      3199500         NE         31043
99700 2499700 9999700 MD 122,884 24001 24023
      5499700         WV         54057


Copyright © 1997–2023. The Trustees of Columbia University in the City of New York.
