Archive of Census Related Products (ACRP)
Public Use Microdata Sample Areas (PUMA) Boundary Files, v1 (1990)
GENERATION of 5% and 1% PUMA BOUNDARIES
INTRODUCTION
Since the early 1990s, the absence of an authoritative boundary layer for the geographies associated with the Public Use Microdata Sample (PUMS) data files has been a problem. The PUMS data files represent the decennial "long-form" questionnaire and are frequently used for national research. However, mapping these data or comparing them to other data sources has been tedious, to say the least. From the very onset of our design to build a geographic correspondence engine (Geocorr), we explored the option of generating PUMA boundary layers. This engine provides the ability to query the relationship of geographic layers, or a combination of layers, to each other. It performs this function primarily by accessing the Master Area Block Level Equivalency (MABLE) database. This database is essentially a collection of census block records identifying the geographic relationship of each census block to other geographic layers. From this database, equivalency files (also called correspondence files) can be created from any source geographic layer to any target layer selected. The generation of the Public Use Microdata Sample Areas (PUMAs) started with the addition of the 5% and 1% PUMA geographies to the MABLE database. As a result, equivalency files relating either PUMA type to all geographies listed in the source/target geography selection windows of Geocorr are now available online. Simply load the Geocorr URL and follow the "Samples" and "PUMA Matrixes" links, or create your own.
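To make the idea of a block-based equivalency file concrete, here is a minimal sketch of aggregating block records into source-to-target rows with an allocation factor (the share of each source area that falls in each target area). The record layout, the identifiers and the function name `equivalency` are assumptions for illustration only; they do not reflect the actual MABLE file layout or Geocorr's implementation.

```python
from collections import defaultdict

# Hypothetical block records: (block_id, tract, puma, land_area_sq_mi).
# Identifiers and values are made up; this is not the MABLE layout.
blocks = [
    ("b1", "T1", "P1", 0.25),
    ("b2", "T1", "P1", 0.50),
    ("b3", "T1", "P2", 0.25),
    ("b4", "T2", "P2", 2.00),
]

def equivalency(block_records):
    """Aggregate blocks into (source, target) -> share of source land area."""
    pair_area = defaultdict(float)    # (tract, puma) -> land area
    source_area = defaultdict(float)  # tract -> total land area
    for _block, tract, puma, area in block_records:
        pair_area[(tract, puma)] += area
        source_area[tract] += area
    return {pair: area / source_area[pair[0]] for pair, area in pair_area.items()}

rows = equivalency(blocks)
for (tract, puma), share in sorted(rows.items()):
    print(tract, puma, round(share, 4))
```

Because every block belongs to exactly one tract and one PUMA, the shares for a given tract always sum to 1; a tract with more than one row is a tract split by PUMA geography.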
The general idea of this project is to create these PUMA boundaries and post them on the "Archive of Census Related Products" (ACRP). Once they are available, we will attempt to aggregate census summary data and create standard extract files pre-linked to these boundaries. These files consist of about 225 frequently requested variables that describe an area in demographic and economic terms. For more information, consult the supporting documentation on the ACRP and the "Basic Tables" reports at the Urban Information Center (UIC). There are also cross tabulation engines accessible via the internet (see "Information Access" below) which provide the ability to generate custom tabulations of the PUMS microdata. Any tabulations generated by state for PUMA areas can also be linked to these boundaries for mapping purposes.
BUILDING THE EQUIVALENCY FILES
The only geographic areas identified on the 1990 PUMS files are states and PUMAs (and some metropolitan areas). PUMS data files are released in three samples, each named for the share of the total population it represents: 5%, 1% and 3%. On the 5% sample ("A" Sample; we'll refer to its geography as APUMA), every effort was made to keep meaningful socioeconomic or planning areas together. On the 1% sample ("B" Sample; we'll refer to its geography as BPUMA), every effort was made to separate metropolitan areas from non-metropolitan areas. In both geographies, PUMAs may contain noncontiguous parts in order to meet the minimum population requirements. In addition, BPUMAs may span state lines, whereas APUMAs are always bounded by state lines. The data records for BPUMAs that span state lines carry the code "99" as their state identification. Finally, the "elderly" PUMS, the 3% sample ("E" Sample; we'll refer to its geography as EPUMA), has the same geography as the 5% sample.
PUMA boundaries were proposed by state or local officials within each state, with final approval by the Census Bureau. Boundaries of PUMA areas had to be defined in terms of counties, places, county subdivisions or census tracts. In the large majority of cases, PUMAs consist of one or more counties. In larger metro counties, they are frequently broken down along the lines of the smaller geographic areas. A strict guideline for defining a PUMA is that it had to have a minimum population of 100,000 persons (75,000 in New England) in the 1990 census. The Census Bureau has distributed several products in an attempt to define the boundaries of these entities; none of them is complete, and in many cases they obscure the fairly simple nature of the PUMA assignment, especially in metropolitan areas.
From the "third generation" equivalency file we received (from Carmen Campbell at the Census Bureau), known as "pumgef", an attempt was made to assign every census block to a PUMA. Establishing such relationships would allow for the creation of any equivalency file using the census blocks as atomic units. Approximately 1,300 census blocks remained unassigned (down from 70,000 in the "second generation" file). Some detective work was required to fill in the holes.
Once the census block to APUMA/BPUMA relationships were established in the MABLE database, Geocorr was invoked to generate a national equivalency file. The process of merging the geography of census blocks into APUMA and BPUMA coverages was deemed too large a project for the resources available, so a coarser geography was selected. The geography selected was census tracts: the "140" census tracts within the summary level designations of the Summary Tape Files (STF). Level "140" tracts may be split by PUMA geography. An analysis of a Geocorr generated national equivalency file, weighted by land area, revealed:
APUMA
of 60,897 tracts, 2,736 were split by PUMAs (4.5%)
the mean overlap for all tracts was 99.37%
of the 2,736 split tracts, the mean overlap was 86.08%
the range of overlap spans from 37% to 99.9%
of 51 states, 20 contained no split tracts at all
BPUMA
of 60,897 tracts, 3,521 were split by PUMAs (5.8%)
the mean overlap for all tracts was 99.16%
of 3,521 split tracts, the mean overlap was 85.53%
the range of overlap spans from 37% to 99.9%
of 51 states, 14 contained no split tracts at all
These statistics were deemed robust enough, considering the land area overlap between census tract and PUMA geography, to proceed with merging some 60,000 census tracts into coarser levels of geographies. In effect, the PUMA boundaries were "rounded off" to census tract. For the 1990 PUMS files, there are a total of 1,726 APUMAs and 1,760 BPUMAs (including unique "99" areas).
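The kind of summary reported above can be sketched as follows. This is an illustration under one stated assumption: "overlap" is taken to mean the largest share of a tract's land area that falls within a single PUMA. The source does not spell out the exact definition, and the function name `overlap_stats` and the sample values are hypothetical.

```python
def overlap_stats(equiv):
    """Summarize an equivalency file.

    equiv maps (tract, puma) -> share of the tract's land area in that PUMA.
    """
    best = {}   # tract -> largest single-PUMA share (assumed "overlap")
    parts = {}  # tract -> number of PUMAs the tract falls in
    for (tract, _puma), share in equiv.items():
        best[tract] = max(best.get(tract, 0.0), share)
        parts[tract] = parts.get(tract, 0) + 1
    split = [t for t, n in parts.items() if n > 1]
    mean_split = None
    if split:
        mean_split = 100.0 * sum(best[t] for t in split) / len(split)
    return {
        "tracts": len(best),
        "split": len(split),
        "pct_split": 100.0 * len(split) / len(best),
        "mean_overlap_all": 100.0 * sum(best.values()) / len(best),
        "mean_overlap_split": mean_split,
    }

# Made-up sample: four tracts, one of which is split 75/25 between two PUMAs.
sample = {
    ("T1", "P1"): 0.75, ("T1", "P2"): 0.25,
    ("T2", "P2"): 1.0, ("T3", "P1"): 1.0, ("T4", "P3"): 1.0,
}
stats = overlap_stats(sample)
print(stats)
```

A high mean overlap among split tracts suggests that most splits shave off only a small part of a tract, which is what makes rounding to whole tracts tolerable.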
MERGING THE GEOGRAPHIES
The census tract boundaries were retrieved from the publicly accessible ACRP ftp archive. The equivalency files obtained through Geocorr were processed one additional step to resolve the assignment of split tracts. The arbitrary rule followed was to assign split census tracts to the PUMA with the largest land area overlap. This ensured that a census tract was always uniquely assigned within PUMA type.
Atlas*GIS desktop software was used for manipulating the geographies and the equivalency files. Several problems were manually resolved using Atlas*GIS, visual verification, and the set of PUMA maps provided as Appendix G of the Public Use Microdata Sample technical documentation. These maps, although helpful, do not show any detail for PUMAs in urban areas; in essence they are really "3-digit PUMA" maps.
The PUMA identification we adopted is composed of a 5-digit character code (with trailing and leading zeroes) prefixed with the 2-digit character FIPS code for each state. Generally, when the last two digits are not "00", the code represents a county that has been split into subareas.
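The assignment rule and the identification scheme described above can be sketched as follows. The function names `assign_split_tracts` and `puma_id` and the sample data are hypothetical; ties are resolved here by keeping the first PUMA encountered, which is an assumption, since the source does not say how ties were handled.

```python
def assign_split_tracts(equiv):
    """For each tract, pick the PUMA holding the largest share of its land area.

    equiv maps (tract, puma) -> share of the tract's land area in that PUMA.
    On a tie, the first PUMA encountered is kept (an assumption).
    """
    winner = {}
    for (tract, puma), share in equiv.items():
        if tract not in winner or share > winner[tract][1]:
            winner[tract] = (puma, share)
    return {tract: puma for tract, (puma, _share) in winner.items()}

def puma_id(state_fips, puma):
    """Build the 7-character ID: 2-digit state FIPS + 5-digit PUMA, zero-padded."""
    return f"{int(state_fips):02d}{int(puma):05d}"

sample = {("T1", "P1"): 0.75, ("T1", "P2"): 0.25, ("T2", "P2"): 1.0}
assignment = assign_split_tracts(sample)
print(assignment)
print(puma_id(33, 601))
print(puma_id("05", "92500"))
```

Keeping the IDs as zero-padded strings rather than integers preserves leading zeroes such as the "05" state prefix.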
"Holes" in PUMA boundary.
Usually these are fairly large holes: census tracts consisting only of water areas, which did not get a PUMA code assigned. In all cases, the census tracts constituting these holes were queried and the islands deleted. Certain holes did not have census tract boundaries underneath. This is a result of a failure in the chaining process used to create the ACRP. In such cases, the failure of the tracts (within county) was verified (see documentation on ACRP) and, using the equivalency file and the hardcopy maps, the geography was corrected.
"Holes" on PUMA boundary.
Holes on the boundary are fairly infrequent and were resolved using the approach mentioned above. Noteworthy in these cases is that tracts along water areas are often coded differently. For example, the Carolinas show both inclusion and exclusion of the barrier islands, and in the "mitten" of Michigan there are examples of tracts which contain small areas of land and large areas of water. Overall, though, the coast and shore lines are pretty well defined.
Number of vertices per PUMA.
The objective was to make these boundary layers compatible with Atlas*GIS for DOS. For very large PUMAs, the maximum number of vertices per polygon (about 4,000 points) can be exceeded. In such cases, the tracts which compose the target PUMA were first thinned (points were removed if not significant within 0.1 miles) and then unioned. This means that the target PUMA boundary does not exactly match the surrounding PUMAs if viewed at high resolution scales. Rather than lose information, it was decided to create these polygons this way. For users who want to bypass these restrictions, the boundaries of the surrounding polygons define the exact outline of the target PUMA. As an example, view the large PUMA in the city of Denver, Colorado.
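The source does not say which thinning method Atlas*GIS applied. As an illustration only, the sketch below uses the Douglas-Peucker algorithm, one common way to drop vertices that deviate from the simplified line by no more than a tolerance. It assumes planar coordinates already expressed in miles, so the tolerance of 0.1 matches the figure quoted above; the function names and sample points are hypothetical.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, in the coordinates' own units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose ends coincide).
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def thin(points, tolerance):
    """Douglas-Peucker thinning, written iteratively to avoid deep recursion."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        lo, hi = stack.pop()
        index, worst = -1, tolerance
        for i in range(lo + 1, hi):
            d = _point_segment_distance(points[i], points[lo], points[hi])
            if d > worst:
                index, worst = i, d
        if index != -1:
            # The farthest vertex is significant: keep it and refine both halves.
            keep[index] = True
            stack.append((lo, index))
            stack.append((index, hi))
    return [p for p, k in zip(points, keep) if k]

line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0)]
thinned = thin(line, 0.1)
print(thinned)
```

Thinning each boundary independently is what causes the small mismatches with neighboring polygons noted above: a shared edge can lose different vertices on each side.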
Discontinuous areas.
PUMAs are not compact contiguous areas. They are frequently "gerrymandered" to make them fall within the Bureau's guidelines for size and homogeneity of population. As a result, it is not uncommon to find PUMAs that are made up of multiple parts ("islands") and/or contain embedded PUMAs ("lakes").
Such oddities could also be caused by errors in the geographic equivalency file. We are confident that we have researched those equivalencies and eliminated such errors, but we are very interested in feedback from any users who think they have spotted an "island" or a "lake" that was not part of the definition of a PUMA.
Missing PUMAs.
The number of BPUMAs extracted from the PUMS data matches the total number of unique BPUMAs in the geographic file. The data file BPUMAs prefixed with "99" must, however, be carefully mapped to all states by matching only the 5-digit BPUMA. For a list of these, see the list at the end of this file. For each such BPUMA, it lists not only all contributing states but also the counties involved.
For the APUMAs, one appears to be missing in the geographic files. This APUMA (3300601 in NH) did not appear on our pumgef equivalency file. Consult the detailed listing at the end of this file.
Note: The missing apuma was fixed and added to all boundary files posted on 1.17.97. The 00602 tracts carried the 00601 codes.
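The "99" matching rule for cross-state BPUMAs described above can be sketched as follows. The function name `match_polygons` is hypothetical, and the within-state ID "2900100" in the sample is made up. The sketch also assumes that the 5-digit codes of "99" BPUMAs are not reused by any single-state BPUMA; the source does not confirm this.

```python
def match_polygons(pums_state, pums_bpuma, polygon_ids):
    """Return the polygon IDs that a PUMS record's (state, BPUMA) pair refers to.

    polygon_ids are 7-character strings: 2-digit state FIPS + 5-digit BPUMA.
    """
    state = f"{int(pums_state):02d}"
    bpuma = f"{int(pums_bpuma):05d}"
    if state == "99":
        # Cross-state BPUMA: ignore the state prefix, match the last 5 characters.
        return sorted(pid for pid in polygon_ids if pid[2:] == bpuma)
    return [pid for pid in polygon_ids if pid == state + bpuma]

polygon_ids = ["1085800", "2485800", "3485800", "1889500", "3989500", "2900100"]
cross_state = match_polygons("99", "85800", polygon_ids)
within_state = match_polygons("29", "00100", polygon_ids)
print(cross_state)
print(within_state)
```

A single "99" data record therefore links to several polygons, one per contributing state, so any mapped value should be applied to all of them.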
RESULTS
The files are online on the FTP Archive mentioned above. The path is:
ftp://ftp.ciesin.columbia.edu/pub/census/usa/pums
Boundary files are presented in 3 widely used formats:
Atlas*GIS "agf" system format, ArcView "shp" shapefile format, and the "BNA" ascii-export format used by a number of geographic import/export utility packages. Each of these formats has its own subdirectory.
Readme files are present to guide you along. The information in all the PUMA directories (agf/, shp/ and bna/) is identical, just presented differently. The "master file", containing all APUMAs, BPUMAs and state outlines, is the "agf/agfpumas.zip" file. A brief outline (using URLs):
PUMS Data dictionaries: ftp://ftp.ciesin.org/pub/census/usa/pums/dd/
PUMS-(P)MSA Equivalency files (Source: Bureau of the Census): ftp://ftp.ciesin.org/pub/census/usa/pums/eq/
GIFS depicting APUMA and BPUMA geography: ftp://ftp.ciesin.org/pub/census/usa/pums/gif/
...note the shifting APUMA and BPUMA codes
...note water areas in the insets
...note the "99" BPUMAs that cross state lines
...note the differences (simple Chicago, complex Denver/Los Angeles)
Atlas*GIS (agf) files: ftp://ftp.ciesin.org/pub/census/usa/pums/agf/
agfpumas.zip APUMAs, BPUMAs, and State outlines
apumaagf.zip APUMAs only
bpumaagf.zip BPUMAs only
ArcView (shp) shapefiles: ftp://ftp.ciesin.org/pub/census/usa/pums/shp/
apumashp.zip APUMAs only
bpumashp.zip BPUMAs only
BNA boundary files: ftp://ftp.ciesin.org/pub/census/usa/pums/bna/
apumabna.zip APUMAs only
bpumabna.zip BPUMAs only
s00.zip States only
apbystate/ directory of 51 state BNA files for APUMA
bpbystate/ directory of 51 state BNA files for BPUMA
ap00ctrd.zip APUMA centroids, support file (Source=UIC)
bp00ctrd.zip BPUMA centroids, support file (Source=UIC)
INFORMATION ACCESS
The geographic correspondence engine, known as "Geocorr", is publicly available at:
http://www.oseda.missouri.edu/plue/geocorr/
Supporting URLs:
Bureau of the Census
http://www.census.gov
Office of Socioeconomic Data and Analysis (OSEDA)
and the Urban Information Center (UIC)
http://www.oseda.missouri.edu/usinfo.html
http://www.oseda.missouri.edu/uic/
Socioeconomic Data and Applications Center
http://sedac.ciesin.columbia.edu
CONTACTS
For all general questions and problems contact SEDAC User Services at
http://sedac.uservoice.com/knowledgebase
----------------------------------------------------------------------------
Atlas*GIS and Arc/Info are registered trademarks of Environmental Systems Research Institute (ESRI), Redlands, CA.
----------------------------------------------------------------------------
APPENDIX
-----------------------------------------------------------------------------
Explanation: This section will show any discrepancies between the data files, the 'pumgef' equivalency file, and the polygon identifications we ended up with after merging. It also details the areas for the "99" BPUMAs which can span state lines.
extract="Explore/Extract"
programs written by Al Anderson that access the 5% and 1% PUMS files.
equiv="Geocorr equivalency file"
programs written by John Blodgett based on 'pumgef' equivalency file, obtained via Carmen Campbell from the Census Bureau.
polid="polygon ids"
programs written by Henk Meij merging census tracts (rounding off)
geography to puma geography (bna files with polid identification).
5% APUMA RESULTS
Summary: Found one APUMA which exists in the PUMS data files but not in
'pumgef' equivalency file (thus it is not in bna files).
APUMA  ssAPUMA  ss  PERSONS  COUNTIES
00601  3300601  NH  142,537  unknown
only exists in extract (not in equiv and not in polid)
Note: Fixed and new files posted on 1.17.97.
1% BPUMA RESULTS
Summary: all polygons matched. Since the data files use "99" as the state code, care must be taken when matching data records to geographic records. In the latter, the polygon identification carries the actual state FIPS code, not the "99" designation. Here is a list of them.
We did find two phantom BPUMAs whose existence we could not verify. Those are the ones labeled "unknown" below.
BPUMA  ssBPUMA  ssPUMS   ss  PERSONS  COUNTIES
-----  -------  -------  --  -------  --------
00600  ??00600  9900600  ??   11,528  unknown
00700  ??00700  9900700  ??   40,810  unknown
85800  1085800  9985800  DE  549,777  10003
       2485800           MD           24015
       3485800           NJ           34033
87900  2787900  9987900  MN  105,339  27025 27059
       5587900           WI           55109
89500  1889500  9989500  IN  209,728  18029
       3989500           OH           39061
92500  0592500  9992500  AR  150,726  05035
       2892500           MS           28033
       4792500           TN           47167
93500  3493500  9993500  NJ  144,684  34041
       4293500           PA           42025
94000  1994000  9994000  IA   95,214  19155
       3194000           NE           31055 31177
94500  4794500  9994500  TN  127,635  47073
       5194500           VA           51169 51191 51520
95900  1795900  9995900  IL  165,303  17073 17161
       1995900           IA           19163
96000  1796000  9996000  IL  175,677  17161
       1996000           IA           19163
96300  2196300  9996300  KY  305,022  21019 21043 21089
       3996300           OH           39087
       5496300           WV           54011 54099
96600  1896600  9996600  IN  113,354  18129 18173
       2196600           KY           21101
97200  0197200  9997200  AL  225,530  01113
       1397200           GA           13053 13215
97500  2797500  9997500  MN  227,667  27137
       5597500           WI           55031
97800  2397800  9997800  ME  207,507  23031
       3397800           NH           33015
98100  0598100  9998100  AR  171,652  05033 05131
       4098100           OK           40135
98300  2198300  9998300  KY  161,865  21047
       4798300           TN           47125
98500  3998500  9998500  OH  147,134  39013
       5498500           WV           54051 54069
98700  2798700  9998700  MN  142,890  27027
       3898700           ND           38017
98900  3998900  9998900  OH  141,306  39167
       5498900           WV           54107
99100  3999100  9999100  OH  136,712  39081
       5499100           WV           54009 54029
99300  0599300  9999300  AR  114,174  05091
       4899300           TX           48037
99500  1999500  9999500  IA  112,266  19193
       3199500           NE           31043
99700  2499700  9999700  MD  122,884  24001 24023
       5499700           WV           54057
----------------------------------------------------------------------------