
DDViewer 3.0: Interactive Visualization of Demographic
Data Using Java Technology

Dr. Nathan Sovik
Consortium for International Earth Science Information Network
Socioeconomic Data and Applications Center
2250 Pierce Road
University Center, MI 48710

Presentation and paper prepared for the Conference on Scientific and Technical Data Exchange and Integration, sponsored by the U.S. National Committee for CODATA,
Bethesda, MD, December 15-17, 1997.



ABSTRACT:
DDViewer 3.0 is a Java-based interactive mapping application. Accessible through the World Wide Web, DDViewer creates a client-server link between the user's desktop and a demographic data server located at NASA's Socioeconomic Data and Applications Center (SEDAC). Using a Graphical User Interface built around a map and a listbox, the user may select a region of interest, any number of the 225 available census-derived variables, and a map resolution (states to census tracts). After the data have loaded from the server, the user may recolor, zoom, or pan a map of any variable; create derivative variables; recode variables into as many as ten intervals; query the map to identify underlying values at any location; add titles and a legend; retrieve a data listing; and calculate descriptive statistics. The database includes all 50 US states and the District of Columbia. DDViewer's performance has been optimized for real-time on-line access over the Internet. Its underlying technology is applicable to a range of data types and illustrates the potential for interactive visualization of interdisciplinary spatial data. DDViewer 3.0 may be accessed at the following URL: http://plue.sedac.ciesin.org/plue/ddviewer.


1.   Introduction

Over the past three years a variety of mapping applications have been developed for use over the Internet. Almost all of these applications communicate with Web browsers through an HTTP server, sending maps to users as flat image files such as Graphics Interchange Format (GIF) files. DDViewer 3.0 was developed as a test-bed for networked Geographic Information Systems (GIS) using the Java programming language (Sun Microsystems, Palo Alto, CA). It uses Java client-server technology to retrieve demographic data over the Internet and create maps on the user's desktop. The client can be loaded through a Web browser on any Pentium-class PC or UNIX workstation. Users select a region of interest from a map and a set of variables from a listbox tool. Once the boundary and attribute data have been fetched from the CIESIN server, all data processing takes place in the client on the user's desktop. Both the input and output maps have interactive features.

Boundary data for DDViewer were derived from the US Bureau of the Census TIGER 1992 database. The demographic variables were derived from the US Bureau of the Census STF3A 1990 files. This national database is centrally stored and maintained on the server.


2.   History of DDViewer

The original concept for DDViewer was developed by Dr. Hendrik Meij of CIESIN. Meij's first prototype, Yet Another Very Experimental Map Server (YAVEMS), was a SAS-based Web mapper for the State of Michigan released in 1995 (SAS Inc., Cary, NC). It was followed by a national 1990 census data mapper, developed by Meij and Xiaoshi Xing and released as DDViewer 1.0 in 1996. The underlying national-level SAS database used in all versions of DDViewer to date originated with this version. User feedback, together with a desire at CIESIN to test Java as a development language for Internet applications, led to further refinements in the design of DDViewer, and at this point Yuannan Chen and Dr. Nathan Sovik joined the development team. Two versions of DDViewer 2.0 were then developed. One was an enhanced Common Gateway Interface (CGI) application that allowed multi-state regions to be mapped; the other was a Java development version with an input-side Graphical User Interface (GUI) that connected to the server-side SAS mapping routines. Both 2.0 versions were released in early summer 1997 and, like DDViewer 1.0, produced GIF images for retrieval by the user. Version 3.0, with a fully Java-based GUI for both input and output maps, was released in December 1997. It takes a completely different approach from the CGI versions, making both input and output maps fully interactive through its Java client-server interface.

3.   DDViewer 3.0 Design

The Java version of DDViewer is designed to provide users with greater access to the data underlying the demographic maps and to facilitate customization of the maps. In most cases the user also benefits from improved performance. The original version allowed users to select a region of interest within a state and to retrieve GIF map images and text files of data attributes and related statistics. DDViewer 1.0 performed well on the server side, but user comments pointed to a number of weaknesses.

The first was the inconvenience of customizing maps to fit users' exact needs. Each time a small change was made to a map, for example relocating the title text, a new GIF image had to be produced on the server and retrieved by the user. Each change took 30 seconds or more to process and download. Repeated trial and error led to long and sometimes frustrating sessions before the user obtained the desired product. A second consequence of the server-side dependency was that the original version scaled poorly. The server was an ordinary UNIX workstation, and while it could handle a handful of simultaneous requests, higher loads slowed performance for all users, sometimes causing significant processing delays. A further weakness was DDViewer 1.0's inability to link demographic data to specific locations. A colored choropleth GIF map might show where higher incomes were located in general, but the user could not determine the exact value underlying each colored census polygon, because a GIF file carries only pixel colors and a color table, not vector geometry or attributes. Java technology provided a solution to these problems.

The Java version of DDViewer uses a vector, rather than raster, representation of polygons. Instead of sending the user a flat GIF file with no attached attribute data, the server sends a topological data structure to the Java client. This structure allows map polygons to be painted on the user's screen, and it also supports a point-and-click interface that relates attribute data to each polygon. With Java, the user has full access to the underlying data and can completely change the appearance of maps on the desktop without re-submitting processing jobs to the server. Vector data carry an overhead penalty because they are less compact than simple raster images, but this cost in file size is offset by greatly increased flexibility in the user interface. Moreover, in many cases the boundary data are retrieved only once: the user selects an area of interest through the map interface, and these same input map boundary data are then used to produce the output demographic map. They can be re-used repeatedly to map different variables or to re-categorize or re-color the data in various ways, so the more maps the user makes of the same region, the more efficient this data model becomes. Furthermore, the underlying data can always be accessed directly through the map interface. Most of the features listed in the abstract above were enabled or facilitated by interactive Java client-server technology.
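The point-and-click query described above amounts to finding which polygon contains the clicked screen point and reading that polygon's attribute. The paper does not describe DDViewer's own hit-testing code, so the following is only a minimal sketch, assuming each polygon is held as integer screen-coordinate arrays with a single attached value; the names `MapQuery` and `Tract` are hypothetical. It uses the standard even-odd (ray-casting) point-in-polygon test.

```java
import java.util.List;

/** Hypothetical sketch: point-and-click query of a vector choropleth map. */
final class MapQuery {

    /** One census polygon in screen (pixel) coordinates with one attached attribute value. */
    static final class Tract {
        final String id;
        final int[] xs;
        final int[] ys;
        final double value;

        Tract(String id, int[] xs, int[] ys, double value) {
            if (xs.length != ys.length || xs.length < 3) {
                throw new IllegalArgumentException("a polygon needs at least 3 vertices");
            }
            this.id = id;
            this.xs = xs;
            this.ys = ys;
            this.value = value;
        }

        /** Even-odd rule: toggle on each edge crossed by a horizontal ray running right from (px, py). */
        boolean contains(double px, double py) {
            boolean inside = false;
            for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
                // Only edges whose endpoints lie on opposite sides of the ray's y can cross it.
                if ((ys[i] > py) != (ys[j] > py)) {
                    double xCross = xs[j] + (double) (xs[i] - xs[j]) * (py - ys[j]) / (ys[i] - ys[j]);
                    if (px < xCross) {
                        inside = !inside;
                    }
                }
            }
            return inside;
        }
    }

    /** Returns the first tract containing the clicked pixel, or null if the click hit no tract. */
    static Tract query(List<Tract> tracts, int clickX, int clickY) {
        for (Tract t : tracts) {
            if (t.contains(clickX, clickY)) {
                return t;
            }
        }
        return null;
    }
}
```

A linear scan over the polygons is adequate for a few hundred tracts; a larger map could first discard polygons whose bounding boxes do not contain the click.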

When the application is loaded from its home Web page, a client-server connection is established between the user's machine and the CIESIN/SEDAC server machine. Specifically, the Web server sends the base Java application class to the client host, which then opens a direct port-to-port connection to the server through the TCP/IP stack on each machine. This direct client-server connection eliminates dependence on the Web server software and its associated processing overhead. The Java server then sends the remaining classes and the data needed to display the interface to the user. Once this initialization, which takes about a minute, is complete, all subsequent processing and data transfers take only seconds. This benefits both the server and the user.
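The direct connection described above can be illustrated with standard `java.net` sockets. This is a minimal sketch, not DDViewer's actual protocol: it assumes a hypothetical line-oriented text protocol in which the client sends one request per line over a single persistent connection and the server answers each line until the client sends `QUIT`. The class name `DirectLink` and the request strings are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one persistent TCP connection carries many requests, with no Web server involved. */
final class DirectLink {

    /** Starts a thread that serves one client, answering each request line until it receives QUIT. */
    static Thread startServer(ServerSocket server) {
        Thread worker = new Thread(() -> {
            try (Socket socket = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                int served = 0;
                String line;
                while ((line = in.readLine()) != null && !line.equals("QUIT")) {
                    served++;
                    // A real server would look up boundaries or attributes here; this one only acknowledges.
                    out.println("OK " + served + " " + line);
                }
            } catch (IOException e) {
                System.err.println("server error: " + e.getMessage());
            }
        });
        worker.start();
        return worker;
    }

    /** Opens a single connection, sends every request over it, and collects one reply per request. */
    static List<String> session(int port, List<String> requests) throws IOException {
        List<String> replies = new ArrayList<>();
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            for (String request : requests) {
                out.println(request);
                replies.add(in.readLine());
            }
            out.println("QUIT");
        }
        return replies;
    }
}
```

In a request such as `BOUNDS 11,24 650 650`, 11 and 24 are the FIPS state codes for the District of Columbia and Maryland; the message format itself is hypothetical. The point of the design is that every later request reuses the already-open socket instead of paying for a new HTTP round trip through the Web server.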

On the server side, loads are reduced somewhat because SAS no longer needs to produce GIF files. Much more important, however, is that for any given region of interest the server has to process the boundary data only once. All subsequent maps are created on the client's machine without involving the server. In the first version of DDViewer, every change to the color, text, or placement of map elements, and every change of mapped variable, required a new request to the server. In the Java version, none of these changes involve the server, which frees it to service more connections: tens, if not hundreds, of simultaneous users can now connect without noticeably affecting overall system performance. A recent upgrade of the server to a high-performance UNIX workstation has also greatly increased its load capacity.
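The saving comes from the client keeping boundary data it has already received. A minimal sketch, assuming the client keys boundaries by a region string and treats the server fetch as a plain function; `BoundaryCache` and the key format are hypothetical, not DDViewer's classes. Mapping a new variable for the same region reuses the cached geometry, so only the first request for each region reaches the server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: boundaries for a region are requested from the server at most once per session. */
final class BoundaryCache {
    private final Map<String, int[][]> cache = new HashMap<>();
    private final Function<String, int[][]> fetchFromServer;
    private int serverRequests = 0;

    BoundaryCache(Function<String, int[][]> fetchFromServer) {
        this.fetchFromServer = fetchFromServer;
    }

    /** Returns cached boundaries, contacting the server only on the first request for a region. */
    int[][] boundaries(String regionKey) {
        return cache.computeIfAbsent(regionKey, key -> {
            serverRequests++;
            return fetchFromServer.apply(key);
        });
    }

    /** Number of round trips actually made to the server. */
    int serverRequests() {
        return serverRequests;
    }
}
```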

The improvements to DDViewer are most visible on the client side. With vector polygons, users may now select a region of interest from the map while seeing the names of states, counties, and tracts displayed on the input map. Any geographic region may be selected, including regions that cross state borders. On the output side, the user now has complete flexibility to re-color all map elements; much greater control over text size, type, and placement; full control of legend placement; and access to extra features not present in the original version. New variables may be created by combining existing variables. User-defined intervals allow custom coloring of the mapped variables, and, most importantly, maps can be queried by pointing and clicking with the mouse to display underlying data values and summary statistics. Although the CGI version offered the user some control and many choices, the Java version provides almost complete control and an essentially unlimited range of customization choices. Furthermore, updating the map with custom choices is about an order of magnitude faster than with the previous CGI version. The interface is also much easier to use than the awkward, HTML-constrained CGI interface, and on-line prompts and help are now available. Examples are shown in Section 5.
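Recoding a variable into user-defined intervals (up to ten, according to the abstract) reduces to assigning each value a class index, which then selects a color. A minimal sketch, assuming classes are described by ascending upper bounds and that equal-interval breaks serve as a default; the paper does not say which default classification DDViewer uses, and `Intervals` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch: recode a variable into at most ten classes for choropleth coloring. */
final class Intervals {
    static final int MAX_CLASSES = 10;

    /** Equal-interval upper bounds for k classes spanning the data's minimum to maximum. */
    static double[] equalBreaks(double[] values, int k) {
        if (k < 1 || k > MAX_CLASSES || values.length == 0) {
            throw new IllegalArgumentException("need data and between 1 and " + MAX_CLASSES + " classes");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double[] upper = new double[k];
        for (int i = 0; i < k; i++) {
            upper[i] = min + (max - min) * (i + 1) / k;
        }
        upper[k - 1] = max; // guard against floating-point drift in the top bound
        return upper;
    }

    /** Index of the first class whose upper bound is at least the value; larger values go to the top class. */
    static int classify(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1;
    }
}
```

User-defined intervals are simply a caller-supplied `upperBounds` array. Because the polygons are already on the client, changing the breaks only re-paints them; nothing is sent to the server.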

4.   The Databases

An important issue faced by developers of Internet applications is the problem of limited network bandwidth. Dealing with limited bandwidth involves trade-offs, as data sent to a client application must be detailed and accurate but at the same time as condensed as possible to speed transfers through slow network devices such as modems. In other words, the client's requirement for enough data for satisfactory visualization has to be optimally balanced against the need to constrain data flow so that overall application performance is acceptable. Another consideration is that most personal computers and workstations have limited available memory. For this reason it is incumbent on developers to reduce both graphical and attribute data as much as possible. To find an appropriate balance, the DDViewer 3.0 development team focused on the user's desktop display.

DDViewer is an application used in real time over the Internet. Data downloaded from the server are displayed immediately on the screen. Since current browser-based Java applications do not allow users to save data to disk for later use, there was no need for our purposes to transmit data beyond normal display resolution. The maximum display resolution was set at about 650x650 pixels. The default resolution was chosen to fit VGA screens (and was set indirectly through font sizes in JDK 1.0.2). We wanted to leave enough room on users' displays for them to continue manipulating other windows on their desktops; it is inconvenient when one application covers the entire display and prevents the user from accessing other computer resources. By limiting the maximum display resolution to about 650 pixels, we were able to pare down the graphical data requirements significantly. Providing higher resolution than most users can view would generally waste bandwidth and slow response times.

The US Census Bureau's TIGER 1992 line files are the basis for the geographic data. Coverages of the entire US for states, counties, and census tracts were processed in ArcInfo (ESRI, Redlands, CA). These GIS coverages were "thinned" to reduce the number of line segments (that is, their resolution), which reduced the overall size of the data set by eliminating unnecessary detail. After thinning, the ArcInfo coverages were "ungenerated" to produce ASCII text files, whose floating-point latitude-longitude coordinate pairs were then converted to integers. These integerized coordinate files are the permanent storage format for DDViewer boundaries. Together, thinning and integerizing greatly reduced the quantity of data to be sent to the Java client.
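The two preprocessing steps can be sketched as follows. The paper does not say which thinning algorithm ArcInfo applied or what scale factor was used to integerize coordinates, so this sketch substitutes the widely used Douglas-Peucker line simplification and assumes coordinates are stored in millionths of a degree (so that -77.0365 becomes -77036500, an eight-digit integer). `Preprocess`, `SCALE`, and the tolerance values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of boundary preprocessing: line thinning, then integerizing coordinates. */
final class Preprocess {
    static final double SCALE = 1_000_000.0; // assumed: store coordinates in millionths of a degree

    /** Converts decimal degrees to a fixed-point integer. */
    static long toFixed(double degrees) {
        return Math.round(degrees * SCALE);
    }

    /** Douglas-Peucker: keeps only vertices farther than tol from the simplified line. Points are {x, y}. */
    static List<double[]> thin(List<double[]> pts, double tol) {
        if (pts.size() < 3) {
            return new ArrayList<>(pts);
        }
        boolean[] keep = new boolean[pts.size()];
        keep[0] = true;
        keep[pts.size() - 1] = true;
        simplify(pts, 0, pts.size() - 1, tol, keep);
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < pts.size(); i++) {
            if (keep[i]) {
                out.add(pts.get(i));
            }
        }
        return out;
    }

    /** Marks the farthest vertex between first and last if it exceeds tol, then recurses on both halves. */
    private static void simplify(List<double[]> p, int first, int last, double tol, boolean[] keep) {
        if (last <= first + 1) {
            return;
        }
        double maxDist = -1;
        int index = -1;
        for (int i = first + 1; i < last; i++) {
            double d = distToSegment(p.get(i), p.get(first), p.get(last));
            if (d > maxDist) {
                maxDist = d;
                index = i;
            }
        }
        if (maxDist > tol) {
            keep[index] = true;
            simplify(p, first, index, tol, keep);
            simplify(p, index, last, tol, keep);
        }
    }

    /** Euclidean distance from point p to the segment a-b. */
    private static double distToSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0];
        double dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        if (len2 == 0) {
            return Math.hypot(p[0] - a[0], p[1] - a[1]);
        }
        double t = Math.max(0, Math.min(1, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2));
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }
}
```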

The data are reduced further, this time dynamically at run time, when the client application fetches a set of boundaries. The client sends the server the geographic codes corresponding to the user's region of interest, together with a pair of coordinates defining the size of its display. On receiving this request, the server calls a C program that processes the boundary files and reduces them to fit the client window. In this process, the eight-significant-digit integer coordinates in the boundary files are scaled down to three-digit pixel coordinates before being sent to the client. Other strategies could compress the data further, such as applying standard compression algorithms or sending binary integers instead of ASCII text; these may be used in future versions of DDViewer. Although converting the boundary data consumes CPU cycles on the server and might at first appear to slow the download, this is not in fact the case. Network bandwidth and client computing power are currently the primary bottlenecks in displaying DDViewer files on the client, and the overhead from the processing just described is negligible. If and when Internet bandwidth constraints are relaxed in the years to come, it should be straightforward to increase server performance enough to more than fill the increased bandwidth. This will be done in part by storing all the boundary data in a fully indexed and optimized commercial database instead of flat files, and by accessing the data on-line through client-server SQL calls.
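The run-time reduction can be sketched as a linear mapping from the region's fixed-point bounding box onto the client's pixel grid, followed by dropping consecutive vertices that land on the same pixel. The server's actual C program is not shown in the paper; this Java version only imitates the idea under stated assumptions: one uniform scale for both axes preserves the map's shape, no map projection is applied, and the screen y axis points down. `FitToWindow` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: scale fixed-point coordinates into a client window of width x height pixels. */
final class FitToWindow {

    /**
     * Maps one polygon ring, given as {x, y} fixed-point pairs, into pixel coordinates.
     * The bounding box (minX..maxX, minY..maxY) is that of the whole requested region.
     */
    static int[][] toPixels(long[][] ring, long minX, long minY, long maxX, long maxY, int width, int height) {
        if (maxX <= minX || maxY <= minY) {
            throw new IllegalArgumentException("bounding box must have positive width and height");
        }
        // One scale factor for both axes so the region keeps its shape and fits inside the window.
        double scale = Math.min((width - 1) / (double) (maxX - minX), (height - 1) / (double) (maxY - minY));
        List<int[]> out = new ArrayList<>();
        for (long[] p : ring) {
            int px = (int) Math.round((p[0] - minX) * scale);
            int py = (int) Math.round((maxY - p[1]) * scale); // screen y grows downward, latitude grows upward
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            // Vertices that collapse onto the previous pixel add nothing at this resolution.
            if (last == null || last[0] != px || last[1] != py) {
                out.add(new int[] {px, py});
            }
        }
        return out.toArray(new int[0][]);
    }
}
```

For a 650x650 window every output value lies between 0 and 649, so each coordinate needs at most three digits, which matches the reduction from eight-digit storage coordinates described above.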

The demographic attribute data from the 1990 US Census are already stored in a high-performance commercial database, in this case SAS. This database of approximately 2 gigabytes is fully indexed for rapid access and retrieval. Future development of the attribute database will focus on Java SQL client-server retrieval of these data.


5.   Examples

The first example below shows the steps needed to create census tract maps of the Washington, DC - Baltimore, MD corridor. Upon loading, DDViewer 3.0 presents a US map interface.


Figure 1. Main DDViewer GUI

States of interest are selected from the US map, and their counties are then fetched.


Figure 2. Counties of interest are selected to retrieve census tracts.

Counties of interest are selected from state maps and used to retrieve census tracts. The Select Variables button in the main GUI loads a tool to select the variables of interest.


Figure 3. The variable selection tool

Job submission then retrieves both the tract boundaries and the selected variables.


Figure 4. DDViewer maps the retrieved data

The demographic variables are then mapped. Polygon boundary lines were turned off for Figure 4 to avoid visual clutter in high density areas. GUI tools are used to customize the maps.


Figure 5. DDViewer's GUI tool for customizing the output map.


Figure 6. All map elements can be recolored in 24-bit color using this tool.

Selecting another variable from the list will map it.


Figure 7. Another variable is mapped without querying the server

Background and foreground colors, as well as title and legend positions, can all be modified.


Figure 8. All map elements can be customized.


The following example shows how a new variable may be created from existing ones. The US is mapped at the county level, and the number of mobile homes in each county is divided by the area of that county to produce a map of mobile home density. The user loads the variable builder tool...


Figure 9. The variable builder tool

... and uses it to calculate the new variable, which can then be mapped.


Figure 10. The newly created variable is mapped.
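The variable builder's computation is an element-wise combination of two existing variables across all mapped units. A minimal sketch, assuming both variables arrive as arrays aligned by county and that units with zero area should be treated as missing rather than producing infinity; `VariableBuilder` is a hypothetical name, and the paper does not describe how DDViewer handles division by zero.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch: derive a new variable from two existing ones, unit by unit. */
final class VariableBuilder {

    /** Applies op to each pair of aligned values, e.g. one pair per county. */
    static double[] combine(double[] a, double[] b, DoubleBinaryOperator op) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("variables must cover the same units");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = op.applyAsDouble(a[i], b[i]);
        }
        return out;
    }

    /** Ratio such as mobile homes per unit area; NaN marks units where the denominator is zero. */
    static double[] divide(double[] numerator, double[] denominator) {
        return combine(numerator, denominator, (n, d) -> d == 0 ? Double.NaN : n / d);
    }
}
```

The NaN entries can then be drawn in a separate "no data" color when the new variable is mapped.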


6.   Conclusion

DDViewer 3.0 allows users without any GIS experience to create customized maps. Users include employees of government, business, and academic institutions, as well as students and private citizens. To date, the reception by our user community has been very positive, and many suggestions have been offered. DDViewer may be enhanced in several ways: by adding regions, adding data sets, adding more sophisticated analytical tools, or incorporating satellite raster data. Other uses are also possible. For example, CIESIN is now using part of the technology as a geographic interface for searching Internet-accessible databases: Earth science data pertinent to a selected region are found using geographic selection criteria. Compared with a complete GIS, DDViewer is still rudimentary. Yet it demonstrates that much of the geographic information processing can be off-loaded from a central server to its Web-based clients, bringing geographic visualization to a wider clientele.

Acknowledgments. This work was supported through CIESIN under NASA Contract NAS5-32632 for the development and operation of the Socioeconomic Data and Applications Center (SEDAC). The opinions, conclusions, and recommendations contained herein represent those of the author and are not necessarily those of CIESIN or SEDAC, nor has the accuracy of the data contained herein been verified or guaranteed by CIESIN or SEDAC.

References

Preparata, Franco P. and Shamos, Michael Ian. Computational Geometry: An Introduction. Springer-Verlag, New York, NY, 1985.

Sun Microsystems. Java Development Kit 1.0.2, on-line documentation. http://java.sun.com/docs/index.html, 1997.


© 1998 National Research Council. CIESIN and the U.S. Government retain the right to use this work for their respective nonprofit and governmental purposes.

The name CIESIN and the world map logo are both registered trademarks of the Consortium for International Earth Science Information Network.
