Spatially visualizing socially and politically relevant datasets

In addition to my "official" KML endeavors with the James Reserve, I have created a few side-projects for Google Earth. I am particularly interested in visualizing large datasets of publicly available information that might benefit from spatial representation.

The first two projects listed here were posted to the Google Earth Community BBS and generated a lot of unintended political controversy. A couple of journalists, here and abroad, have approached me about writing short stories on the projects and my motivations for doing them.

The third project presented here was never published on the BBS because of its very heavy system demands and the limitations of Google Earth 3.x. However, as discussed below, the new GE 4.x beta and KML 2.1 have let me make this layer manageable.

These exercises gave me a strong understanding of KML and its generation with PHP, and enabled me to work successfully on the James Reserve project. Though I *really* enjoy these projects, my commitments to the James Reserve and my thesis have prevented me from maintaining these layers in recent months. I intend to revisit, update, and improve them with KML 2.1 upon completion of my thesis.

 

US/Coalition Casualties: Iraq & Afghanistan (~43,000 downloads)

For this layer, I mapped ~2,500 US and Coalition war casualties from the campaigns in Afghanistan and Iraq (up until approximately Jan 2005). Each person's first name and an icon are displayed at their hometown (if it could be located). Clicking on the placemark brings up a small thumbnail picture of the person, when available, and information about their death. Links are provided to a few popular online war memorial sites, including CNN's tribute page, Honor the Fallen, and icasualties.org's records (with links to DoD records and others).

Creation Process:
More details are available on the bulletin board, but an overview of the creation process is as follows. Casualty data was harvested from icasualties.org. I had to pull the data directly from the HTML search results tables, which involved some parsing with PHP and crafty use of search/replace and a spreadsheet program.

Next, I determined the naming convention for people's portraits hosted on CNN's site (unfortunately, this was rather inconsistent). I then wrote a PHP script to check for those images on the CNN server (an HTTP headers test) and made a note of those that were available. A generic portrait/silhouette was used for profiles without photos.

I then created a KML file with <address> listings for all the available hometowns, and let GE geocode them for me. Once I had collected all the raw data, image links, and coordinates in a large CSV file, I created a PHP script to generate the KML for the layer.
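The address-only KML step can be sketched roughly as follows. This is an illustration in Python rather than the PHP used for the original layer, and the CSV column names (`name`, `hometown`) and the sample rows are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.1"  # KML 2.1 namespace


def build_address_kml(rows):
    """Return a KML string with one address-only Placemark per row."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for row in rows:
        hometown = (row.get("hometown") or "").strip()
        if not hometown:
            continue  # nothing for Google Earth to geocode
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = row["name"]
        # No <Point> here: the <address> text is what gets geocoded.
        ET.SubElement(placemark, "address").text = hometown
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sample data; the second record has no known hometown.
sample_csv = 'name,hometown\nExample Person A,"Dayton, Ohio"\nExample Person B,\n'
rows = list(csv.DictReader(io.StringIO(sample_csv)))
kml_text = build_address_kml(rows)
print(kml_text)
```

Records without a hometown are skipped, which mirrors the "if located" caveat above.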

 

Here is the result:

 

30K Iraqi Civilian/Military Casualties (~14,000 downloads)

As a follow-up to the US/Coalition Casualties layer, I wanted to represent the estimated 30,000 Iraqi casualties from the war's inception up until the end of October 2005. The major challenge here was finding a way to deal with so many events tied to the same city location; Baghdad alone had 14,000 casualties. I decided to represent time in a spatial dimension, spiraling outwards from the earliest events at the city center to the more recent events. Each individual incident is displayed as a vertical line of points.

Creation Process:
For this layer I used the IraqiBodyCount.org site as a data source. As mentioned before, I chose to use a simple Euclidean spiral algorithm that starts at the city's center and spirals chronologically outwards towards the more recent events.
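The exact spiral formula is not recorded here, so the following is only a sketch of one plausible approach: an Archimedean spiral (radius proportional to angle), written in Python rather than the original PHP. The spacing value and the approximate Baghdad coordinates are assumptions chosen for the example.

```python
import math


def spiral_positions(center_lat, center_lon, count, spacing_deg=0.002):
    """Return (lat, lon) pairs for `count` events, earliest at the center.

    Events must already be sorted chronologically; index 0 is the oldest.
    """
    positions = []
    for i in range(count):
        # For r = a * theta the arc length grows with theta squared, so
        # stepping theta with sqrt(i) keeps neighbors about evenly spaced.
        theta = math.sqrt(4 * math.pi * i)
        radius = spacing_deg * theta / (2 * math.pi)
        lat = center_lat + radius * math.sin(theta)
        # Stretch the longitude offset so the spiral looks round on the map.
        lon = center_lon + radius * math.cos(theta) / math.cos(math.radians(center_lat))
        positions.append((lat, lon))
    return positions


# Approximate coordinates for central Baghdad, for illustration only.
points = spiral_positions(33.3152, 44.3661, 200)
print(points[0], points[-1])
```

With this parameterization, both the gap between successive rings and the gap between successive points are roughly `spacing_deg`, so the icons do not bunch up near the center.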

The vertical stack of placemark icons represents the deaths for that incident. For reports with death tolls over a certain threshold, I decided to move from representing each person lost with a single placemark to five people per placemark. This was primarily done because Google Earth has an understandably difficult time drawing 30k+ icons (this was pre-KML 2.1 and <regions>).
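As a rough sketch of the stacking rule (Python rather than the original PHP): the threshold of 20 deaths and the 150 m vertical step are placeholders, since the actual values are not stated above.

```python
def incident_stack(deaths, threshold=20, group_size=5, step_m=150.0):
    """Return (people_per_icon, [altitude_m, ...]) for one incident."""
    # Above the (assumed) threshold, each icon stands for five people.
    per_icon = group_size if deaths > threshold else 1
    icon_count = -(-deaths // per_icon)  # ceiling division
    altitudes = [step_m * (level + 1) for level in range(icon_count)]
    return per_icon, altitudes


def stack_coordinates(lat, lon, altitudes):
    """Format KML <coordinates> values, which use lon,lat,alt order."""
    return [f"{lon:.5f},{lat:.5f},{alt:.0f}" for alt in altitudes]


per_icon, altitudes = incident_stack(23)
print(per_icon, len(altitudes))
print(stack_coordinates(33.3, 44.4, altitudes))
```

For the icons to float above one another, each placemark's <Point> would also need an altitudeMode such as relativeToGround.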

It would have been much easier, and much less computer-intensive, to merely represent these deaths as polygons of different heights. However, I really felt that the impact of showing these casualties in a one-to-one (or 1-to-5) ratio was worth the drawbacks.

 

Here is the result:

 

Missing and Exploited Children (unpublished)

For this project I harvested nearly 1,500 U.S. records from the National Missing and Exploited Children's website. Unlike the previous projects, I wanted to show the children's faces as icons right in the 3D environment. Clicking on a face brings up the person's records, a larger photo, and links to the National Missing and Exploited Children's website.

Though this is one of my favorite side-projects, I did not publish it upon completion because of performance limitations with GE 3.x and KML 2.0.

Creation Process:
As their search interface only allowed the display of ten or so results at a time, I could not employ the HTML data harvesting methods used in the previous projects. Thus, I was forced to innovate and refine my data mining techniques.

Using the PHP PEAR module HTTP_Request, I was able to POST the appropriate form parameters to the website's search engine. The script cycled through all 250+ results pages, and wrote them locally to my hard drive.
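A rough Python equivalent of that page-by-page POST loop might look like the sketch below. The URL and the form field names (`country`, `page`) are placeholders, not the real site's parameters, and the dry run uses a stand-in sender so that nothing touches the network.

```python
import io
import os
import tempfile
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://example.org/search"  # placeholder, not the real endpoint


def build_page_request(page):
    """Build the POST request for one results page (field names are assumed)."""
    body = urlencode({"country": "US", "page": page}).encode("ascii")
    return Request(SEARCH_URL, data=body, method="POST")


def harvest_pages(page_count, out_dir, send=urlopen):
    """POST for each results page in turn and save the raw HTML locally."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for page in range(1, page_count + 1):
        with send(build_page_request(page)) as response:
            html = response.read()
        path = os.path.join(out_dir, f"results_{page:03d}.html")
        with open(path, "wb") as handle:
            handle.write(html)
        saved.append(path)
    return saved


def fake_send(request):
    """Stand-in for urlopen that echoes the form body back as HTML."""
    return io.BytesIO(b"<html>" + request.data + b"</html>")


# Dry run with the stand-in sender; pass send=urlopen for real requests.
saved_paths = harvest_pages(3, tempfile.mkdtemp(), send=fake_send)
print(saved_paths)
```

Keeping the sender as a parameter makes it easy to test the loop offline and to add a polite delay between requests later.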

I used a combination of PHP parsing and spreadsheet manipulation of the resulting HTML files to retrieve names, information, and links to photographs. As there was no simple naming convention employed for the images, I used the harvesting script again to collect all the image URLs recorded in the first step.

What to do with 1500 image files?
Because GE would certainly have difficulties loading 1500 individual image files from a KMZ or webserver, I decided to build icon palettes of the pictures. Using the GD Graphics Library and some example algorithms for "photo contact sheet" generation, I was able to combine all 1500 images into a series of 8 palettes. My script carefully recorded the x, y position of each image, so it was a simple matter of appropriately referencing the palette subsection in the KML.
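The bookkeeping behind such palettes can be sketched as below, in Python rather than the original PHP/GD script, and without any actual image drawing. The 64-pixel tile size and the 14 x 14 grid are assumptions, picked only because they put 1500 images onto 8 sheets.

```python
def palette_slot(index, tile=64, cols=14, rows=14):
    """Return the sheet number and pixel box for the image at `index`."""
    per_sheet = cols * rows
    sheet, slot = divmod(index, per_sheet)  # which palette, which cell
    row, col = divmod(slot, cols)           # cells fill left to right
    return {"sheet": sheet, "x": col * tile, "y": row * tile, "w": tile, "h": tile}


slots = [palette_slot(i) for i in range(1500)]
sheet_count = slots[-1]["sheet"] + 1
print(sheet_count, slots[0], slots[-1])
```

The x, y, w, h values correspond to the palette offsets an icon reference needs. Note that the y values here are measured from the top row of the sheet; the origin the KML icon elements expect should be checked against the KML reference, and the values flipped if necessary.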

Unfortunately, Google Earth 3.x creates a mini-thumbnail for each new icon, to be displayed in the "My Places" sidebar. Loading the layer would send my computer to 100% CPU usage, never to return. I was able to work out a little hack to trick GE into not creating the mini-thumbs: 1) I loaded the KML first, without access to the palettes in the specified <href/url> locations. 2) After allowing GE to choose the default "missing icon" for each, I then introduced the palettes to their expected locations. However, because of the required 'hack' and the performance problems, I decided not to post this on the Keyhole BBS.

 

Revisited with 2.1:
With the arrival of KML 2.1, I recently revised this layer, creating <regions> for each state in order to cut down on the computing demands.

Without access to the extreme latitudes and longitudes of each US state, I had to approximate their bounds. I found locations for the geographical centers of all 50 states, and wrote a PHP script to estimate the bounds of a square equal in area to each state. Though not perfect (some placemarks disappear when zooming in close to a state's borders), this greatly improves usability. GE 4.x also seems to handle mini-thumb creation better.
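The square approximation can be sketched like this, in Python rather than the original PHP. The 111.32 km-per-degree constant is a standard approximation, while the center point and area in the example are rough, illustrative figures for Colorado and the minLodPixels value is an arbitrary placeholder.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def square_bounds(center_lat, center_lon, area_km2):
    """Bounds of a square with the given area, centered on the given point."""
    half_side_km = math.sqrt(area_km2) / 2
    dlat = half_side_km / KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side_km / (KM_PER_DEGREE * math.cos(math.radians(center_lat)))
    return {
        "north": center_lat + dlat,
        "south": center_lat - dlat,
        "east": center_lon + dlon,
        "west": center_lon - dlon,
    }


def region_kml(box, min_lod_pixels=128):
    """Format a KML 2.1 <Region> element for the given bounds."""
    return (
        "<Region><LatLonAltBox>"
        f"<north>{box['north']:.4f}</north><south>{box['south']:.4f}</south>"
        f"<east>{box['east']:.4f}</east><west>{box['west']:.4f}</west>"
        "</LatLonAltBox>"
        f"<Lod><minLodPixels>{min_lod_pixels}</minLodPixels></Lod></Region>"
    )


# Rough, illustrative figures for Colorado (center point and area in km^2).
box = square_bounds(39.0, -105.5, 269600)
snippet = region_kml(box)
print(snippet)
```

Because real states are rarely square, placemarks near a border can fall outside the box, which matches the disappearing-placemark behavior described above.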

 

Here is the result:

 

Forthcoming Projects:

I have a number of other side-project ideas, in particular ones making use of the large datasets from the National Archives. The next dataset I'd like to tinker with is from the Japanese Internment Camps during WWII. There is a rich dataset featuring the home cities and counties of all the Japanese-Americans relocated to specific camps in the Pacific southwest. There is also data on their professional, educational, and familial backgrounds. I think it would be really interesting to visualize the relocation of people from their hometowns to various camps, filter this data according to various criteria, and animate these relocations using KML 2.1's time features.