Creating Virtual GIS Appliances with Ubuntu JeOS

Some time ago our team met with a 10-county coalition of emergency response and management groups. We identified two general GIS needs for the coalition: seamless GIS data sets across county boundaries, and a user-friendly, browser-based GIS application to handle analysis and reporting functions.

The first need is at the same time the simplest and the most difficult (and least interesting). Basically you have to create a standard database schema and browbeat overworked GIS staff at the various counties into providing you with periodic updates. Maybe toss in some Python scripts to massage the data. Simple conceptually, mind-numbingly difficult to pull off successfully, and fairly boring unless you’re at a bar regaling others with tales of a data-mashing project that is slowly devouring your soul. Let’s move on.

The web application project had some interesting attributes, some of which came from the clients and some of which I filled in mentally while listening to them:

The application needed to be able to write to an event layer. This immediately flagged it as requiring WFS-T (transactional Web Feature Service).
Tornadoes are not only attracted to trailer parks; they also love telephone poles and cell towers. Tossing this on one of our servers wasn't going to cut it; the application needed to run independently in each emergency operations center. In a dire emergency, when the internal network is kaput, it would be beneficial if it could run on individual laptops.
Barring a grant fairy bestowing a large sum of money with the wave of a magic checkbook, lots of these counties don't have it in their coffers to buy hardware or software for this kind of project. Many of them are doing a remarkable job by the skin of their financial teeth. A proprietary SQL Server/ArcGIS Server/SDE/Windows/hardware stack could easily run upwards of $50,000 per county, or a half-million-dollar project footprint for hardware and software alone. This was clearly going to be a FOSS project.
No way in <fill in your version of Dante's easy-bake here> am I supporting a bunch of application and database servers at ten different counties with disparate hardware, network, and software setups. It's not happening.
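
For readers who haven't worked with WFS-T: "writing to an event layer" means the client POSTs a Transaction document to the WFS endpoint (on a default GeoServer install, `/geoserver/wfs`). Below is a minimal Python 3 sketch that builds a WFS 1.0.0 Insert for a single point. The `ema` namespace URI, the `events` feature type, and its `the_geom` and `description` attributes are hypothetical placeholders, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
GML_NS = "http://www.opengis.net/gml"
# Hypothetical namespace for the event layer; use your feature type's own.
EMA_NS = "http://example.org/ema"

# Emit readable prefixes (wfs:, gml:, ema:) instead of ns0:, ns1:, ...
ET.register_namespace("wfs", WFS_NS)
ET.register_namespace("gml", GML_NS)
ET.register_namespace("ema", EMA_NS)


def q(ns, tag):
    """Qualified ElementTree tag name, e.g. '{http://...}Insert'."""
    return "{%s}%s" % (ns, tag)


def build_event_insert(lon, lat, description):
    """Build a WFS 1.0.0 Transaction that inserts one point into 'events'."""
    tx = ET.Element(q(WFS_NS, "Transaction"), service="WFS", version="1.0.0")
    insert = ET.SubElement(tx, q(WFS_NS, "Insert"))
    feature = ET.SubElement(insert, q(EMA_NS, "events"))  # hypothetical layer
    geom = ET.SubElement(feature, q(EMA_NS, "the_geom"))   # hypothetical column
    point = ET.SubElement(geom, q(GML_NS, "Point"), srsName="EPSG:4326")
    # GML 2 coordinates are written "x,y"; for EPSG:4326 in WFS 1.0.0, x is longitude.
    ET.SubElement(point, q(GML_NS, "coordinates")).text = "%s,%s" % (lon, lat)
    ET.SubElement(feature, q(EMA_NS, "description")).text = description
    return ET.tostring(tx, encoding="unicode")
```

Calling `build_event_insert(-85.31, 35.05, "Downed power line")` returns the XML text; POSTing it to the WFS endpoint would ask GeoServer to insert the feature, provided its WFS is configured to accept transactions.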

So where does that leave us? It leaves us with a fully open source stack running as a virtual appliance. Says Wikipedia:

A virtual appliance is a minimalist virtual machine image designed to run under some sort of virtualization technology (like VMware Workstation, Citrix XenServer, VirtualBox or many others).

Basically, my plan was to distribute the whole application as a virtual server appliance running under VMware Player. I chose VMware Player because it's small, easy to use, free (as in beer, not as in freedom), and VMware was more likely to be familiar to the clients. I personally prefer VirtualBox at home, but VMware's free Server and Player products are also pretty darn good.

The first step was choosing a Linux distribution to use for the guest OS. That decision came pretty quickly - I picked Ubuntu JeOS (Just Enough Operating System). From the web site:

Ubuntu Server Edition JeOS (pronounced "Juice") is an efficient variant of our server operating system, configured specifically for virtual appliances. Currently available as a CD-Rom ISO for download, JeOS is a specialised installation of Ubuntu Server Edition with a tuned kernel that only contains the base elements needed to run within a virtualized environment.
Technical Specs:

Less than 100MB ISO image
Less than 300MB installed footprint
Specialised -virtual Kernel 2.6.24
Optimised for VMWare ESX, VMWare Server and KVM
Intel or AMD x86 architecture
Minimum memory 128M
No graphical environment preloaded as it is aimed at server virtual appliance
Working knowledge of linux administration and debian packages recommended to start building your own appliance

Part of the reason I picked this distro was my familiarity with Ubuntu (it's what my machines run at home) and its package management system. But it's also a fantastic distro tailored specifically for this purpose, and v8.04 is a long-term support (LTS) release, meaning security patches and updates will keep coming for years (the server edition of 8.04 LTS is supported for five years).

For the software end of things, I used this stack:

Database: PostgreSQL with PostGIS
Servlet Engine: Apache Tomcat
WMS/WFS-T Server: GeoServer
Tile Caching: GeoWebCache
Httpd: Apache with mod_proxy (GeoServer and GeoWebCache) and mod_php
Server-side code: PHP (just web services and PDF reports - the rest is JavaScript/AJAX)
Client-side code: OpenLayers, jQuery

It took me a while to figure everything out, but once I did, I could probably remake the whole server and application stack in an hour, two tops.

Package management in Linux has really gotten a bad rap. If anything, I find package management in Linux to be far, far, far superior to that of Windows. After walking through the Ubuntu JeOS setup screens, I basically enabled the root account (sudo on servers is for suckers) and typed:

apt-get install sun-java6-jdk postgresql postgresql-8.3-postgis postgis-client apache2 php5 libapache2-mod-php5 libapache2-mod-proxy-html php5-gd php-fpdf nano wget openssh-server

With one command the OS fetched, installed, and configured Java, PostgreSQL, PostGIS, Apache, PHP, and some other related packages. The only thing I really had to install by hand was Tomcat, as the Ubuntu repositories only have v5.5 and I wanted v6. That was a matter of grabbing the tar file, extracting it to /usr/local/tomcat, adding an admin user to tomcat-users.xml, and writing an init.d script to start Tomcat at boot. Not as easy as apt-get install tomcat6 would have been, but not bad.
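
The tomcat-users.xml step is small but easy to get wrong. As a minimal sketch, assuming Tomcat 6 (where the manager application authorizes users holding the `manager` role; later Tomcat versions split this into roles such as `manager-gui`), the Python below adds that role and a user to the file's contents. The username and password are placeholders, and `add_manager_user` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def add_manager_user(xml_text, username, password):
    """Return tomcat-users.xml text with a 'manager' role and user added.

    Note: ElementTree's default parser discards XML comments, so the
    commented-out examples in the stock file will not survive the rewrite.
    """
    root = ET.fromstring(xml_text)
    # Declare the role once, even if this is called for several users.
    if root.find("role[@rolename='manager']") is None:
        ET.SubElement(root, "role", rolename="manager")
    # Skip users that already exist rather than writing a duplicate entry.
    if root.find("user[@username='%s']" % username) is None:
        ET.SubElement(root, "user", username=username,
                      password=password, roles="manager")
    return ET.tostring(root, encoding="unicode")


def update_file(path, username, password):
    """Rewrite a tomcat-users.xml in place, e.g. /usr/local/tomcat/conf/."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(add_manager_user(text, username, password))
```

Tomcat reads this file at startup, so restart it (via the init.d script) after editing.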

From there it was a matter of deploying the GeoServer and GeoWebCache WAR files (browse, pick, and deploy from the Tomcat manager), restoring our PostgreSQL database for the project, linking to our GeoServer data directory with our settings, copying in the GeoWebCache properties files we needed, dropping the web application folder into /var/www, and setting up a couple of mod_proxy links. That sounds like a lot, but each step takes only a couple of minutes. Bob’s your uncle: a full open source GIS virtual appliance running our complete application and all of its dependencies.

The hardest part was the pound-head-on-desk bit of Perl I had to write so that the VM appliance reports the IP address it fetched via DHCP on bootup, letting users point their browsers to it (having non-techie users log in, type ifconfig, and root through the resulting paragraph about eth0 seemed cruel and unusual).
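
The Perl script isn't reproduced here, but the idea can be sketched. Below is a minimal Python 3.7+ sketch of one common approach, not the original script: pull the IPv4 address out of `ifconfig` output and write it into /etc/issue, which getty prints above the console login prompt, so the address shows up in the VMware Player window without anyone logging in. The function names and banner text are hypothetical, and the sketch assumes it runs as root from an init script after DHCP has finished.

```python
import re
import subprocess

# Matches both "inet addr:192.168.1.5" (older net-tools, as on 8.04)
# and "inet 192.168.1.5" (newer net-tools); "inet6" lines do not match.
INET_RE = re.compile(r"inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def find_ipv4(ifconfig_output):
    """Return the first non-loopback IPv4 address found, or None."""
    for addr in INET_RE.findall(ifconfig_output):
        if not addr.startswith("127."):
            return addr
    return None


def issue_banner(addr):
    """Hypothetical /etc/issue text shown above the console login prompt.

    Avoid backslashes here: getty treats them as escape sequences.
    """
    if addr is None:
        return "No network address yet. Check the VM's network settings.\n\n"
    return "Point your web browser to: http://%s/\n\n" % addr


def write_issue(interface="eth0", path="/etc/issue"):
    """Run ifconfig for one interface and write the banner (needs root)."""
    result = subprocess.run(["ifconfig", interface],
                            capture_output=True, text=True)
    with open(path, "w") as f:
        f.write(issue_banner(find_ipv4(result.stdout)))
```

Calling `write_issue()` from a boot-time init script, ordered after networking comes up, refreshes the banner on every start.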

How does it run? Right now it’s sitting on my laptop, using 1 CPU and 512MB of RAM, and running like its hindquarters are on fire. Yes, the OS, database server, application server, and web server are all running well on 1 CPU and 512MB of RAM. Having the OS take only ~128MB of RAM frees up memory for everything else you would rather have it doing. I have the VM appliance's resources set low because you generally want an appliance to work on the lowest-common-denominator host machine. If a county wants to run it on a quad-core workstation with 4GB of RAM, they can bump up the VM appliance's specs accordingly. But it’s running fine right now: CPU utilization is low, and it’s using 0 swap with 305MB of RAM free.
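
Figures like these come from `free -m` or, underneath it, /proc/meminfo. As a small sketch, the Python below parses /proc/meminfo text (values are reported in kB) and reports free RAM, RAM free once buffers and page cache are counted as reclaimable, and swap in use. Which of the two RAM figures corresponds to the 305MB above isn't stated, so both are computed; the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'MemFree:   312320 kB' into {name: kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Return (free MB, free + buffers + cache MB, swap used MB), rounded down."""
    free_mb = info["MemFree"] // 1024
    free_plus_cache_mb = (info["MemFree"] + info["Buffers"] + info["Cached"]) // 1024
    swap_used_mb = (info["SwapTotal"] - info["SwapFree"]) // 1024
    return free_mb, free_plus_cache_mb, swap_used_mb


def read_summary(path="/proc/meminfo"):
    """Summarize the running system's memory (Linux only)."""
    with open(path) as f:
        return memory_summary(parse_meminfo(f.read()))
```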

Here comes an even better part. With an empty tile cache, the whole distributable image is ~1.5GB. No joke: 1.5GB. Right now we have some significant holes in the data coverage, so I could see that easily going a little past 2GB. Once it is on a hard drive and running, the size will grow as the tile caches build (I set a maximum hard disk size of 20GB), but this whole application can easily be distributed on a DVD, a USB thumb drive, or via FTP.

The advantages of this type of delivery system are many:

It's tiny. Easy to distribute.
It runs on minimal hardware. No extra hardware purchases are necessary - it can run on a spare workstation.
No admin (hardware, database, web server, etc.) knowledge is required of the end user. Just fire it up.
Updates are a piece of cake: just make a new VM appliance image and distribute it.
Support is a piece of cake. If it borks, just hit the stop button, followed by the play button. If a local Linux hacker gets creative and really borks it, just delete the VM appliance and replace it with a fresh copy. The only time we have to get involved is if we botched some code in the application itself, which never happens. Really.
It can run on as many machines as you want it to. The host can be anything that runs VMware Player (Linux, Windows).
The virtual appliance is 100%, absolutely, no-holds-barred free. As in beer and freedom. VMware Player and Server are not open source, but they are free in terms of cost.

On top of these architectural advantages, the team member who wrote the application did a superb job. Event management, PDF reports, analysis, a great user interface, etc.: top notch. I introduced him to OpenLayers and PostGIS, wrote some basic web services for him in PHP, and he was off to the races. Now I get to steal his code for some other projects (sssh!).

One gotcha that threw me for a bit was an issue with the MAC address. By default, VMware gives a guest instance a MAC address generated in a particular range based on the host. Ubuntu, however, caches the MAC address of its eth interfaces in /etc/udev/rules.d/70-persistent-net.rules. So when you give the virtual image to somebody else, their VMware generates a different MAC, and Ubuntu records the new MAC address as eth1 (then eth2, eth3, etc.). But since the network configuration only knows about eth0, you get no network connection. The trick is to force a persistent MAC address in the VMX file like this (fill in the x’s however you want, keeping the fourth byte between 00 and 3F, since VMware only accepts static addresses from 00:50:56:00:00:00 to 00:50:56:3F:FF:FF):

ethernet0.addressType = "static"
ethernet0.address = "00:50:56:xx:xx:xx"
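
Picking the x's by hand works fine. As a small sketch, the Python below generates a random address inside VMware's static range (fourth byte 00 to 3F) and formats the two VMX lines; the function names are hypothetical. Every distributed copy of the image carries whatever address you pick, so this only needs to run once, when building the master image.

```python
import random


def random_vmware_static_mac(rng=random):
    """Random MAC in VMware's static range 00:50:56:00:00:00 to 00:50:56:3F:FF:FF."""
    octets = [0x00, 0x50, 0x56,
              rng.randint(0x00, 0x3F),   # fourth byte must stay at or below 0x3F
              rng.randint(0x00, 0xFF),
              rng.randint(0x00, 0xFF)]
    return ":".join("%02x" % o for o in octets)


def vmx_lines(mac, nic="ethernet0"):
    """The two .vmx settings that pin the given virtual NIC to a fixed address."""
    return ['%s.addressType = "static"' % nic,
            '%s.address = "%s"' % (nic, mac)]
```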

Then edit the /etc/udev/rules.d/70-persistent-net.rules file so that it contains only the eth0 entry, with the MAC address you specified in the VMX file.
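
That edit can be scripted too. As a minimal sketch, assuming rule lines in the one-line `SUBSYSTEM=="net", ..., ATTR{address}=="...", NAME="ethN"` style Ubuntu writes to that file, the Python below drops every rule not named eth0 and points the eth0 rule at the pinned address. Comment lines are kept as-is, so a stale comment for a removed NIC may remain (harmless). `pin_eth0` is a hypothetical helper name.

```python
import re

# Matches the MAC match key in a persistent-net rule, e.g.
# ATTR{address}=="00:0c:29:11:22:33"
ADDRESS_RE = re.compile(r'ATTR\{address\}=="[0-9A-Fa-f:]+"')


def pin_eth0(rules_text, mac):
    """Keep only the eth0 rule, rewritten to match `mac`; drop eth1, eth2, ..."""
    kept = []
    for line in rules_text.splitlines():
        is_rule = line.strip() and not line.lstrip().startswith("#")
        if is_rule:
            if 'NAME="eth0"' not in line:
                continue  # stale rule created for a new MAC: drop it
            # sysfs reports MAC addresses in lowercase, so match on lowercase.
            line = ADDRESS_RE.sub('ATTR{address}=="%s"' % mac.lower(), line)
        kept.append(line)
    return "\n".join(kept) + "\n"


def pin_eth0_file(mac, path="/etc/udev/rules.d/70-persistent-net.rules"):
    """Rewrite the rules file in place (run as root, then reboot)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(pin_eth0(text, mac))
```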

This type of architectural approach won’t work for every project, but if it will meet your needs, I highly recommend trying a virtual appliance solution, with Ubuntu JeOS being a good place to start. This approach has worked out great for this project, and given the requirements I don’t think any other approach would have. If you have any more detailed questions on setting up a VMware Ubuntu JeOS image like this, feel free to ask or check out the Ubuntu JeOS community help page.