CGI vs. SAPI vs. FastCGI

One interesting thing that differentiates a web project from a desktop project is that you never know how many people you’re going to end up serving.

With a desktop application, I can know how many client installs I have and make a reasonable prediction as to how many people will be connecting at any one time. With a web application, you’re at the whim of the public. One day your server may be yawning from boredom, the next day the cooling fans may be turning so fast your server will become airborne if not properly strapped down.

Case in point. Address Information Center, our second most popular web mapping site, normally averages a bit over 150,000 page requests and about 20,000 visits per month. It hums along with extremely fast response times and is uber-reliable, being based on MapServer and PostgreSQL/PostGIS.

One day I happened to notice it seemed a little slower than usual. A map request that used to come back in a second or two was taking four or five seconds. Nothing to worry about really, but I was curious. I pulled up the server, and it was being absolutely pounded. We got over 80,000 page views, or over half of our normal monthly total for that app, in about 8 hours. At several points during the day we peaked at over 8 maps served per second.

As it turns out, there was an article in the local paper that said people’s trash collection day might be changing, and they should hop on over to Address Information Center to check it out. Ah. That would explain things. I normally expect a good server pounding when schools let in or before a big election (Address Information Center serves up both school assignments for Charlotte-Mecklenburg Schools and polling location assignments for the Board of Elections), but this one caught me out of the blue.

I was fairly happy with how the open source software performed - based on past and current experience, ArcIMS would have folded like a cheap tent. But it could have performed faster. I decided to do some research on ways to speed it up a bit.

As I’m using PHP/MapScript for MapServer development, MapServer is loaded as an extension DLL listed in the php.ini file. Hence, MapServer performs as PHP performs, since it runs within the PHP process. There are three ways to run PHP (and many other things) on a web server: CGI, SAPI, and FastCGI.

CGI is the default way to run PHP. CGI (Common Gateway Interface) is a standard protocol for running external application software from a web server. Essentially, the web server will get a request for a page with a particular extension (.php) and launch a php-cgi.exe executable process to handle the .php page, after which the php-cgi.exe process closes. So - server gets request, starts php-cgi.exe, php-cgi.exe handles the request, and then php-cgi.exe closes.

This is the way I was running PHP. The advantage of this system is that it is fantastically stable. There is literally nothing to break. Should the PHP executable hit an error, it simply closes itself, and everything keeps on trucking. Where CGI hurts you is in the performance department.* Every time you get a new request you are spawning a new process. PHP has to start, load the php.ini, and then load all of the extensions. That takes time and server resources.
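To get a feel for that per-request startup cost, here is a minimal sketch in Python rather than PHP. It is only an analogy: a fresh Python interpreter stands in for php-cgi.exe, and the handler, function names, and request count are all made up for illustration. It times the same trivial "request" handled by a brand-new process each time versus by code that is already loaded.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a request handler; the real one would be a
# PHP/MapScript page, not a Python one-liner.
HANDLER_SOURCE = "import sys; print('map for ' + sys.argv[1])"


def handle_cgi_style(address):
    """Start a fresh interpreter, handle one request, let the process exit."""
    result = subprocess.run(
        [sys.executable, "-c", HANDLER_SOURCE, address],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def handle_in_process(address):
    """Handle the same request in an interpreter that is already running."""
    return "map for " + address


def time_requests(handler, count):
    """Return the seconds taken to handle `count` requests one after another."""
    start = time.perf_counter()
    for i in range(count):
        handler(f"address-{i}")
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20
    print(f"CGI-style:  {time_requests(handle_cgi_style, n):.3f}s for {n} requests")
    print(f"in-process: {time_requests(handle_in_process, n):.6f}s for {n} requests")
```

The absolute numbers will vary by machine, and PHP with MapServer loaded has more to initialize than a bare interpreter, so treat the output as a rough illustration of the gap rather than a benchmark.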

SAPI (Server Application Programming Interface), or ISAPI (Internet Server API) if it’s IIS, is a generic term for direct module interfaces with the web server. For example, running PHP on ISAPI means PHP runs in-process with IIS. No php executable ever launches to handle a request; the web server has PHP more or less built-in. You are essentially extending the functionality of IIS to cover PHP.

This is a very speedy way to go. You’ll get about 5x the performance over CGI. There are a few problems with ISAPI, however. One is portability - SAPI interfaces are web server specific, so an IIS server will not use an Apache SAPI, and vice versa. That one you may not care about. What you should care about is this one: stability. Anything running in SAPI mode runs in-process with the web server. If it goes down, it can take the whole web server down with it. Yikes. Scaling it is darn near impossible (odd proxy solutions aside). There are also some security concerns, as the code will run with the full privileges of the web server process, something you may not want. I’ve also heard mixed reviews of the PHP ISAPI module. People seem to have had more luck with mod_python for Apache.

FastCGI tries to overcome the performance problems of CGI while still maintaining its benefits. It’s basically a wrapper around CGI applications that keeps them running between requests. The web server communicates with the FastCGI module, which then communicates with php-cgi executable processes over a socket connection such as TCP (which lets you distribute processes across multiple servers if so desired). FastCGI launches a specified number of these CGI processes when the web server starts. For example, FastCGI on our production server spawns a minimum of 5 php-cgi executables to wait for requests, and will spawn up to a total of 20 processes if traffic increases (though I haven’t run into a traffic load that spawned more than the original 5 yet).
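The minimum-of-5, maximum-of-20 behavior can be pictured as a simple clamp. The sketch below is a guess at the general idea, not the actual policy of any FastCGI process manager: it assumes each worker handles one request at a time and that the pool grows by one worker per concurrent request. The function name and its parameters are hypothetical.

```python
def workers_needed(concurrent_requests, min_workers=5, max_workers=20):
    """Pool size under an assumed one-request-per-worker policy.

    The pool never shrinks below min_workers and never grows past
    max_workers, no matter how many requests arrive at once.
    """
    return max(min_workers, min(concurrent_requests, max_workers))


if __name__ == "__main__":
    for load in (0, 3, 8, 50):
        print(f"{load} concurrent requests -> {workers_needed(load)} workers")
```

Under this model a quiet server still keeps 5 workers waiting, and anything beyond 20 simultaneous requests has to queue until a worker frees up.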

As the PHP processes are already running, the overhead of launching a process for each request is eliminated, realizing a giant speed boost. The processes also run outside of the web server process, addressing both stability and security issues. Finally, as you can branch processes out to different machines, scalability is no longer an issue (unless you don’t have any other machines of course).
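Here is a minimal sketch of the persistent-worker idea in Python. To be clear about the assumptions: this is not the FastCGI wire protocol, the worker runs in a background thread rather than as a separate php-cgi process, and the one-line request format is invented for illustration. What it does show is a worker that starts once, listens on a TCP port, and answers request after request without being relaunched.

```python
import socket
import socketserver
import threading


class Worker(socketserver.StreamRequestHandler):
    """Handles one request per connection; the server itself stays up."""

    def handle(self):
        address = self.rfile.readline().decode().strip()
        # The counter lives on the long-running server, so it survives
        # across requests, unlike anything in a process-per-request model.
        self.server.handled += 1
        reply = f"map for {address} (request {self.server.handled} since startup)\n"
        self.wfile.write(reply.encode())


def start_worker():
    """Start the worker once; port 0 asks the OS for any free port."""
    server = socketserver.TCPServer(("127.0.0.1", 0), Worker)
    server.handled = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(port, address):
    """Play the web server's role: connect, send one request, read the reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((address + "\n").encode())
        return conn.makefile().readline().strip()


if __name__ == "__main__":
    worker = start_worker()
    port = worker.server_address[1]
    for address in ["first", "second", "third"]:
        print(send_request(port, address))
    worker.shutdown()
    worker.server_close()
```

Because the caller only needs a host and port, the worker could just as well sit on another machine, which is the property that makes spreading processes across servers possible.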

FastCGI is the way we’ve gone with PHP/MapServer on IIS, and I’ve been very pleased with the results. Performance is better and stability has remained rock-solid. I’ll go over the fairly simple way to set up PHP and PHP/MapScript with FastCGI in another post this month.

*Performance is a relative thing. Even in CGI mode, using MapServer with PHP/MapScript is still a lot faster than ArcIMS.