Looking Cloudy | Fuzzy Tolerance

I’m in the process of replacing our primary web server for open source stuff¹, and it seemed like a good opportunity to figure out how to get our content to the “cloud” or, if that word raises your nomenclature hackles, a VPS host.

There are many reasons to look at cloud hosting, but my top 3 are:

Cost: Like most government organizations, we pay insane amounts for hardware. Replacing our current box would probably run us ~$10k (our tiniest server size), and we’ll have to replace it in 3-4 years. Not to mention storage costs on “your dad’s SAN”. Not to mention all the human support time. Not to mention the Windows license. Not to mention the freaking power costs.
Convenience: Try firing up a new server 60 seconds after you thought of it with your IT shop.
Control: I want to be fast and innovate and try new things. Or at least run Linux on a server like a normal person.

I’ve been hearing great things about Digital Ocean. For $5/month you get 1 core, 512MB of RAM, 20GB of SSD storage, and 1TB of transfer (upload doesn’t count toward that). The thing that really separates them is the SSD. Our IT shop doesn’t believe in SSDs, but since the first time I dropped an Intel 40GB SSD into a machine at home I refuse to use anything else. The speed difference is jaw-dropping.
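
For a rough sense of scale, here is a back-of-the-envelope sketch of the hardware price alone spread over its lifetime. It uses only the ~$10k, 3-4 year, and $5/month figures quoted above; the even monthly spread is a simplifying assumption, and storage, support time, licensing, and power are left out entirely.

```python
# Figures quoted in the post; power, licenses, SAN storage and staff time are ignored.
HARDWARE_COST_USD = 10_000      # "~$10k" replacement box
VPS_COST_PER_MONTH_USD = 5      # Digital Ocean's smallest plan


def hardware_cost_per_month(lifetime_years: float) -> float:
    """Spread the purchase price evenly over the server's lifetime (an assumption)."""
    return HARDWARE_COST_USD / (lifetime_years * 12)


for years in (3, 4):
    monthly = hardware_cost_per_month(years)
    vps_total = VPS_COST_PER_MONTH_USD * years * 12
    print(f"{years} years: hardware ~${monthly:.0f}/month, VPS total ${vps_total}")
```

Even on a four-year replacement cycle the box works out to a bit over $200 a month before any of the extras, against $240 for the whole four years on the VPS.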

I decided to run a little benchmark. I dropped a simple HTML file on Digital Ocean, our old/dying physical web server, and a new local VM that might replace that physical server.

Test File - ab.html

<!DOCTYPE html>
<html>
<head>
<title>Webserver test</title>
</head>
<body>
This is a webserver test page.
</body>
</html>

The three web servers are like so:

Current Physical Server: 4GB RAM, 2 CPU (4 cores each), Windows Server 2003 32bit, Apache (this is an old box)
VMware Instance: 6GB RAM, 2 CPU, Windows Server 2012 64bit, Apache
Digital Ocean: 512MB RAM, 1 CPU, Ubuntu 13.10, nginx

It isn’t exactly a fair comparison - Apache vs nginx, physical vs virtual, single core vs multicore, SSD vs your dad’s SAN. But that’s kind of the point. I don’t care what the backend is as much as comparing the performance of the backends available to me. Plus, I can’t exactly run nginx on Windows².

I used Apache ab to run a simple benchmark against the test file. For each test box, ab ran from a different machine on the same local network, so we’re not seeing web latency issues in the test. 10,000 requests were made, 20 concurrently.

Apache ab

ab -n 10000 -c 20 http://thehost/ab.html

Here are the results for our old physical server.

Physical Server

Concurrency Level:      20
Time taken for tests:   16.393 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      4710000 bytes
HTML transferred:       1290000 bytes
Requests per second:    610.01 [#/sec] (mean)
Time per request:       32.786 [ms] (mean)
Time per request:       1.639 [ms] (mean, across all concurrent requests)
Transfer rate:          280.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1   14   7.0     14      60
Processing:     4   19   9.9     18     193
Waiting:        2   17   9.9     16     193
Total:          6   33  12.8     34     208

Percentage of the requests served within a certain time (ms)
  50%     34
  66%     37
  75%     38
  80%     39
  90%     43
  95%     49
  98%     57
  99%     68
 100%    208 (longest request)

Not bad. The requests were handled in 16.393 seconds, averaging 610 requests per second and 33ms per request.
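
Those headline numbers all follow from the request count, the concurrency level, and the total time. The sketch below reproduces them for the physical server run. The formulas are my reading of how ab relates these figures, not something taken from its source, and the variable names are made up for illustration.

```python
# Inputs taken from the ab invocation and the "Time taken for tests" line.
total_requests = 10_000   # ab -n
concurrency = 20          # ab -c
time_taken_s = 16.393     # physical server run

# Throughput: completed requests divided by wall-clock time.
requests_per_second = total_requests / time_taken_s

# Mean time per request as seen by one of the 20 concurrent clients.
time_per_request_ms = concurrency * time_taken_s * 1000 / total_requests

# Mean time per request averaged across all concurrent requests.
time_per_request_all_ms = time_taken_s * 1000 / total_requests

print(f"{requests_per_second:.2f} requests/second")
print(f"{time_per_request_ms:.3f} ms per request")
print(f"{time_per_request_all_ms:.3f} ms per request (across all concurrent requests)")
```

The throughput comes out a hair off ab's 610.01 because ab works from an unrounded elapsed time, but the two per-request times match the report.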

Now for the VMware server.

VMware Server

Concurrency Level:      20
Time taken for tests:   16.896 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      3740000 bytes
HTML transferred:       1290000 bytes
Requests per second:    591.85 [#/sec] (mean)
Time per request:       33.793 [ms] (mean)
Time per request:       1.690 [ms] (mean, across all concurrent requests)
Transfer rate:          216.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1   14   5.2     14      38
Processing:     4   20   7.7     19     158
Waiting:        3   18   7.6     18     155
Total:          8   34  10.3     35     164

Percentage of the requests served within a certain time (ms)
  50%     35
  66%     37
  75%     38
  80%     39
  90%     42
  95%     49
  98%     56
  99%     62
 100%    164 (longest request)

Again, not bad. It’s a tiny bit slower: the requests were handled in 16.896 seconds, averaging 592 requests per second and 34ms per request. I had hoped it would be the same or a little better than the physical box (the physical box is old), but it’s pretty close. There was some weirdness, however. The test would only complete 1 of every 3 tries, timing out on the others. This might be an ab bug and not a server issue.

Now for the eye opener.

Digital Ocean $5/month - seriously

Concurrency Level:      20
Time taken for tests:   1.974 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      3620000 bytes
HTML transferred:       1220000 bytes
Requests per second:    5065.36 [#/sec] (mean)
Time per request:       3.948 [ms] (mean)
Time per request:       0.197 [ms] (mean, across all concurrent requests)
Transfer rate:          1790.68 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       5
Processing:     0    4   1.3      3      16
Waiting:        0    4   1.3      3      15
Total:          1    4   1.3      4      16

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      5
  95%      7
  98%      8
  99%      9
 100%     16 (longest request)

I was hoping for similar speeds or maybe a slight improvement with nginx and an SSD. I didn’t think my eyes would pop out. The requests were handled in 1.974 seconds, averaging 5065 requests per second and 4ms per request.
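
Copying the headline numbers out of each ab report by hand gets tedious once there are more than a couple of hosts. As a rough sketch, a few regular expressions can pull them out of the plain-text report. The function name `parse_ab_summary` and the dictionary keys are hypothetical, and the patterns assume the report layout shown in the runs above.

```python
import re


def parse_ab_summary(output: str) -> dict:
    """Extract three headline numbers from ab's plain-text report."""
    patterns = {
        "total_time_s": r"Time taken for tests:\s+([\d.]+) seconds",
        "requests_per_second": r"Requests per second:\s+([\d.]+)",
        # ab prints two "Time per request" lines; "(mean)" with the closing
        # parenthesis only matches the first, per-client one.
        "mean_request_ms": r"Time per request:\s+([\d.]+) \[ms\] \(mean\)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"could not find {key} in ab output")
        result[key] = float(match.group(1))
    return result


# Lines copied from the physical server report above.
sample = """\
Time taken for tests:   16.393 seconds
Requests per second:    610.01 [#/sec] (mean)
Time per request:       32.786 [ms] (mean)
Time per request:       1.639 [ms] (mean, across all concurrent requests)
"""
print(parse_ab_summary(sample))
```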

Server          Total Time (s)   Requests/Second   Mean Request Time (ms)
Physical Box    16.393           610.01            32.786
VMware          16.896           591.85            33.793
Digital Ocean   1.974            5065.36           3.948
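
To put the table in relative terms, here is a quick sketch that divides each throughput figure by the physical box's. The numbers are straight from the table; using the physical box as the baseline is just a choice made for this comparison.

```python
# Requests per second, copied from the table.
results = {
    "Physical Box": 610.01,
    "VMware": 591.85,
    "Digital Ocean": 5065.36,
}

baseline = results["Physical Box"]

for name, rps in results.items():
    ratio = rps / baseline                 # how many times the baseline throughput
    increase_pct = (ratio - 1) * 100       # the same thing as a percentage change
    print(f"{name:14s} {ratio:5.2f}x  ({increase_pct:+.0f}%)")
```

By this measure the Digital Ocean droplet handles a little over 8 times the throughput of the physical box, and the VMware instance lands about 3% below it.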

For $5/month I can replace a $10k+ server with all of the associated costs and headaches, be able to do whatever I want with it, and serve requests roughly 8 times faster (5065 vs. 610 requests per second, a ~730% increase). And how did that test bump the CPU at Digital Ocean?

Hells yeah I want some of that. So much so I’m willing to jump through potentially soul-crushing procurement hoops to get it³.

¹ We separate our open source and proprietary stuff because we don't want the instability of our proprietary stuff screwing with the uptime of our open source stuff. True story.

² Cygwin doesn't count.

³ Figuring out how to bill the government $5 a month may well cost $5,000 in staff time.