The Devver Blog

A Boulder startup improving the way developers work.

Ruby Messaging Shootout

We have spent some time looking into various Ruby messaging systems. We briefly posted about the speed of Ruby messaging in the past and promised more detailed numbers. Here we share some code that runs basic tests against several Ruby messaging systems and benchmarks their performance. More rigorous testing would surely give more accurate results, but these were good enough for our purposes, so we thought we should share them.

We decided to put X messages onto each messaging system and then take them back off. Our baseline for the best possible performance was a standard Ruby in-memory queue. To reduce the effects of initial loading, we ran a small warm-up batch through each of the queues and threw away that first set of results. We ended up comparing SQS, Starling, and Beanstalk. We ran tests with the queue servers running locally on the same machine as the tests (LocalStarlingQueue, BeanstalkClient), and again with the queue servers running remotely on an EC2 server (BeanstalkClientRemote, RemoteStarlingQueue). We are only showing results for 10 SQS messages, because SQS was too slow with any sizable number of messages. We quickly found that both Beanstalk and Starling were over 10x faster than SQS when run against remote servers, and far faster still against local servers. Surprisingly, running Starling and Beanstalk between multiple EC2 instances is almost as fast as having local queue servers, and leaves SQS completely in the dust (1000x faster).
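The approach described above can be sketched in a few lines of Ruby. This is only an illustration, not the harness from the download: the `put`/`take` adapter interface and the `run_round_trip` and `shootout` helpers are hypothetical names, and only the in-memory baseline is wired up. The report format comes from Ruby's standard `Benchmark` module.

```ruby
require 'benchmark'

# Hypothetical adapter: a queue under test only needs #put and #take.
# This one wraps Ruby's built-in in-memory Queue as the baseline.
class MemoryQueue
  def initialize
    @queue = Queue.new
  end

  def put(message)
    @queue << message
  end

  def take
    @queue.pop
  end
end

# Put `count` messages on the queue, then take them all back off.
def run_round_trip(queue, count)
  count.times { |i| queue.put("message #{i}") }
  Array.new(count) { queue.take }
end

# Run a discarded warm-up pass on every queue, then time the real pass.
# Returns the array of Benchmark::Tms results, one per queue.
def shootout(queues, count, warmup: 10)
  queues.each_value { |queue| run_round_trip(queue, warmup) }

  width = queues.keys.map(&:length).max + 1
  Benchmark.bm(width) do |bm|
    queues.each do |name, queue|
      bm.report("#{name}:") { run_round_trip(queue, count) }
    end
  end
end

results = shootout({ 'MemoryQueue' => MemoryQueue.new }, 10)
```

Adding another system would mean writing one more adapter class with the same two methods and passing it in the hash alongside the baseline.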

When we started running larger tests we found some interesting results. Compare the two 100-message sets of tests, in which we run remote queue servers. The first set shows that Starling is faster by a decent amount, mostly because taking messages off the queue is significantly faster. This is because on Starling removing an item from a queue is a single operation, while on Beanstalk it is two operations: get and remove. This made Starling seem better for our needs, since in our app any job taken should be considered completed. Once we moved to messaging between EC2 instances, though, the overhead of the extra request disappeared, because the internal network between EC2 machines is very fast.
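One rough way to see this in the numbers is to treat a put as a proxy for a single network round trip (an assumption, not something the tests measure directly) and compare each queue's mean take time against its mean put time. The figures below are copied from the two 100-message runs in the results; the `RUNS` table and `take_to_put_ratio` helper are just illustrative names.

```ruby
# Mean seconds per operation, copied from the 100-message runs in this post.
RUNS = {
  'client at home, servers on EC2' => {
    'BeanstalkClientRemote' => { put: 0.113252, take: 0.227982 },
    'RemoteStarlingQueue'   => { put: 0.132354, take: 0.112539 }
  },
  'client and servers on EC2' => {
    'BeanstalkClientRemote' => { put: 0.000353, take: 0.000527 },
    'RemoteStarlingQueue'   => { put: 0.000773, take: 0.000805 }
  }
}

# How many "puts" one take costs. Treating a put as one round trip is an
# assumption made for this sketch.
def take_to_put_ratio(timing)
  timing[:take] / timing[:put]
end

RUNS.each do |run, queues|
  puts run
  queues.each do |name, timing|
    puts format('  %-22s take/put = %.2f  take = %.6f s',
                name, take_to_put_ratio(timing), timing[:take])
  end
end
```

Over the slow link a Beanstalk take costs about two puts while a Starling take costs about one, which fits the two-operation explanation. Inside EC2 the extra request costs so little in absolute terms that Beanstalk's take is faster than Starling's.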

The speed of Beanstalk between EC2 instances was one of the reasons we eventually chose it as our messaging system. The other isn’t really shown in our results here: running tests on Starling over time shows that the system begins to slow down with use. I am assuming that this is related to Starling persisting queues to disk and the overhead that comes with persistence. In fact, I found that if I killed the Starling server and cleared its persistent storage, Starling would return to its original performance. We currently have no need to persist our queues, so taking a performance hit to support that feature was the final reason we went with Beanstalk.
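A slowdown of this kind can be checked by timing identical batches back to back and comparing the early ones with the late ones. The sketch below is hypothetical and is not how the observation above was made: `elapsed` and `batch_times` are made-up helpers, and Ruby's in-memory `Queue` stands in for a real server, so it should show no meaningful drift.

```ruby
# Seconds elapsed on a monotonic clock while the block runs.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Push and pop `batch_size` messages, `batches` times over, and return the
# wall-clock time of each batch so early and late batches can be compared.
def batch_times(queue, batches:, batch_size:)
  Array.new(batches) do
    elapsed do
      batch_size.times { |i| queue << "message #{i}" }
      batch_size.times { queue.pop }
    end
  end
end

# Stand-in queue (assumption): a real run would swap in a client for the
# server under test, as long as it responds to << and pop.
times = batch_times(Queue.new, batches: 20, batch_size: 1_000)

early = times.first(5)
late = times.last(5)
ratio = (late.sum / late.size) / (early.sum / early.size)
puts format('late batches took %.2fx as long as early batches', ratio)
```

A ratio that keeps climbing as the number of batches grows would point to the kind of degradation described above.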

We also looked at ActiveMessaging, but ultimately decided not to take the time to implement and test it. I would love to see others take our messaging test harness and put other messaging systems through their paces. We have decided to share the code we used to generate the results you see below. Feel free to contact us if you have any questions, or if you add any other queues to the tests.

Download Ruby Messaging Tests

UPDATE: I updated the results to include runs with 10,000 and finally 100,000 messages, because there was some interest in seeing those numbers. There is some interesting discussion of the results going on at the beanstalk Google group.

Running 10 messages on all systems
Queue type               user     system      total        real
MemoryQueue:             0.000000   0.000000   0.000000 (  0.000067)
LocalStarlingQueue:      0.000000   0.010000   0.010000 (  0.015040)
BeanstalkClient:         0.010000   0.000000   0.010000 (  0.005700)
SQS1:                    0.120000   0.050000   0.170000 ( 10.608450)
BeanstalkClientRemote:   0.020000   0.030000   0.050000 (  4.263844)
RemoteStarlingQueue:     0.010000   0.020000   0.030000 (  3.366750)

MemoryQueue:::::
  mean time for put : 0.000020
  std dev for put: 0.000014
  mean time for take: 0.000013
  std dev for take: 0.000001
  put mean is 1.0 slower than MemoryQueue
  take mean is 1.0 slower than MemoryQueue
LocalStarlingQueue:::::
  mean time for put : 0.000867
  std dev for put: 0.0002
  mean time for take: 0.001340
  std dev for take: 0.000639
  put mean is 43.3659117997616 slower than MemoryQueue
  take mean is 101.259459459459 slower than MemoryQueue
BeanstalkClient:::::
  mean time for put : 0.000166
  std dev for put: 0.000026
  mean time for take: 0.000288
  std dev for take: 0.000020
  put mean is 8.2777115613826 slower than MemoryQueue
  take mean is 21.7963963963964 slower than MemoryQueue
SQS1:::::
  mean time for put : 1.081836
  std dev for put: 2.264929
  mean time for take: 0.670880
  std dev for take: 0.045992
  put mean is 54082.8140643623 slower than MemoryQueue
  take mean is 50700.4522522523 slower than MemoryQueue
BeanstalkClientRemote:::::
  mean time for put : 0.101100
  std dev for put: 0.007356
  mean time for take: 0.202382
  std dev for take: 0.012278
  put mean is 5054.16805721097 slower than MemoryQueue
  take mean is 15294.6 slower than MemoryQueue
RemoteStarlingQueue:::::
  mean time for put : 0.111742
  std dev for put: 0.008853
  mean time for take: 0.155392
  std dev for take: 0.110507
  put mean is 5586.18355184744 slower than MemoryQueue
  take mean is 11743.4378378378 slower than MemoryQueue

100 messages, Remote Queue servers on EC2 (client and tests running locally)
Queue type               user     system      total        real
MemoryQueue:             0.000000   0.000000   0.000000 (  0.000165)
BeanstalkClientRemote:   0.130000   0.250000   0.380000 ( 33.909095)
RemoteStarlingQueue:     0.080000   0.170000   0.250000 ( 22.677569)

MemoryQueue:::::
  mean time for put : 0.000015
  std dev for put: 0.000006
  mean time for take: 0.000013
  std dev for take: 0.000002
  put mean is 1.0 slower than MemoryQueue
  take mean is 1.0 slower than MemoryQueue
BeanstalkClientRemote:::::
  mean time for put : 0.113252
  std dev for put: 0.004145
  mean time for take: 0.227982
  std dev for take: 0.031950
  put mean is 7479.36954810266 slower than MemoryQueue
  take mean is 17757.1864438254 slower than MemoryQueue
RemoteStarlingQueue:::::
  mean time for put : 0.132354
  std dev for put: 0.188740
  mean time for take: 0.112539
  std dev for take: 0.003803
  put mean is 8740.89418989135 slower than MemoryQueue
  take mean is 8765.5156917363 slower than MemoryQueue

100 messages, Remote Queue servers on EC2 (client and tests running on a separate EC2 instance)
Queue type               user     system      total        real
MemoryQueue:             0.020000   0.000000   0.020000 (  0.006392)
BeanstalkClientRemote:   0.010000   0.000000   0.010000 (  0.793841)
RemoteStarlingQueue:     0.010000   0.000000   0.010000 (  1.067932)

MemoryQueue:::::
  mean time for put : 0.000009
  std dev for put: 0.000102
  mean time for take: 0.000030
  std dev for take: 0.000769
  put mean is 1.0 slower than MemoryQueue
  take mean is 1.0 slower than MemoryQueue
BeanstalkClientRemote:::::
  mean time for put : 0.000353
  std dev for put: 0.002849
  mean time for take: 0.000527
  std dev for take: 0.002545
  put mean is 38.4510216814849 slower than MemoryQueue
  take mean is 17.5400444232905 slower than MemoryQueue
RemoteStarlingQueue:::::
  mean time for put : 0.000773
  std dev for put: 0.004554
  mean time for take: 0.000805
  std dev for take: 0.006630
  put mean is 84.2330369677118 slower than MemoryQueue
  take mean is 26.7816119308266 slower than MemoryQueue

10,000 messages, Remote Queue servers on EC2 (client and tests running on a separate EC2 instance)
Queue type               user     system      total        real
MemoryQueue:             0.040000   0.000000   0.040000 (  0.127432)
BeanstalkClientRemote:   0.390000   0.090000   0.480000 (  7.646054)
RemoteStarlingQueue:     0.070000   0.020000   0.090000 ( 10.685410)

MemoryQueue:::::
  mean time for put : 0.000024
  std dev for put: 0.001314
  mean time for take: 0.000015
  std dev for take: 0.000685
  put mean is 1.0 slower than MemoryQueue
  take mean is 1.0 slower than MemoryQueue
BeanstalkClientRemote:::::
  mean time for put : 0.000283
  std dev for put: 0.002114
  mean time for take: 0.000526
  std dev for take: 0.002925
  put mean is 11.7913700313271 slower than MemoryQueue
  take mean is 35.8050617861459 slower than MemoryQueue
RemoteStarlingQueue:::::
  mean time for put : 0.000602
  std dev for put: 0.004539
  mean time for take: 0.000546
  std dev for take: 0.004831
  put mean is 25.0511140001964 slower than MemoryQueue
  take mean is 37.2107648228277 slower than MemoryQueue

100,000 messages, Remote Queue servers on EC2 (client and tests running on a separate EC2 instance)
Queue type               user     system      total        real
MemoryQueue:             0.260000   0.040000   0.300000 (  0.677368)
BeanstalkClientRemote:   3.200000   0.940000   4.140000 ( 76.989950)
RemoteStarlingQueue:     0.820000   0.240000   1.060000 (110.507879)

MemoryQueue:::::
  mean time for put : 0.000019
  std dev for put: 0.000915
  mean time for take: 0.000018
  std dev for take: 0.001125
  put mean is 1.0 slower than MemoryQueue
  take mean is 1.0 slower than MemoryQueue
BeanstalkClientRemote:::::
  mean time for put : 0.000274
  std dev for put: 0.002125
  mean time for take: 0.000531
  std dev for take: 0.003302
  put mean is 14.6932435272862 slower than MemoryQueue
  take mean is 30.0050748059956 slower than MemoryQueue
RemoteStarlingQueue:::::
  mean time for put : 0.000592
  std dev for put: 0.006346
  mean time for take: 0.000577
  std dev for take: 0.004349
  put mean is 31.7037894886494 slower than MemoryQueue
  take mean is 32.5781596463523 slower than MemoryQueue
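
The per-queue summary figures in the results (mean, std dev, and "slower than MemoryQueue") can be derived from individual operation timings. Here is a minimal sketch of that calculation, with some assumptions: the helper names are hypothetical, the population form of the standard deviation is a guess (the results do not say which form was used), and the ratios in the results were presumably computed from unrounded means, so dividing the printed means will not reproduce them exactly.

```ruby
# Arithmetic mean of a list of timings (in seconds).
def mean(samples)
  samples.sum / samples.length.to_f
end

# Population standard deviation. Whether the original harness used the
# population or the sample form is not stated; this choice is an assumption.
def std_dev(samples)
  m = mean(samples)
  Math.sqrt(samples.sum { |s| (s - m)**2 } / samples.length)
end

# How many times slower a queue's mean is than the baseline's mean.
def slowdown(queue_mean, baseline_mean)
  queue_mean / baseline_mean
end

# Time a single operation with a monotonic clock.
def time_op
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Collect per-operation timings against Ruby's in-memory Queue.
queue = Queue.new
put_times = Array.new(100) { |i| time_op { queue << "message #{i}" } }
take_times = Array.new(100) { time_op { queue.pop } }

puts format('mean time for put : %.6f', mean(put_times))
puts format('std dev for put: %.6f', std_dev(put_times))
puts format('mean time for take: %.6f', mean(take_times))
puts format('std dev for take: %.6f', std_dev(take_times))
puts "put mean is #{slowdown(mean(put_times), mean(put_times))} slower than MemoryQueue"
```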

Written by DanM

June 30, 2008 at 8:44 am

7 Responses


  1. Have you also performed any latency tests?

    Henrik Holst

    July 1, 2008 at 12:43 am

  2. No, not really. I guess finding the std dev for puts and takes sort of covers the average delay of performing an action. I am not sure exactly what kind of test you are looking for; I don’t think I am thinking about latency in the same way you are. Any thoughts on the kinds of tests you would like to see, or what would cover the issues you are wondering about?

    Dan

    July 1, 2008 at 10:51 am

  3. I’m not sure about these SQS results. See the bottom of this thread for my SQS receive/delete message times in Java (with a library I wrote)
    http://developer.amazonwebservices.com/connect/thread.jspa?messageID=63449&#63449

    I’ve also been doing heavy-duty messaging with SQS in EC2 for a year or more and I think your send times must be off. I have some test send code I’ll try running again.

    David Kavanagh

    July 2, 2008 at 5:17 am

  4. I just did a test, sending 50 messages and the time per send is: 0.28656, and that isn’t threaded either, and it’s over a DSL line from home. Right away, it’s 3x faster and I don’t think I’m reproducing your test env properly either.

    David Kavanagh

    July 2, 2008 at 5:21 am

  5. As I mention here:

    http://developer.amazonwebservices.com/connect/thread.jspa?threadID=22892&tstart=0

    my put times were closer to 0.08 seconds for small test messages to an existing queue. This is from Python using boto on a 384K upload link. SQS will definitely not be the fastest solution but it really shouldn’t be as slow as you indicate in your benchmarks.

    Mitch Garnaat

    July 2, 2008 at 6:19 am

  6. Yeah, we only posted the SQS results from our home test. Internally on EC2 it is 10x faster, and I really should have posted those results as well. I am also hearing that other languages send and receive a lot faster than Ruby. I think the default Amazon SQS library does its XML handling in Ruby, so it would probably be faster to use one of the C XML libraries. I have heard that other SQS libraries are more robust and better than Amazon’s, so perhaps I should run another test using Right-AWS or the Active Messaging plugin to get some SQS times.

    Mitch, running SQS internally on EC2, our puts were about 0.04 seconds. Have you run anything between EC2 instances?

    Thanks for all the feedback, good to know.

    Dan

    July 2, 2008 at 8:23 am

  7. What about the EventMachine-based Ruby Stomp server? I keep noticing this option gets completely ignored in many Ruby discussions about messaging.

    The gem for the server is “stompserver” and the client gem is “stomp”. I use it for several production applications.

    Jay Phillips

    July 4, 2008 at 1:01 pm

