The Devver Blog

A Boulder startup improving the way developers work.

Archive for June 2009

A dozen (or so) ways to start sub-processes in Ruby: Part 1

Introduction

It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby’s green threading doesn’t provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.

Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I’ll be demonstrating are applicable to running any type of program in a sub-process. I’ll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we’ll leave that for another series.

In the first and second articles, I’ll demonstrate some of the facilities for starting sub-processes that Ruby possesses out-of-the-box, no requires needed. In the third article we’ll look at some tools provided in Ruby’s Standard Library which build on the methods introduced in part one. And in the fourth installment I’ll briefly survey a few of the many Rubygems which simplify sub-process interactions.

Getting Started

To begin, let’s define a few helper methods and constants which we’ll refer back to throughout the series. First, let’s define a simple method which will serve as our “slave” code – the code we want to execute in a sub-process. Here it is:

def hello(source, expect_input)
  puts "[child] Hello from #{source}"
  if expect_input
    puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
  else
    puts "[child] No stdin, or stdin is same as parent's"
  end
  $stderr.puts "[child] Hello, standard error"
end

(Note: The full source code for this article can be found at http://gist.github.com/137705)

This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we’ll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.

Next, let’s define a couple of helpful constants.

require 'rbconfig'
THIS_FILE = File.expand_path(__FILE__)

RUBY = File.join(Config::CONFIG['bindir'], Config::CONFIG['ruby_install_name'])

The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.

In order to make the order of events clearer, we’ll force the standard output stream into synchronised mode. This will cause it to flush its buffer after every write.

$stdout.sync = true

Finally, we’ll wrap all of the code which follows in this protective if-statement:

if $PROGRAM_NAME == __FILE__
  # ...
end

This will ensure that the demo code won’t be re-executed when we require the source file within sub-processes.

Method #1: The Backtick Operator

The simplest way to execute a sub-process in Ruby is with the backtick operator (`). This method, which harks back to Bourne Shell scripting and Perl, is concise and often gives us exactly as much interaction as we need with a sub-process. The backtick, while it may look like a part of Ruby’s core syntax, is technically an operator defined by Kernel. Like most Ruby operators it can be redefined in your own code, although that’s beyond the scope of this article. Kernel defines the backtick operator as a method which executes its argument in a subshell.

puts "1. Backtick operator"
output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
output.split("\n").each do |line|
  puts "[parent] output: #{line}"
end
puts

Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:

1. Backtick operator
[child] Hello, standard error
[parent] output: [child] Hello from backticks
[parent] output: [child] No stdin, or stdin is same as parent's

The backtick operator doesn’t return until the command has finished. The sub-process inherits its standard input and standard error streams from the parent process. The process’s exit status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).
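
For example (a small illustrative snippet, separate from the demo script), we can run a command that fails and then inspect $? to see how the child exited:

output = `#{RUBY} -e 'exit 1'`
status = $?                      # a Process::Status object
puts "[parent] pid:        #{status.pid}"
puts "[parent] success?:   #{status.success?}"
puts "[parent] exitstatus: #{status.exitstatus}"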

We can use the %x operator as an alternate syntax for backticks, which enables us to select arbitrary delimiters for the command string. E.g. %x{echo `which cowsay`}.

Method #2: Kernel#system

Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the STDOUT of the finished command, system returns a Boolean value indicating the success or failure of the command. If the command exits with a zero status (indicating success), system will return true. Otherwise it returns false.

puts "2. Kernel#system"
success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
puts "[parent] success: #{success}"
puts

This results in:

2. Kernel#system
[child] Hello from system()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] success: true

Just like the backtick operator, system doesn’t return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process’ standard input, output, and error streams.

As we can see in the example above, when system() is given multiple arguments, the first is treated as the command and the rest are passed to it directly as arguments, without any shell interpretation. This can make system() a little more convenient than backticks for executing complex commands, since there is no need to worry about shell quoting. For this reason, and because the call is more visually apparent in the code, I prefer to use Kernel#system over backticks unless I need to capture the command’s output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
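
To illustrate the difference (a quick sketch, separate from the demo script): a single string is handed to a subshell, while multiple arguments are executed directly, so shell variables are only expanded in the first case.

# Single string: run via a subshell, so $HOME is expanded
system("echo $HOME")

# Multiple arguments: run directly, no shell, so $HOME is printed literally
system("echo", "$HOME")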

Method #3: Kernel#fork (aka Process.fork)

Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we’ve examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.

Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.
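
For comparison, the traditional style looks roughly like this (a sketch only; the demo below uses the block form):

pid = fork
if pid.nil?
  # fork returns nil in the child process
  hello("fork()", false)
  exit
else
  # fork returns the child's PID in the parent process
  Process.wait(pid)
end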

puts "3. Kernel#fork"
pid = fork do
  hello("fork()", false)
end
Process.wait(pid)
puts "[parent] pid: #{pid}"
puts

This produces the following output:

3. Kernel#fork
[child] Hello from fork()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] pid: 19935

Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.

The sub-process inherits its standard error and output streams from the parent. Since fork is a *NIX-only syscall, it will only reliably work on UNIX-style systems.

Conclusion

In this first installment in the Ruby Sub-processes series we’ve looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we’ll delve into some methods for doing more complex communication with spawned sub-processes.

Written by avdi

June 30, 2009 at 8:57 am

Posted in Ruby, Tips & Tricks


SimpleDB DataMapper Adapter: Progress Report

From the beginning of Devver, we decided we wanted to work with some new technologies and we wanted to be able to scale easily. After looking at our options, AWS seemed to have many technologies that could help us build and scale a system like Devver. One of these technologies was SimpleDB. Another new thing we decided to try was DataMapper (DM) rather than the more familiar ActiveRecord. This eventually led me to work on my own SimpleDB DataMapper adapter.

Searching for ways to work with SDB using Ruby, we found a SimpleDB DM adapter by Jeremy Boles. It worked well initially, but as our needs grew (and to keep it compatible with the current version of DM) it became necessary to add to and update the adapter’s features. These changes lived hidden in our project’s code for a while, for no other reason than that we were too lazy to commit them all back to GitHub. Recently, though, there has been renewed interest in working with SimpleDB from Ruby. I started pushing the code updates to GitHub, and then I got a couple of requests and suggestions here and there to improve the adapter. One of these suggestions came from Ara Howard, who is doing impressive work of his own on Ruby and AWS, specifically SimpleDB. He suggested moving from the aws_sdb gem to right_aws, which, along with other changes, improved performance significantly (1.6x on writes, and up to 36x on reads of large queries over the default limit of 100 objects). Besides the performance improvements, we have recently added limit and sorting support to the adapter.

#new right_aws branch using AWS select
$ ruby scripts/simple_benchmark.rb
      user     system      total        real
creating 200 users
 1.020000   0.240000   1.260000 ( 35.715608)
Finding all users age 25 (all of them), 100 Times
 59.280000   8.640000  67.920000 ( 99.727380)

#old aws_sdb using query with attributes
$ ruby scripts/simple_benchmark.rb
      user     system      total        real
creating 200 users
  1.290000   0.530000   1.820000 ( 52.916103)
Finding all users age 25 (all of them), 100 Times
  356.640000  53.090000 409.730000 (3574.260988)

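To give a rough idea of what the new limit and sorting support looks like from the DataMapper side, here is a hypothetical query. The model, property names, and setup options below are made up for illustration; check the adapter’s README for the exact configuration it expects.

require 'rubygems'
require 'dm-core'

# Illustrative setup; the exact adapter options depend on the version in use
DataMapper.setup(:default,
  :adapter    => 'simpledb',
  :access_key => 'YOUR_ACCESS_KEY',
  :secret_key => 'YOUR_SECRET_KEY',
  :domain     => 'my_domain')

class User
  include DataMapper::Resource
  property :id,   String, :key => true
  property :name, String
  property :age,  Integer
end

# Standard DataMapper query options, now honored by the adapter:
# :limit caps the result set, :order sorts by the given property
users = User.all(:age => 25, :limit => 10, :order => [:name.asc])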

As I added features, testing the adapter also became slow (over a minute a run), because the functional tests actually connect to and use SimpleDB. Since Devver is all about speeding up Ruby tests, I decided to get the tests running on Devver. It was actually very easy, and it sped up the test suite from 1 minute and 8 seconds down to 28 seconds. You can check out the results and see for yourself how much Devver speeds things up.

We are currently using the SimpleDB adapter to power our Devver.net website as well as the Devver backend service. It has been working well for us, but we know that it doesn’t cover everyone’s needs. Next time you are creating a simple project, give SimpleDB a look. We would love feedback about the DM adapter, and it would be great to get some other people contributing to the project. If anyone does fork my SDB Adapter GitHub repo, feel free to send me pull requests. Also, let me know if you want to try using Devver as you hack on the adapter; it can really speed up testing, and I would be happy to give out a free account.

Lastly, at a recent Boulder Ruby users group meetup, the group did a code review of the adapter. It went well, and I should finish cleaning up the code and get the improvements suggested by the group committed to GitHub soon.

Update: The refactorings suggested at the code review are now live on GitHub.

Written by DanM

June 22, 2009 at 11:27 am

We're hiring!

We’re looking for an awesome Ruby developer to join our team. Get more details at http://devver.net/jobs.

Written by Ben

June 15, 2009 at 2:05 pm

Posted in Uncategorized


Boulder CTO Lunch with Matt McAdams

Dan usually goes to the Boulder CTO lunches, but he was out of town this month, which meant I had the pleasure of hanging out with some of Boulder’s best and brightest.

This month’s guest was Matt McAdams of TrackVia. TrackVia is an online database that is powerful yet simple enough to be used by people who are used to keeping data in spreadsheets (primarily business people). Matt gave a candid and often hilarious talk that touched on both technical topics and, luckily for me, pricing and metrics, two topics that I’m currently very interested in.

On technology decisions:

Matt wasn’t a database guy originally, but drew on the practical knowledge he gained working on a previous startup

Went with the simplest design that could work and it’s continued to scale well

Smart technology decisions have allowed TrackVia to compete with a small, lean development team

On product development:

TrackVia started as a contract project for a single customer, but they saw the broader appeal

One of the earliest databases in TrackVia is the bug database (still around).  In other words, they’ve been dogfooding since day one.

They don’t worry about the competition. Instead, they focus on building the features that get people to sign up and pay.

On pricing:

You’ve got to try stuff and iterate. TrackVia has changed their pricing several times.

Customers on the old pricing models have always been grandfathered in.

Sometimes raising your price can actually gain customers because some people assume that a cheap product or service must be low-quality (even if it’s actually very high quality).

If big customers really want feature X, it’s OK to ask them to pay extra to accelerate the development of that feature (or to customize their experience).

On metrics:

Good metrics allow you to try different strategies and measure their effect.

You must measure, tweak, and iterate.

If you can iterate on a weekly basis and your competition can iterate on a quarterly basis, you’ll win.

Metrics must continually be improved. TrackVia spends a lot of time tracking useful metrics, but even they know they need to add additional metrics in some key areas.


As usual, the CTO lunch was a great place to hear from other Boulder companies, and I learned a lot. Thanks to everyone who attended, and special thanks to Matt for leading our discussion.

Written by Ben

June 4, 2009 at 12:08 pm

Posted in Boulder
