The Devver Blog

A Boulder startup improving the way developers work.

A dozen (or so) ways to start sub-processes in Ruby: Part 1

Introduction

It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby’s green threading doesn’t provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.

Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I’ll be demonstrating are applicable to running any type of program in a sub-process. I’ll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we’ll leave that for another series.

In the first and second articles, I’ll demonstrate some of the facilities for starting sub-processes that Ruby possesses out-of-the-box, no requires needed. In the third article we’ll look at some tools provided in Ruby’s Standard Library which build on the methods introduced in part one. And in the fourth installment I’ll briefly survey a few of the many Rubygems which simplify sub-process interactions.

Getting Started

To begin, let’s define a few helper methods and constants which we’ll refer back to throughout the series. First, let’s define a simple method which will serve as our “slave” code – the code we want to execute in a sub-process. Here it is:

def hello(source, expect_input)
  puts "[child] Hello from #{source}"
  if expect_input
    puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
  else
    puts "[child] No stdin, or stdin is same as parent's"
  end
  $stderr.puts "[child] Hello, standard error"
end

(Note: The full source code for this article can be found at http://gist.github.com/137705)

This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we’ll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.

Next, let’s define a couple of helpful constants.

require 'rbconfig'
THIS_FILE = File.expand_path(__FILE__)

RUBY = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])

The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.
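
As a quick sanity check of the interpreter path, here is a minimal sketch that rebuilds it and uses it to start a child Ruby. The helper names ruby_path and child_ruby_version are made up for this illustration and are not part of the demo source; the sketch also assumes the path contains no spaces or other shell-special characters.

```ruby
require 'rbconfig'

# Hypothetical helper: path of the interpreter running this script,
# assembled from the same two RbConfig entries as the RUBY constant.
def ruby_path
  File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])
end

# Hypothetical helper: ask a child interpreter to print its version.
# It should match the parent's, since both run the same executable.
def child_ruby_version
  `#{ruby_path} -e 'print RUBY_VERSION'`
end

puts "[parent] interpreter:   #{ruby_path}"
puts "[parent] child version: #{child_ruby_version}"
```

If the two versions differ, the child is probably being started by a different Ruby than the one running the parent.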

In order to make the order of events clearer, we’ll force the standard output stream into synchronised mode. This will cause it to flush its buffer after every write.

$stdout.sync = true

Finally, we’ll wrap all of the code which follows in this protective IF-statement:

if $PROGRAM_NAME == __FILE__
# ...
end

This will ensure that the demo code won’t be re-executed when we require the source file within sub-processes.

Method #1: The Backtick Operator

The simplest way to execute a sub-process in Ruby is with the backtick (`). This method, which harks back to Bourne Shell scripting and Perl, is concise and often gives us exactly as much interaction as we need with a sub-process. The backtick, while it may look like a part of Ruby’s core syntax, is technically an operator defined by Kernel. Like most Ruby operators it can be redefined in your own code, although that’s beyond the scope of this article. Kernel defines the backtick operator as a method which executes its argument in a subshell.

puts "1. Backtick operator"
output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
output.split("\n").each do |line|
  puts "[parent] output: #{line}"
end
puts

Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:

1. Backtick operator
[child] Hello, standard error
[parent] output: [child] Hello from backticks
[parent] output: [child] No stdin, or stdin is same as parent's

The backtick operator doesn’t return until the command has finished. The sub-process inherits its standard input and standard error streams from the parent process. The process’ ending status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).
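
To make the exit status concrete, here is a small sketch that captures a command’s output and then inspects $?. The helper name run_and_report and the two shell commands are illustrative assumptions, not part of the demo source.

```ruby
# Hypothetical helper: run a command with backticks and return its captured
# standard output together with the Process::Status that Ruby leaves in $?.
def run_and_report(command)
  output = `#{command}`
  [output, $?]
end

output, status = run_and_report("echo hello")
puts "[parent] output:  #{output.inspect}"
puts "[parent] success: #{status.success?}"

# A command that deliberately exits with a non-zero status.
_, status = run_and_report("sh -c 'exit 3'")
puts "[parent] exit status: #{status.exitstatus}"
```

Because $? is overwritten each time a child process finishes, it is safest to read it immediately after the backtick call, as the helper does.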

We can use the %x operator as an alternate syntax for backticks, which enables us to select arbitrary delimiters for the command string. E.g. %x{echo `which cowsay`}.
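
A short sketch of the equivalent spellings follows. The method name delimiter_demo is invented for this example, and echo stands in for any real command.

```ruby
# Hypothetical demo method: each form runs its command in a subshell and
# returns the command's standard output as a String.
def delimiter_demo
  with_backticks = `echo hello`
  with_braces    = %x{echo hello}
  with_parens    = %x(echo hello)

  # With a non-backtick delimiter the command may itself contain backticks,
  # which the shell then treats as command substitution.
  nested = %x{echo `echo nested`}

  [with_backticks, with_braces, with_parens, nested]
end

puts delimiter_demo.inspect
```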

Method #2: Kernel#system

Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the STDOUT of the finished command, system returns a Boolean value indicating the success or failure of the command. If the command exits with a zero status (indicating success), system will return true. If it exits with a non-zero status, system returns false. (On Ruby 1.9 and later, system returns nil when the command could not be executed at all.)

puts "2. Kernel#system"
success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
puts "[parent] success: #{success}"
puts

This results in:

2. Kernel#system
[child] Hello from system()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] success: true

Just like the backtick operator, system doesn’t return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process’ standard input, output, and error streams.

As we can see in the example above, when system() is given multiple arguments, the first is treated as the program to run and the rest are passed to it as arguments, with no shell interpretation. This feature can make system() a little more convenient than backticks for executing complex commands. For this reason and because it’s more visually apparent in the code, I prefer to use Kernel#system over backticks unless I need to capture the command’s output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
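
The difference between the single-string and multi-argument forms can be sketched as follows. The variable name DEVVER_DEMO_UNSET, the helper names, and the nonexistent command are all assumptions made up for the example; the nil return for a command that cannot be started applies to Ruby 1.9 and later.

```ruby
# Make sure the (hypothetical) variable really is unset for this demo.
ENV.delete("DEVVER_DEMO_UNSET")

# Single string: the command goes through the shell, which expands the unset
# variable to an empty string. `test -n ""` fails, so system returns false.
def via_shell
  system('test -n "$DEVVER_DEMO_UNSET"')
end

# Multiple arguments: no shell is involved, so `test` receives the literal
# text "$DEVVER_DEMO_UNSET". That string is non-empty, so system returns true.
def without_shell
  system("test", "-n", "$DEVVER_DEMO_UNSET")
end

# A command that cannot be started at all (assumed not to exist on the PATH).
def missing_command
  system("no-such-command-devver-demo")
end

puts "[parent] single string:      #{via_shell.inspect}"
puts "[parent] multiple arguments: #{without_shell.inspect}"
puts "[parent] missing command:    #{missing_command.inspect}"
```

Passing arguments separately also sidesteps the need to escape quotes and other shell-special characters in the input.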

Method #3: Kernel#fork (aka Process.fork)

Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we’ve examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.

Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.
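
For comparison, here is a minimal sketch of the traditional, block-less style, where we branch on fork’s return value ourselves. The method name fork_without_block is made up for illustration. The child leaves via exit! so that it skips any at_exit handlers inherited from the parent, which is a choice made for this sketch rather than a requirement.

```ruby
$stdout.sync = true # flush every write, so exit! below cannot drop buffered output

# Hypothetical demo method. Without a block, fork returns twice:
# nil in the child, and the child's process ID in the parent.
def fork_without_block
  pid = fork
  if pid.nil?
    # Child branch.
    puts "[child] running as pid #{Process.pid}"
    exit!(0) # leave immediately instead of falling through into parent code
  else
    # Parent branch.
    Process.wait(pid)
    puts "[parent] child #{pid} finished"
    pid
  end
end

fork_without_block
```

The block form does the same branching for us and makes it harder to let the child accidentally run code meant only for the parent.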

puts "3. Kernel#fork"
pid = fork do
  hello("fork()", false)
end
Process.wait(pid)
puts "[parent] pid: #{pid}"
puts

This produces the following output:

3. Kernel#fork
[child] Hello from fork()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] pid: 19935

Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.
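
As a sketch of waiting on more than one child, the example below forks several children and collects each exit status with Process.wait2, which returns the pid together with its Process::Status. The helper name run_children, the sleep, and the exit codes are illustrative assumptions.

```ruby
$stdout.sync = true

# Hypothetical helper: start one child per exit code, then wait for each
# child in turn and collect the exit statuses in the same order.
def run_children(exit_codes)
  pids = exit_codes.map do |code|
    fork do
      sleep 0.1   # pretend to do some work
      exit!(code) # exit with the requested status, skipping at_exit handlers
    end
  end

  # Process.wait2 blocks until the given child exits and returns [pid, status].
  pids.map do |pid|
    _, status = Process.wait2(pid)
    status.exitstatus
  end
end

puts "[parent] exit statuses: #{run_children([0, 1, 2]).inspect}"
```

All of the children run concurrently; only the parent blocks, and only while it waits.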

The sub-process inherits its standard input, output, and error streams from the parent. Since fork is a *NIX-only syscall, it will only reliably work on UNIX-style systems.

Conclusion

In this first installment in the Ruby Sub-processes series we’ve looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we’ll delve into some methods for doing more complex communication with spawned sub-processes.

Written by avdi

June 30, 2009 at 8:57 am

Posted in Ruby, Tips & Tricks

26 Responses

  1. Interesting, thanks for sharing

    Fran J.

    June 30, 2009 at 9:40 am

  2. I wrote a script literally two hours ago that needed to shell out to a sub-process. I ended up using IO.popen, which I assume you'll be getting to in the next article.

    I wanted to add, one pain point of the backtick operator (and probably system method) is that your input needs to escape special shell characters like quotes. I've seen errors when input is not properly escaped.

    Thanks for the post! Looking forward to the next article!

    faithfulgeek

    June 30, 2009 at 9:52 am

  3. That's a good point about shell characters, and another reason to use the multi-argument form of system() when possible.

    And yes, popen() and friends come next :-)

    Avdi

    June 30, 2009 at 12:07 pm

  4. Glad you liked it!

    Avdi

    June 30, 2009 at 12:08 pm

  5. I probably should have clarified in that last comment: given a single argument, system() will behave like backticks and do shell interpretation. But given multiple arguments no shell interpretation will be done.

    Avdi

    June 30, 2009 at 12:09 pm

  6. You forgot about Kernel.exec.

    sporkmonger

    June 30, 2009 at 4:42 pm

  7. I didn't forget about it. I'm specifically focusing on methods for starting up child processes under the control of a parent. While exec() technically starts a child process, the child process replaces the parent, which puts it outside the scope of the series.

    Thanks for the attention to detail, though – you're right that any discussion of starting processes in Ruby is incomplete without mentioning exec().

    Avdi

    June 30, 2009 at 4:49 pm

  8. True, exec() won't start a child process in the sense you were covering in the post.

    I've found it to be incredibly useful however, especially within the context of rake. You can use it to wrap all kinds of operations in a rake command without actually incurring the overhead that would be present if rake were to remain memory resident while the process doing the real work finishes.

    I think a lot of newer Ruby programmers are unaware of its existence.

    sporkmonger

    June 30, 2009 at 4:57 pm

  9. Great series, looking forward to the next installment. Don't forget about PTY.spawn when you talk about popen and friends. :) (I think it is just IO.spawn in 1.9.)

    Ben Mabey

    June 30, 2009 at 5:33 pm

  10. That one's new to me! I'll be sure to research it for one of the instalments. Thanks!

    avdi

    June 30, 2009 at 9:35 pm

  11. So by input you mean command line input (not standard input) right?

    charlieok39431

    July 6, 2009 at 12:18 am

  12. Great article. I can't wait for the next two. I would like to know why I can't run:
    /etc/init.d/something start
    using system or backticks? The Ruby process simply freezes, waiting for I don't know what.

    Neves

    July 8, 2009 at 6:44 pm

  13. Avdi this is a rad article! I've used these techniques for awhile but often have to go look up the exact details when I want to do this; this will help me remember the semantics better. bravo!

    Mike Subelsky

    July 10, 2009 at 10:47 am

  14. […] the previous article we looked at some basic methods for starting subprocesses in Ruby. One thing all those methods had […]

  15. […] A dozen (or so) ways to start sub-processes in Ruby […]

  16. Hey, good post. Just a few days ago I was trying to figure out the overall impact of fork on my available memory. E.g., if I call fork from inside of Rails, does that create another instance of my Rails app in memory (that'd be bad) or does it somehow limit memory consumption to just what is needed by the block?

    The reason I was getting into this was to possibly use fork whenever I delivered an email with ActionMailer. This would alleviate the need for something like delayed job. But obviously I want to be careful in letting my app fork a bunch of email processes. Do you have any thoughts on all of this?

    Josh

    October 21, 2009 at 8:18 am

  17. Easy way to build a "reconnect wrapper" for ssh (or anything else):
    $bash_line = 'ssh localhost'
    while 1 do
      `clear`
      pid = Kernel.fork do
        Kernel.exec(bash_line)
      end
      Process.wait(pid)
      get_enter
    end

    #{TROLL}

    December 14, 2009 at 2:16 pm

  18. -sigh- the 'Kernel.exec(bash_line)' should be 'Kernel.exec($bash_line)' sorry.

    #{TROLL}

    December 14, 2009 at 2:17 pm

  19. […] Posted July 13, 2009 Filed under: Ruby, Tips & Tricks | Tags: Ruby, tutorial | In the previous article we looked at some basic methods for starting subprocesses in Ruby. One thing all those methods had […]

  20. […] Filed under: Development, Ruby | Tags: processes, Ruby, shell, subprocesses, terminal, unix | In part 1 and part 2 of this series, we took a look at some of Ruby’s built-in ways to start […]

  21. Very instructive – continue to spread your message. Looking forward to an update

    Timmy Mcirvin

    May 14, 2010 at 8:47 pm

  22. […] reading in The Devver Blog […]

  23. I’ve read through your 3 parts on running external programs, but saw no mention of %x{foo args} that I’ve been seeing in code included with puppet and facter. It’s a neat feature, but I’d like to know more details about it and how does one Google for such a cryptic symbolic notation?

    Can you please shed some light on this feature?

    John Florian

    August 2, 2011 at 6:20 pm

    • John, %x{} (or %x(), %x[], etc.) is just the more “formal” version of the backquote syntax. It runs the string as a shell command and returns the output as a string.

      Avdi Grimm

      August 3, 2011 at 4:44 pm

  24. […] A dozen (or so) ways to start sub-processes in Ruby: Part 1 (Part 2, Part 3) […]

  25. […] Part 1: Backticks and system() […]

