The Devver Blog

A Boulder startup improving the way developers work.

Posts Tagged ‘tutorial’

A dozen (or so) ways to start sub-processes in Ruby: Part 2

In the previous article we looked at some basic methods for starting subprocesses in Ruby. One thing all those methods had in common was that they didn’t permit much communication between the parent process and the child. In this article we’ll examine a few built-in Ruby methods which let us have a two-way conversation with our subprocesses.

The complete source code for this article can be found at http://gist.github.com/146199.

Method #4: Opening a pipe

As you know, the Kernel#open method allows you to open files for reading and writing (and, with the addition of the open-uri library, HTTP resources as well). What you may not know is that Kernel#open can also open processes as if they were files.

  puts "4a. Kernel#open with |"
  cmd = %Q<|#{RUBY} -r#{THIS_FILE} -e 'hello("open(|)", true)'>
  open(cmd, 'w+') do |subprocess|
    subprocess.write("hello from parent")
    subprocess.close_write
    subprocess.read.split("\n").each do |l|
      puts "[parent] output: #{l}"
    end
    puts
  end
  puts "---"

By passing a pipe (“|”) as the first character of the command string, we signal to open that we want to start a process rather than open a file. For the command itself, we’re starting another Ruby process and calling our trusty hello method (see the first article, or the source code for this article, for the definitions of the hello method and the RUBY and THIS_FILE constants).

open yields an IO object which enables us to communicate with the subprocess. Anything written to the object is piped to the process’ STDIN, and anything the process writes to its STDOUT can be read back as if reading from a file. In the example above we write a string to the child, read some text back from the child, and then end the block.

Note the call to close_write immediately after the write. This call is important. Because input and output are buffered, it is possible to write to a subprocess, attempt to read back, and wait forever because the data is still sitting in the buffer. In addition, filter-style programs typically wait until they see an EOF on their STDIN before they exit. By calling close_write, we cause the buffer to be flushed and an EOF to be sent. Once the subprocess exits, its output buffer will be flushed and any read calls on the parent side will return.

Also note that we pass “w+” as the file open mode. Just as with files, by default the IO object will be opened in read-only mode. If we want to both write to and read from it, we need to specify an appropriate mode.

Here’s the output of the above code:

4a. Kernel#open with |
[child] Hello, standard error
[parent] output: [child] Hello from open(|)
[parent] output: [child] Standard input contains: "hello from parent"

---

Another way to open a command as an IO object is to call IO.popen:

  puts "4b. IO.popen"
  cmd = %Q<#{RUBY} -r#{THIS_FILE} -e 'hello("popen", true)'>
  IO.popen(cmd, 'w+') do |subprocess|
    subprocess.write("hello from parent")
    subprocess.close_write
    subprocess.read.split("\n").each do |l|
      puts "[parent] output: #{l}"
    end
    puts
  end
  puts "---"

This behaves exactly the same as the Kernel#open version. Which one you use is a matter of preference, although the IO.popen version arguably makes it a little more obvious what is going on.
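As a minimal sketch of the same write, close_write, read cycle in a form that runs on its own, the following assumes Ruby 1.9.2 or later and swaps the article's hello helper and RUBY constant for RbConfig.ruby and a made-up one-line filter script. It also passes the command as an Array, which IO.popen accepts on Ruby 1.9 and later; in that form the program is started directly rather than through a shell, so the -e script needs no extra quoting.

```ruby
require 'rbconfig'

# Full path of the Ruby interpreter running this script (Ruby 1.9.2+).
ruby = RbConfig.ruby

# Illustrative child program: read all of STDIN, then print it upcased.
filter = 'print $stdin.read.upcase'

# An Array command is executed directly, without a shell.
output = IO.popen([ruby, '-e', filter], 'w+') do |subprocess|
  subprocess.write("hello from parent\n")
  subprocess.close_write   # send EOF so the child's $stdin.read can return
  subprocess.read          # the block's value becomes IO.popen's return value
end

# After the block, IO.popen has waited for the child and set $?.
puts "[parent] output: #{output}"
puts "[parent] child succeeded: #{$?.success?}"
```

Without the close_write call, the child's $stdin.read would never see EOF and both processes would wait on each other.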

Method #5: Forking to a pipe

This is a variation on the previous technique. If Kernel#open is passed a pipe followed by a dash (“|-”) as its first argument, it starts a forked subprocess. This is like the previous example except that instead of executing a command, it forks the running Ruby process into two processes.

  puts "5a. Kernel#open with |-"
  open("|-", "w+") do |subprocess|
    if subprocess.nil?             # child
      hello("open(|-)", true)
      exit
    else                        # parent
      subprocess.write("hello from parent")
      subprocess.close_write
      subprocess.read.split("\n").each do |l|
        puts "[parent] output: #{l}"
      end
      puts
    end
  end
  puts "---"

Both processes then execute the given block. In the child process, the argument yielded to the block will be nil. In the parent, the block argument will be an IO object. As before, the IO object is tied to the forked process’ standard input and standard output streams.

Here’s the output:

5a. Kernel#open with |-
[child] Hello, standard error
[parent] output: [child] Hello from open(|-)
[parent] output: [child] Standard input contains: "hello from parent"

---

Once again, there is an IO.popen version which does the same thing:

  puts "5b. IO.popen with -"
  IO.popen("-", "w+") do |subprocess|
    if subprocess.nil?             # child
      hello("popen(-)", true)
      exit
    else                        # parent
      subprocess.write("hello from parent")
      subprocess.close_write
      subprocess.read.split("\n").each do |l|
        puts "[parent] output: #{l}"
      end
      puts
    end
  end
  puts "---"
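What forking buys us over launching a fresh interpreter is that the child starts with a copy of the parent's memory, so it can use the parent's objects directly without any -r or -e plumbing. The sketch below is illustrative only: the numbers array and the multiply-by-a-factor task are invented for the example, and it assumes a UNIX-like system where fork is available.

```ruby
# Data that exists only in the parent's memory; the forked child gets a copy.
numbers = [1, 2, 3, 4]

result = IO.popen('-', 'w+') do |pipe|
  if pipe.nil?
    # Child: STDIN and STDOUT are now connected to the parent through the pipe.
    factor = Integer($stdin.read)
    puts numbers.map { |n| n * factor }.join(',')
    $stdout.flush   # exit! skips Ruby's normal shutdown, so flush explicitly
    exit!(0)
  else
    # Parent: send the input, signal EOF, then collect the child's output.
    pipe.write('10')
    pipe.close_write
    pipe.read
  end
end

puts "[parent] result: #{result}"
```

The explicit exit! plays the same role as the exit call in the article's examples: it makes it unambiguous that the child stops as soon as its output has been written.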

Applications and Caveats

The techniques we’ve looked at in this article are best suited for “filter” style subprocesses, where we want to feed some input to a process and then use the output it produces. Because of the potential for deadlocks mentioned earlier, they are less suitable for running highly interactive subprocesses which require multiple reads and responses.
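The deadlock risk mostly bites when a lot of data flows in both directions: if we write all of our input before reading anything, the child can fill its output pipe, stop reading, and leave both processes waiting on each other. A minimal sketch of one common workaround, assuming Ruby 1.9 or later, feeds the child from a separate thread while the main thread drains its output. The filter script and the 50,000-line input are made up for illustration.

```ruby
require 'rbconfig'

# Illustrative filter: upcase each line as soon as it is read.
filter = '$stdin.each_line { |line| $stdout.write(line.upcase) }'

# Roughly half a megabyte, well beyond what a pipe buffer typically holds.
input = (1..50_000).map { |i| "line #{i}\n" }.join

output = IO.popen([RbConfig.ruby, '-e', filter], 'w+') do |pipe|
  # Write from a background thread so the main thread can keep reading;
  # writing everything first could block once both pipe buffers are full.
  writer = Thread.new do
    pipe.write(input)
    pipe.close_write
  end
  data = pipe.read
  writer.join
  data
end

puts "[parent] received #{output.bytesize} bytes"
```

This keeps the filter-style pattern workable for large volumes of data, but it does not make the technique any better suited to back-and-forth interactive exchanges.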

open and popen also do not give us access to the subprocess’ standard error (STDERR) stream. Any error output generated by the subprocess goes to the same place as the parent process’ STDERR.
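If interleaving the two streams is acceptable, Ruby 1.9 and later let IO.popen take the same redirection options as Kernel#spawn. The sketch below, using a made-up child script, points the child's STDERR at its STDOUT so that both arrive through the pipe. The streams are merged rather than kept separate, so this is only a partial answer to the limitation described above.

```ruby
require 'rbconfig'

# Illustrative child script that writes one line to each stream.
script = 'puts "to stdout"; $stdout.flush; $stderr.puts "to stderr"'

# err: [:child, :out] redirects the child's STDERR to its STDOUT,
# which here is the pipe we read from.
combined = IO.popen([RbConfig.ruby, '-e', script, err: [:child, :out]]) do |pipe|
  pipe.read
end

puts combined
```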

In the upcoming parts of the series we’ll look at some libraries which overcome both of these limitations.

Conclusion

In this article we’ve explored two (or four, depending on how you count it) built-in ways of starting a subprocess and communicating with it as if it were a file. In part 3 we’ll move away from built-ins and on to the facilities provided in Ruby’s Standard Library for starting and controlling subprocesses.

Written by avdi

July 13, 2009 at 4:29 pm

Posted in Ruby, Tips & Tricks


A dozen (or so) ways to start sub-processes in Ruby: Part 1

Introduction

It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby’s green threading doesn’t provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.

Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I’ll be demonstrating are applicable to running any type of program in a sub-process. I’ll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we’ll leave that for another series.

In the first and second articles, I’ll demonstrate some of the facilities for starting sub-processes that Ruby possesses out-of-the-box, no requires needed. In the third article we’ll look at some tools provided in Ruby’s Standard Library which build on the methods introduced in part one. And in the fourth instalment I’ll briefly survey a few of the many Rubygems which simplify sub-process interactions.

Getting Started

To begin, let’s define a few helper methods and constants which we’ll refer back to throughout the series. First, let’s define a simple method which will serve as our “slave” code – the code we want to execute in a sub-process. Here it is:

def hello(source, expect_input)
  puts "[child] Hello from #{source}"
  if expect_input
    puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
  else
    puts "[child] No stdin, or stdin is same as parent's"
  end
  $stderr.puts "[child] Hello, standard error"
end

(Note: The full source code for this article can be found at http://gist.github.com/137705)

This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we’ll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.

Next, let’s define a couple of helpful constants.

require 'rbconfig'
THIS_FILE = File.expand_path(__FILE__)

RUBY = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])

The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.
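The old Config alias for RbConfig has been removed in modern Ruby versions, so RbConfig::CONFIG is the spelling to rely on. Ruby 1.9.2 and later also ship RbConfig.ruby, which builds the same path and additionally appends the platform's executable extension (EXEEXT, empty on UNIX-like systems). A small sketch comparing the two:

```ruby
require 'rbconfig'

# Manual construction, as in the RUBY constant, plus the executable extension.
manual = File.join(RbConfig::CONFIG['bindir'],
                   RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT'])

# Built-in helper available since Ruby 1.9.2.
helper = RbConfig.ruby

puts manual
puts helper
```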

To make the sequence of events clearer, we’ll put the standard output stream into synchronous mode. This causes it to flush its buffer after every write.

$stdout.sync = true

Finally, we’ll wrap all of the demo code that follows in this protective if statement:

if $PROGRAM_NAME == __FILE__
# ...
end

This will ensure that the demo code won’t be re-executed when we require the source file within sub-processes.

Method #1: The Backtick Operator

The simplest way to execute a sub-process in Ruby is with the backtick (`). This method, which harks back to Bourne Shell scripting and Perl, is concise and often gives us exactly as much interaction as we need with a sub-process. The backtick, while it may look like a part of Ruby’s core syntax, is technically an operator defined by Kernel. Like most Ruby operators it can be redefined in your own code, although that’s beyond the scope of this article. Kernel defines the backtick operator as a method which executes its argument in a subshell.

puts "1. Backtick operator"
output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
output.split("\n").each do |line|
  puts "[parent] output: #{line}"
end
puts

Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:

1. Backtick operator
[child] Hello, standard error
[parent] output: [child] Hello from backticks
[parent] output: [child] No stdin, or stdin is same as parent's

The backtick operator doesn’t return until the command has finished. The sub-process inherits its standard input and standard error streams from the parent process. The process’ ending status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).

We can use the %x operator as an alternative syntax for backticks, which lets us choose arbitrary delimiters for the command string. For example: %x{echo `which cowsay`}.
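A minimal, self-contained sketch of capturing output and then inspecting $?, using RbConfig.ruby in place of the article's RUBY constant and a made-up child that exits with status 3. Because backticks hand the string to a shell whenever it contains shell metacharacters, the interpreter path is escaped with Shellwords.escape.

```ruby
require 'rbconfig'
require 'shellwords'

# Escape the interpreter path, since this command string goes through the shell.
ruby = Shellwords.escape(RbConfig.ruby)

# Backticks return the child's STDOUT as a String...
output = `#{ruby} -e 'puts "hi"; exit 3'`
# ...and leave a Process::Status describing how the child ended in $?.
status = $?

puts "output: #{output.inspect}"
puts "exit status: #{status.exitstatus}, success: #{status.success?}"

# %x{...} is the same operator with a delimiter of our choosing.
same = %x{#{ruby} -e 'puts "hi"'}
puts "via %x: #{same.inspect}"
```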

Method #2: Kernel#system

Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the STDOUT of the finished command, system returns a value indicating the success or failure of the command. If the command exits with a zero status (indicating success), system returns true; if it exits with a non-zero status, it returns false; and if the command could not be executed at all, it returns nil.

puts "2. Kernel#system"
success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
puts "[parent] success: #{success}"
puts

This results in:

2. Kernel#system
[child] Hello from system()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] success: true

Just like the backtick operator, system doesn’t return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process’ standard input, output, and error streams.

As we can see in the example above, when system() is given multiple arguments, the first is treated as the program to run and the rest are passed to it as individual arguments, without going through a shell, so they need no quoting or escaping. This feature can make system() a little more convenient than backticks for executing complex commands. For this reason and because it’s more visually apparent in the code, I prefer to use Kernel#system over backticks unless I need to capture the command’s output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
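The three possible return values, and the shell-free handling of multiple arguments, can be seen in a short sketch. It uses RbConfig.ruby instead of the article's RUBY constant, and the path /nonexistent/demo-program is a hypothetical program that is assumed not to exist.

```ruby
require 'rbconfig'
ruby = RbConfig.ruby

ok      = system(ruby, '-e', 'exit 0')          # => true
failed  = system(ruby, '-e', 'exit 1')          # => false
code    = $?.exitstatus                         # => 1
missing = system('/nonexistent/demo-program')   # => nil (could not be run)

# With several arguments no shell is involved, so spaces and quotes reach
# the child untouched; this prints ["two words", "it's"].
system(ruby, '-e', 'p ARGV', 'two words', "it's")
```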

Method #3: Kernel#fork (aka Process.fork)

Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we’ve examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.

Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.

puts "3. Kernel#fork"
pid = fork do
  hello("fork()", false)
end
Process.wait(pid)
puts "[parent] pid: #{pid}"
puts

This produces the following output:

3. Kernel#fork
[child] Hello from fork()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] pid: 19935

Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.

The sub-process inherits its standard input, output, and error streams from the parent. Since fork is a *NIX-only syscall, it will only reliably work on UNIX-style systems.
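For comparison, here is a minimal sketch of the traditional, block-less style mentioned earlier: fork returns nil in the child and the child's PID in the parent, and each process checks that value to decide which branch to run. The exit status of 42 is arbitrary and serves only to give the parent something recognizable to report. The child uses exit! so that it stops immediately without running any at_exit handlers inherited from the parent.

```ruby
pid = fork
if pid.nil?
  # Child: fork returned nil here.
  puts "[child] running as pid #{Process.pid}"
  $stdout.flush   # exit! skips Ruby's normal shutdown, so flush first
  exit!(42)
else
  # Parent: fork returned the child's PID; wait for it and read $?.
  Process.wait(pid)
  puts "[parent] child #{pid} exited with status #{$?.exitstatus}"
end
```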

Conclusion

In this first installment in the Ruby Sub-processes series we’ve looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we’ll delve into some methods for doing more complex communication with spawned sub-processes.

Written by avdi

June 30, 2009 at 8:57 am

Posted in Ruby, Tips & Tricks
