The Devver Blog

A Boulder startup improving the way developers work.

Archive for October 2008

Someone please build an awesome embeddable code widget

One awesome thing about working at a startup is that you get to focus very deeply on the problem you’re trying to solve. On the other hand, if you’ve taken the leap and founded a startup, it’s probably because you tend to see solutions and opportunities everywhere. It can be really hard to focus on one thing when you often have ideas for services that you’d like to use, or better yet, build.

Dan and I regularly talk about services that we wish existed but we simply can’t work on due to our commitment to Devver. The other day we were discussing one problem we wish someone would solve: why can’t we easily post nicely formatted code in our blog posts?

All I want is this: I copy/paste some code into a web site, choose the programming language, copy some widget code and paste that code into my blog. The code is indented and formatted, has syntax coloration, wraps correctly (for any iPhone readers) and can be easily copied/pasted. Including line numbers (that don’t mess with copy/paste) is a bonus.

In other words, I just want Pastie in my blog posts.

Yes, I know there are a few really nice projects that you can install on your server that will do all this. We could all host our own video as well, but it’s just easier to upload and embed a video on YouTube or Vimeo.

This wouldn’t just have to be for the good of humanity either. Such a service could make money off ads (each widget could have a link to the full-screen code on the main site, which could have ads for programming jobs, books, and conferences) or even sell off the data about which programming languages were most popular (in blogs and on the main site).

Maybe there is a solution for this (if there is, please let me know in the comments; I’m more than willing to publicly display my ignorance in order to learn about it), but if there is, I don’t see it being widely used and it’s not easy to find on Google. If there isn’t (yet), please go forth and build. I’ll be anxiously waiting.

Written by Ben

October 30, 2008 at 7:53 am

Posted in Development, Hacking

Ruby Beanstalkd distributed worker basics

At Devver we have a lot of jobs to do quickly, so we distribute our work out to a group of EC2 workers. We have tried and used a number of queuing solutions with Ruby, but in the end beanstalkd seemed to be the best solution for us at the time.

I have only seen a few posts about the basics of using beanstalkd with Ruby. I decided to make two posts evolving a simple Ruby beanstalkd example into a more complicated example. This way people new to beanstalkd could see how easy it can be to get up and running with distributed processing using Ruby and beanstalkd. Then people that are doing more advanced work with beanstalkd could see some examples of how we are working with it here at Devver. It would also be great for more experienced beanstalkd warriors to share their thoughts as there aren’t many examples out in the wild. The lack of examples makes it harder to learn and difficult to decide what the best practices are when working with beanstalkd queues.

I have also shared two scripts we have found useful while working with beanstalkd. The first, beanstalk_monitor.rb, lets you see the usage statistics for all queues, or monitor a single queue you are interested in. The second, beanstalk_killer.rb, is useful if you want to see how your code reacts when beanstalkd gets backed up or stalls (in beanstalkd speak, “putting on the brakes”). It was a little harder than I thought to pull everything out of our code into a simple example, and obviously the example itself is a bit useless. It should still show how to do the basics of distributing jobs with beanstalkd.

For those new to beanstalk, there are a few things you will need to know like how to get a queue object, how to put objects on the queue, how to take objects off the queue, and how to control which queue you are working with. For a higher level overview or more detailed information, I recommend checking out the beanstalkd FAQ. The full example code is below, but first taking a look at the basic snippets might help.

#to work with beanstalk you need to get a client connection
queue = Beanstalk::Pool.new(["#{SERVER_IP}:#{DEFAULT_PORT}"])
#by default you will be working on the 'default' tube or queue
#if we wanted to work on a different queue we could change tubes, like so
queue.watch('test_queue')
queue.use('test_queue')
queue.ignore('default')
#to put a simple string on a queue
queue.put('hello queue world')
#to receive a simple string
job = queue.reserve
puts job.body #prints 'hello queue world'
#if you don't delete the job when you're done, the queue assumes there is an error
#and the job will show back up on the queue again once its TTR (time to run) expires
job.delete
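
To make the reserve/delete life cycle above concrete without needing a running server, here is a toy in-memory model of a single tube. It is only a sketch: ToyTube and its methods are made up for illustration, they are not part of beanstalk-client, and unlike a real reserve this one returns nil instead of blocking when nothing is ready. Time is passed in explicitly so the TTR behavior can be shown without sleeping.

```ruby
# Toy in-memory model of one beanstalkd tube. Illustration only: this is not
# the beanstalk-client API and it never talks to a server.
class ToyTube
  Job = Struct.new(:id, :body, :pri, :ttr, :deadline)

  def initialize
    @ready = []
    @reserved = {}
    @next_id = 0
  end

  # Lower pri values are more urgent, as in beanstalkd (0 is the most urgent).
  def put(body, pri = 65536, ttr = 120)
    @ready << Job.new(@next_id += 1, body, pri, ttr, nil)
  end

  # Hand out the most urgent ready job; ties go to the oldest job (FIFO).
  def reserve(now = Time.now)
    release_expired(now)
    job = @ready.min_by { |j| [j.pri, j.id] }
    return nil unless job
    @ready.delete(job)
    job.deadline = now + job.ttr
    @reserved[job.id] = job
  end

  # Deleting is how a worker reports that the job is finished.
  def delete(job)
    @reserved.delete(job.id)
  end

  private

  # A reserved job that was not deleted within its TTR becomes ready again.
  def release_expired(now)
    expired = @reserved.values.select { |j| j.deadline <= now }
    expired.each { |j| @ready << @reserved.delete(j.id) }
  end
end

tube = ToyTube.new
tube.put('routine job', 100, 3)
tube.put('urgent job', 1, 3)

start = Time.at(0)
job = tube.reserve(start)
puts job.body                        # urgent job
tube.delete(job)                     # finished, so it never comes back

tube.reserve(start)                  # reserves 'routine job' but never deletes it
puts tube.reserve(start + 1).inspect # nil, nothing is ready yet
puts tube.reserve(start + 5).body    # routine job, back after its TTR expired
```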

How to run this example (on OS X, with macports installed)

> sudo port install beanstalkd
> sudo gem install beanstalk-client
> beanstalkd
> ruby beanstalk_tester.rb

Download: beanstalk_tester.rb

require 'beanstalk-client.rb'

DEFAULT_PORT = 11300
SERVER_IP = '127.0.0.1'
#beanstalk orders the jobs in a queue by priority; jobs with the same priority
#are handled FIFO. In a later example we will use the priority
#(lower numbers are more urgent, 0 is the most urgent)
DEFAULT_PRIORITY = 65536
#TTR (time to run) is how long, in seconds, a worker may hold a reserved job.
#If a worker dies before completing the work and never calls job.delete,
#the job returns to the queue once the TTR expires
TTR = 3

class BeanBase


  #To work with multiple queues you must tell beanstalk which queues
  #you plan on writing to (use), and which queues you will reserve jobs from
  #(watch). In this case we also want to ignore the default queue
  def get_queue(queue_name)
    queue = Beanstalk::Pool.new(["#{SERVER_IP}:#{DEFAULT_PORT}"])
    queue.watch(queue_name)
    queue.use(queue_name)
    queue.ignore('default')
    queue
  end

end

class BeanDistributor < BeanBase

  def initialize(amount)
    @messages = amount
  end

  def start_distributor
    #put all the work on the request queue
    bean_queue = get_queue('requests')
    @messages.times do |num|
      msg = BeanRequest.new(1,num)
      #Take our ruby object and convert it to yml and put it on the queue
      #(the arguments are positional: object, priority, delay, ttr)
      bean_queue.yput(msg, DEFAULT_PRIORITY, 0, TTR)
    end

    puts "distributor now getting results"
    #get all the results from the results queue
    bean_queue = get_queue('results')
    @messages.times do |num|
      result = take_msg(bean_queue)
      puts "result: #{result}"
    end

  end

  #this will take a message off the queue, process it and return the result
  def take_msg(queue)
    msg = queue.reserve
    #by calling ybody we get the content of the message and convert it from yml
    count = msg.ybody.count
    msg.delete
    return count
  end

end

class BeanWorker < BeanBase

  def initialize(amount)
    @messages = amount
    @received_msgs = 0
  end

  def start_worker
    results = []
    #get and process all the requests, on the requests queue
    bean_queue = get_queue('requests')
    @messages.times do |num|
      result = take_msg(bean_queue)
      results << result
      @received_msgs += 1
    end

    #return all of the results, by placing them on the separate results queue
    bean_queue = get_queue('results')
    results.each do |result|
      msg = BeanResult.new(1,result)
      bean_queue.yput(msg, DEFAULT_PRIORITY, 0, TTR)
    end

    #this is just to pass information out of the forked process
    #we return the number of messages we received as our exit status
    exit @received_msgs
  end

  #this will take a message off the queue, process it and return the result
  def take_msg(queue)
    msg = queue.reserve
    #by calling ybody we get the content of the message and convert it from yml
    count = msg.ybody.count
    result = count*count
    msg.delete
    return result
  end

end

############
# These are just simple message classes that we pass using beanstalks
# to yml and from yml functions.
############
class BeanRequest
  attr_accessor :project_id, :count
  def initialize(project_id, count=0)
    @project_id = project_id
    @count = count
  end
end

class BeanResult
  attr_accessor :project_id, :count
  def initialize(project_id, count=0)
    @project_id = project_id
    @count = count
  end
end

#write X messages on the queue
numb = 10

recv_count = 0

# Most of the time you will have two entirely separate classes
# but to make it easy to run this example we will just fork and start our server
# and client separately. We will wait for them to complete and check
# if we received all the messages we expected.
puts "starting distributor"
server_pid = fork {
  BeanDistributor.new(numb).start_distributor
}

puts "starting client"
client_pid = fork {
  BeanWorker.new(numb).start_worker
}

Process.wait(client_pid)
recv_count = $?.exitstatus
puts "client finished received #{recv_count} msgs"
if(numb==recv_count)
  puts "received the expected number of messages"
else
  puts "error didn't receive the correct number of messages"
end

Process.wait(server_pid)
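
One caveat about passing the message count out of the forked worker as its exit status: on POSIX systems only the low 8 bits of the status survive, so the trick works for numb = 10 but would report the wrong number for anything above 255. The sketch below shows the limit and one possible alternative using a pipe. The helper names count_via_exit_status and count_via_pipe are made up for this illustration, and fork is assumed to be available (so not Windows).

```ruby
# Return a number from a forked child through its exit status.
def count_via_exit_status(count)
  pid = fork { exit count }
  _, status = Process.wait2(pid)
  status.exitstatus
end

# Return a number from a forked child through a pipe instead.
def count_via_pipe(count)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts count
    writer.close
  end
  writer.close              # the parent only reads
  result = reader.read.to_i # read returns once the child closes its end
  Process.wait(pid)
  result
end

puts count_via_exit_status(10)  # 10
puts count_via_exit_status(300) # 44, because 300 % 256 == 44
puts count_via_pipe(300)        # 300
```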

Written by DanM

October 28, 2008 at 2:35 pm

iPhone Apps I would like to have

I know everyone has posted a list of the best/must-have iPhone apps. I am sure many people have also posted lists of apps they would like to have, but it is amazingly fun, so I decided I should do it too.

This is my list of apps and solutions I want for the iPhone. They don’t all necessarily have to be native apps. If you know of a web app (that works on the iPhone) which provides the functionality I am looking for, let me know.

I want something like Brain Age for the GameBoy on the iPhone. Spending a few minutes doing little puzzles and math when I have downtime seems like a better use of my time than just playing random games. I have Brain Tuner, which is nice, but I want some more options/different puzzles.

I want a one-time Google Contacts to Apple contacts syncer. All my iPhone contacts are missing their emails, and all my Gmail contacts are missing their phone numbers. Someone help me sync that up.

I want full syncing between my Google calendar and the native Apple calendar app. I always had this on Pocket PC and it was really easy to do. I want a full two-way sync. Google and Apple seem pretty buddy-buddy, so get on this.

I want flash cards, with prebuilt decks. I would like to be able to work my way through some word decks building my vocabulary. I also would love to have some Spanish/English decks. I am working on improving my Spanish by listening to Coffee Break Spanish, and having a Spanish study deck would be great.

I want an ebook reader. Oh wait, someone just pointed one out to me (thanks Matt); it is called Stanza. It’s seriously awesome. If they add a screen reader, it would be perfect: I could listen to some classic books while jogging/driving.

I want GPS tracking that works. I have a great iPhone app called Trailguru, which tracks movement/location with GPS and can tell me the speed and total distance I go when I run. The problem is my GPS seems to stop working after a day or two, and won’t come back until I restart the iPhone or re-sync. Then I want a driving direction helper, something that says out loud, “turn approaching in 200 feet,” just like every in-car navigation system.

I want an on-the-go web reader (Ben shared this idea with me). This would offer a way to open/transfer all the tabs or URLs currently in my browser over to the iPhone. Preferably, it would open them all in an offline mode, allowing me to then read the articles throughout my day while on the move. I would really like it if this wasn’t even in Safari, since that crashes the iPhone too often.

I want a full Flash player, or at the very least the ability to play Flash videos. There are so many sites with Flash videos and streaming video that are useless on the iPhone. I really wanted to watch the debates on my iPhone because I had to leave in the middle, but while every site on the internet was streaming the debates, not one of them had a way to view the debate on an iPhone. Apparently Adobe has confirmed working on Flash, but Apple is likely to block it. Screw you Apple, even Pocket PC phones years ago had Flash.

I want streaming internet radio. Yes, Pandora and a few others are nice, but why can’t I just browse and listen to any streaming net radio station? It would be even better if it could allow me to browse many of the well-known stations (shoutcast) without having to search through the iPhone’s browser.

I want something similar to Elasticfox for managing and monitoring EC2 instances on the iPhone. Actually, beyond that, it would be cool to be able to manage a few scripts and SSH credentials on a site. It wouldn’t allow arbitrary SSH, but you could store SSH login keys and a few scripts it could run and return the results. This would allow you to write some scripts ahead of time to monitor, clean up, restart or do other tasks, which you could then execute remotely from your iPhone and verify the results. A sysadmin’s dream. Until then I have pTerm for slow, clumsy SSH.

That is all I have for now, but if you have thought of iPhone apps you would like to have, leave some comments. If you know of a solution to any of the problems I mention above, let me know. Some of these apps I would be willing to pay for, so developers, get busy.

Written by DanM

October 17, 2008 at 8:45 am

Posted in Misc

Tracking down open files with lsof

The other day I was running into a weird error on Devver. After running around twenty test runs on the system, the component that actually runs individual unit tests was crashing due to “Too many open files – (Errno::EMFILE)”.

Unfortunately, I didn’t know much more than that. Which files were being kept open? I knew that this component loaded quite a few files, and that by default, OS X only allows 256 open file descriptors (ulimit -n will tell you the default on your system). If this was a valid case of needing to load more files, I could just up the limit using ulimit -n <bigger_number>.

Fortunately, a quick Google or two pointed the way to lsof. Unfortunately, my Unix-fu is never nearly as good as I wish and I didn’t know much about this handy utility. But I quickly discovered that it’s very useful for tracking down problems like this. I quickly used ps to find the PID of the Devver process and then a quick lsof -p <PID> displayed all the files that the process had open. So easy!

Sure enough, there were a ton of redundant file handles to the file that we use to store information about the Devver run. Armed with this information, it was easy to find the buggy code where we called File.open but failed to ever close the file.

Unfortunately, I still don’t know how to write a good unit test for this case. I guess I could do something ugly like call system(“lsof -p pid | wc -l”) before and after calling the code and make sure the number of descriptors stays constant, but that’s really ugly. Is there a way to test this within Ruby? I’m open to ideas.
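
One possible way to check this from inside Ruby, offered as a sketch rather than a definitive answer, is to count the File objects that are still open before and after the code under test, using ObjectSpace. The helper names open_file_count and leaked_files are made up for this example. Two assumptions are worth stating: the garbage collector is switched off during the measurement so it cannot quietly close an unreferenced file, and only Ruby File objects are counted, so descriptors opened by C extensions or sockets would not show up.

```ruby
require 'tmpdir'

# Count the File objects in this process that have not been closed.
def open_file_count
  ObjectSpace.each_object(File).count do |file|
    begin
      !file.closed?
    rescue IOError
      false # an allocated but uninitialized File holds no descriptor
    end
  end
end

# Run the block and return how many more files are open afterwards.
def leaked_files
  GC.disable # keep the GC from closing unreferenced files mid-measurement
  before = open_file_count
  yield
  open_file_count - before
ensure
  GC.enable
end

path = File.join(Dir.tmpdir, 'fd_leak_example.txt')
File.write(path, 'run info')

leaky = leaked_files { File.open(path) }                # opened, never closed
clean = leaked_files { File.open(path) { |f| f.read } } # block form closes the file
puts "leaky code left #{leaky} file(s) open, clean code left #{clean}"
```

In a unit test this would become an assertion that leaked_files returns 0 for the code being tested.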

Still, it’s always good to learn more about a powerful Unix tool. I’m constantly amazed by the power and depth of the Unix tool set.

Written by Ben

October 9, 2008 at 12:23 pm

Sending Files with EventMachine

Devver has to keep the developer’s environment synchronized with our servers. To do this our Devver client sends all of the project files to our servers. We currently have an EventMachine client transfer files over SSL to an EventMachine server. We went through various stages and methods of sending files with EventMachine before finding a good solution. On smaller projects we didn’t even realize how bad our performance was. After bringing up some larger projects we realized we needed to look more into our file transfer performance. Since I couldn’t find much out on the web about this, I thought sharing some examples of how we had set up our EventMachine clients and servers to send files might be useful to someone else out there.

I got some help from people on the EventMachine mailing list; here is the thread discussing sending large files with EventMachine.

Since I was already playing around with a few of our options, I decided to do some comparisons between using EventMachine’s send_data, stream_file_data, an alternative buffer recommended by James Tucker, the crappy buffer we had been using before we discovered the default EM BufferedTokenizer, and layering compression on top of the various methods. We had hacked together our buffer tokenizer rather quickly, and it always performed well enough in our initial testing, which shows why performance tests are worth a little bit of effort. The benchmarks on the various setups are below. (This was all done on localhost; it is worth noting that compression helps much more between remote servers.)

Sending log file with compression turned off (5 times)
OurBadBufferedTokenizer: 10.40 s
Standard EM BufferedTokenizer: 0.93 s
Tucker’s BufferedTokenizer: 0.92 s
stream_file_data w/ EM BufferedTokenizer: 0.98 s

Sending log file with compression turned on (5 times)
OurBadBufferedTokenizer: 1.02 s
Standard EM BufferedTokenizer: 0.99 s
Tucker’s BufferedTokenizer: 1.04 s
stream_file_data w/ EM BufferedTokenizer: N/A, can’t use stream_file_data with on-the-fly compression

Sending compressed Mp3 file with compression turned off (5 times)
OurBadBufferedTokenizer: 18.55 s
Standard EM BufferedTokenizer: 1.09 s
Tucker’s BufferedTokenizer: 1.10 s
stream_file_data w/ EM BufferedTokenizer: 1.22 s

Adding compression to already compressed files like mp3s doesn’t change the time significantly. This is a longer run just to show how the times vary with a larger test. I also tested on full projects and the variance seemed to hold.

Sending compressed Mp3 file with compression turned off (25 times)
OurBadBufferedTokenizer: N/A takes too long
Standard EM BufferedTokenizer: 5.70 s
Tucker’s BufferedTokenizer: 4.38 s
stream_file_data w/ EM BufferedTokenizer: 4.82 s
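
The reason compression helps a log file but not an mp3 can be seen with nothing but Ruby’s standard Zlib library. The sketch below is not part of the benchmark: the log line is invented, and random bytes are used as a stand-in for an already compressed file, on the assumption that both are close to incompressible. The helper name deflate_ratio is made up for this example.

```ruby
require 'zlib'
require 'securerandom'

# Compressed size divided by original size (smaller means better compression).
def deflate_ratio(data)
  compressed = Zlib::Deflate.deflate(data, Zlib::BEST_SPEED)
  compressed.bytesize.to_f / data.bytesize
end

# Repetitive text, roughly what a Rails development.log looks like.
log_like = "Processing TestsController#index (for 127.0.0.1) [GET]\n" * 2_000
# Random bytes stand in for an already compressed file such as an mp3.
random_bytes = SecureRandom.random_bytes(log_like.bytesize)

puts format('log-like text compresses to %.3f of its size', deflate_ratio(log_like))
puts format('random bytes compress to %.3f of their size', deflate_ratio(random_bytes))
```

When the ratio stays near 1.0 the deflate call is pure overhead, which matches the mp3 timings above.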

Below are the little tests and examples I was working with. Obviously you won’t have the same files on your system or Tucker’s buffer, so I packed everything up as a zip. To try everything out, just download the EventMachine sending files tests zip, then extract it and run ‘ruby em_send_file_test.rb’. Any thoughts or feedback are welcome. I am still learning the ins and outs of EventMachine, so feel free to send me any tips.

dir = File.expand_path(File.dirname(__FILE__))
unless($LOAD_PATH.member?(dir))
  $LOAD_PATH.unshift(dir)
end

require 'test/unit'
require 'eventmachine'
require 'zlib'
require 'yaml'
require 'ruby-debug'
require 'buffered_tokenizer_pastie'
require 'benchmark'

Thread.abort_on_exception = true

SERVER_PORT = 7999
SERVER_IP = '127.0.0.1'
TOKEN = "|DEFAULTDELIMITED|"
#check with different types of files and with compression on and off
#results vary a bunch for txt vs already compressed files like mp3
FILE_NAME = '~/development.log'
#FILE_NAME = '~/Blue.mp3'
COMPRESS = false
#COMPRESS = true

TIMES = 5

class EmClientExample < EventMachine::Connection

  def unbind
    puts "client connection has terminated"
  end

  def process(data)
    puts "client got data: #{data}"
    send_files() if data=="success"
    send(prepare("some_msg")) if data=="filesDone"
    send(prepare("quit")) if data=="ack"
    if(data=="goodbye")
      puts "Client successfully sent all data shutting down!!!!"
      EventMachine::stop_event_loop
    end
  end

  def send_files()
    puts "sending files"
    @files = Array.new(TIMES,[FILE_NAME, Time.now.to_s])
    send_files_loop
  end

  def send_files_loop
    if @files && @files.length > 0
      file = @files.shift
      EM.next_tick do
        send_file(file[0],file[1])
        send_files_loop
      end
    else
      puts "done syncing files"
      send(prepare("files_completed"))
    end
  end

  def send_file(path,mtime)
    puts "Syncing "+path
    contents = File.read(File.expand_path(path))
    contents = Zlib::Deflate.deflate(contents,Zlib::BEST_SPEED) if COMPRESS
    send(prepare("send_file #{path}, #{mtime}, content:#{contents}"))
  end

  def send(str)
    #puts "sending: #{str}"
    send_data str
  end

  def prepare(str)
    str+TOKEN
  end

  def self.push_start()
    EventMachine.connect(SERVER_IP,SERVER_PORT,self) do |c|
      c.send_files()
    end
  end

end

class EmClientExampleBadBuffer < EmClientExample

  attr_accessor :buffer

  def initialize(*args)
    super
    @buffer = DataBuffer.new
  end

  def receive_data(data)
    @buffer.append(data)
    while(command = @buffer.grab)
      process(command)
    end
  end

  def prepare(str)
    @buffer.prepare(str)
  end

end

class EmClientExampleBuffToken < EmClientExample

  def initialize(*args)
    super
    @recv_buffer = BufferedTokenizer.new(TOKEN)
  end

  def receive_data(data)
    @recv_buffer.extract(data).each do |m|
      process(m)
    end
  end

end

class EmClientExampleStreamBuffToken < EmClientExample

  def initialize(*args)
    super
    @recv_buffer = BufferedTokenizer.new(TOKEN)
  end

  def send_files_loop
    if @files && @files.length > 0
      file = @files.shift
      EM.next_tick do
        send_file(file[0],file[1])
      end
    else
      puts "done syncing files"
      send(prepare("files_completed"))
    end
  end

  def send_file(path,mtime)
    puts "Syncing "+path
    send("send_file #{path}, #{mtime}, content:")

    EM::Deferrable.future( stream_file_data(File.expand_path(path)) ) {
      send(prepare(""))
      send_files_loop
    }
  end

  def receive_data(data)
    @recv_buffer.extract(data).each do |m|
      process(m)
    end
  end

end

class EmClientExamplePastie < EmClientExample

  def initialize(*args)
    super
    @recv_buffer = BufferedTokenizerPastie.new(TOKEN)
  end

  def receive_data(data)
    @recv_buffer.extract(data).each do |m|
      process(m)
    end
  end

end

class EmServerExample < EventMachine::Connection

  def post_init
    if(@signature)
      client = Socket.unpack_sockaddr_in(get_peername)
      puts "Received a new connection from #{client.last}:#{client.first}"
    end
  end

  def unbind
    puts "server connection has terminated\n"
  end

  def process(data)
    #puts "server: #{data[0..15]}"
    send(prepare("success")) if data=="login"
    send(prepare("filesDone")) if data=="files_completed"
    send(prepare("ack")) if data=="some_msg"
    if data.match(/^send_file/)
      #puts data[0..40]
      puts "received file"
      start = data.index(", content:") + ", content:".length
      ender = data.length
      contents = data[start,ender]
      contents = Zlib::Inflate.inflate(contents) if COMPRESS
      file_contents = File.read(File.expand_path(FILE_NAME))
      if contents != file_contents
        puts "file was corrupted"
        puts "received length: #{contents.length} file length: #{file_contents.length}"
        #File.open(File.expand_path("~/copy.file"),"w") do |f|
        #  f << contents
        #end
      end
    end
    if data=="quit"
      send(prepare("goodbye"))
      close_connection_after_writing
    end
  end

  def prepare(str)
    str+TOKEN
  end

  def send(msg)
    #puts "server sent: #{msg}"
    send_data msg
  end

end

class EmServerExampleBadBuffer < EmServerExample

  def initialize(*args)
    super
    @buffer = DataBuffer.new
  end

  def receive_data(data)
    @buffer.append(data)
    while(command = @buffer.grab)
      process(command)
    end
  end

  def prepare(str)
    @buffer.prepare(str)
  end

end

class EmServerExampleBuffToken < EmServerExample

  def initialize(*args)
    super
    @recv_buffer = BufferedTokenizer.new(TOKEN)
  end

  def receive_data(data)
    @recv_buffer.extract(data).each do |m|
      process(m)
    end
  end

end

class EmServerExamplePastie < EmServerExample

  def initialize(*args)
    super
    @recv_buffer = BufferedTokenizerPastie.new(TOKEN)
  end

  def receive_data(data)
    @recv_buffer.extract(data).each do |m|
      process(m)
    end
  end

end

class DataBuffer
  FRONT_DELIMITER = "0x5b".hex.chr # '['
  BACK_DELIMITER = "0x5d".hex.chr #']'[0].to_s(16).hex.chr
  DELIMITER = "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|"
  DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/
  DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/

  def initialize
    @unprocessed = ""
    @commands = []
  end

  def grab
    new_messages = @unprocessed.split(DELIM_ESCAPE)
    while new_messages.length > 1
      @commands << new_messages.shift
    end
    msg_length = new_messages.length
    if msg_length > 0
      if msg_length == 1 && (@unprocessed=~DELIM_ESCAPE_END)
        @commands.push(new_messages.shift)
        @unprocessed = ""
      else
        #put the rest of the last statement back into the buffer
        while(cut=@unprocessed.index(DELIM_ESCAPE))
          @unprocessed = (@unprocessed[cut..@unprocessed.length]).sub(DELIMITER,"")
        end
      end
    end
    if @commands.length > 0
      return @commands.shift
    else
      return nil
    end
  end

  def prepare(str)
    str.to_s+DELIMITER
  end

  def append(data)
    @unprocessed = @unprocessed + data
  end

end

class EmSendFileTest < Test::Unit::TestCase

  def test_placeholder
    assert true
  end

  def start_server(server_type)
    server_pid = fork {
      EventMachine::run do
        EventMachine::start_server SERVER_IP, SERVER_PORT, server_type
        puts "Server now accepting requests..."
      end
    }
    server_pid
  end

  def start_client(client_type)
    client_pid = fork {
      EventMachine::run { client_type.push_start() }
    }
    client_pid
  end

  def run_against_server_client(client_example, server_example)
    assert_nothing_raised do
      puts Benchmark.realtime {
        server_pid = start_server(server_example)
        #make sure server is up for client to connect to
        sleep(0.2)
        client_pid = start_client(client_example)
        sleep(0.2)

        Process.wait(client_pid)
        puts "client finished"

        #I don't know a clean way to shut EventMachine down, so kill the server
        Process.kill('KILL',server_pid)
        Process.waitall
      }
      puts "##############################################################"
    end
  end

  def test_em_send_files_with_em_buffered_tokenizer
    puts "send files test with em buffered tokenizer"
    client_example = EmClientExampleBuffToken
    server_example = EmServerExampleBuffToken
    run_against_server_client(client_example, server_example)
  end

  def test_em_stream_files_with_em_buffered_tokenizer
    puts "stream_file_data test with em buffered tokenizer"
    if COMPRESS == true
      puts "stream_file_data can't be used with on the fly compression"
    else
      client_example = EmClientExampleStreamBuffToken
      server_example = EmServerExampleBuffToken
      run_against_server_client(client_example, server_example)
    end
  end

  def test_em_send_files_with_bad_tokenizer
    puts "send files test with our bad buffered tokenizer"
    client_example = EmClientExampleBadBuffer
    server_example = EmServerExampleBadBuffer
    run_against_server_client(client_example, server_example)
  end

  def test_em_send_files_with_pastie_tokenizer
    puts "send files test with the pastied tokenizer"
    client_example = EmClientExamplePastie
    server_example = EmServerExamplePastie
    run_against_server_client(client_example, server_example)
  end

end
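
For readers who don’t have EventMachine installed, the job every tokenizer in these tests performs can be sketched in a few lines of plain Ruby: data arrives in arbitrary chunks, and complete messages must be cut out at the delimiter even when the delimiter itself is split across two chunks. SimpleTokenizer below is a made-up illustration of that idea. It is not the implementation of EM’s BufferedTokenizer or of Tucker’s buffer, and it makes no attempt to be fast on large payloads.

```ruby
# Minimal delimiter-based tokenizer, for illustration only.
class SimpleTokenizer
  def initialize(delimiter)
    @delimiter = delimiter
    @buffer = ''.dup # unfrozen buffer for data that has no delimiter yet
  end

  # Append a chunk and return every message it completes.
  def extract(data)
    @buffer << data
    messages = []
    while (cut = @buffer.index(@delimiter))
      # Remove the message plus its delimiter from the front of the buffer.
      frame = @buffer.slice!(0, cut + @delimiter.length)
      messages << frame.chomp(@delimiter)
    end
    messages
  end
end

tokenizer = SimpleTokenizer.new('|DEFAULTDELIMITED|')
p tokenizer.extract('login|DEFAULT')         # [] (delimiter is incomplete)
p tokenizer.extract('DELIMITED|files_compl') # ["login"]
p tokenizer.extract('eted|DEFAULTDELIMITED|quit|DEFAULTDELIMITED|')
# ["files_completed", "quit"]
```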

Written by DanM

October 8, 2008 at 8:22 am

Posted in Uncategorized

Ruby Tools Roundup

Update: Devver now offers a hosted metrics service for Ruby developers which can give you useful feedback about your code. Check out Caliper, to get started with metrics for your project.

I collected all of the Ruby tools posts I made this week into a single roundup. You can quickly jump to any tool that interests you or read my reviews start to finish. If you just want to read an individual section, here are the previous posts: Ruby Code Quality Tools, Ruby Test Quality Tools, and Ruby Performance Tools.

There have been a bunch of interesting tools released for Ruby lately. I decided to write about a few of my favorite Ruby tools and give some of the new tools a shot as well. Simply put, better tools can help you be a better developer. I am ignoring the entire topic of IDEs as tools, as I have written about Ruby IDEs before, and it is basically a religious war. If you use any Ruby tools I don’t mention be sure to let me know as I am always interested in trying something new out.

Tool Name: Description

Code Quality Tools
Roodi: Roodi gives developers information about common mistakes in their Ruby code. It makes it easy to clean up your code before things start to get ugly.
Dust: Dust is a new tool that will analyze your code and detect unsafe blocks and unused code. Dust is being created by the same mind behind Heckle.
Flog: Flog essentially scores an ABC metric, giving you a good understanding of the overall code complexity of any given file or method.
Saikuro: When given Ruby source code, Saikuro will generate a report listing the cyclomatic complexity of each method found.

Test Quality Tools
Heckle: Heckle helps test your Ruby tests (how cool is that?). Heckle is a mutation tester. It alters/breaks code and verifies that tests fail.
rcov: rcov is the easiest way to get information about your current code coverage.

Ruby/Rails Performance Tools
ruby-prof: ruby-prof is a fast and easy-to-use Ruby profiler. The first of four tools that can help you solve performance issues.
New Relic: New Relic is one of the three Rails plugins for performance debugging and monitoring that were recently released.
TuneUp: TuneUp is a Rails performance tool from FiveRuns. This tool has an interesting community built around it as well.
RubyRun: RubyRun is a Rails performance tool similar to New Relic and TuneUp.

Let’s get into it…

Roodi


Roodi gives you a bunch of interesting warnings about your Ruby code. We are about to release some code, so I took the opportunity to fix up anything Roodi complained about. It helped identify refactoring opportunities, both with long methods, and overly complex methods. The code and tests became cleaner and more granular after breaking some of the methods down. I even found and fixed one silly performance issue that was easy to see after refactoring, which improved the speed of our code. Spending some time with Roodi looks like it could easily improve the quality and readability of most Ruby projects with very little effort. I didn’t solve every problem because in one case I just didn’t think the method could be simplified anymore, but the majority of the suggestions were right on. Below is an example session with Roodi

dmayer$ sudo gem install roodi
dmayer$ roodi lib/client/syncer.rb
lib/client/syncer.rb:136 - Block cyclomatic complexity is 5.  It should be 4 or less.
lib/client/syncer.rb:61 - Method name "excluded" has a cyclomatic complexity is 10.  It should be 8 or less.
lib/client/syncer.rb:101 - Method name "should_be_excluded?" has a cyclomatic complexity is 9.  It should be 8 or less.
lib/client/syncer.rb:132 - Method name "find_changed_files" has a cyclomatic complexity is 10.  It should be 8 or less.
lib/client/syncer.rb:68 - Rescue block should not be empty.
lib/client/syncer.rb:61 - Method name "excluded" has 25 lines.  It should have 20 or less.
lib/client/syncer.rb:132 - Method name "find_changed_files" has 27 lines.  It should have 20 or less.
Found 7 errors.

After Refactoring:

~/projects/gridtest/trunk dmayer$ roodi lib/client/syncer.rb
lib/client/syncer.rb:148 - Block cyclomatic complexity is 5.  It should be 4 or less.
lib/client/syncer.rb:82 - Rescue block should not be empty.
Found 2 errors.
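
To give a feel for the kind of refactoring that brings these numbers down, here is a minimal sketch. It is not the real syncer code: the exclusion rules, constants, and method names below are all hypothetical, made up only to show one branch-heavy method being split into small predicates that each carry one or two decisions.

```ruby
# Hypothetical exclusion rules, invented for this illustration only.
EXCLUDED_DIRS = %w[.git .svn log tmp].freeze
EXCLUDED_EXTENSIONS = %w[.log .tmp .swp].freeze

# Before: every decision lives in one method, so each condition
# adds to that single method's cyclomatic complexity.
def excluded_before?(path)
  return true if path.nil? || path.empty?
  path.split('/').each do |part|
    return true if EXCLUDED_DIRS.include?(part)
  end
  if path.end_with?('~')
    return true
  elsif EXCLUDED_EXTENSIONS.include?(File.extname(path))
    return true
  end
  false
end

# After: the same rules, split into small named predicates.
def blank_path?(path)
  path.nil? || path.empty?
end

def in_excluded_dir?(path)
  path.split('/').any? { |part| EXCLUDED_DIRS.include?(part) }
end

def temporary_file?(path)
  path.end_with?('~') || EXCLUDED_EXTENSIONS.include?(File.extname(path))
end

def excluded_after?(path)
  blank_path?(path) || in_excluded_dir?(path) || temporary_file?(path)
end

puts excluded_after?('lib/client/syncer.rb')
puts excluded_after?('.git/config')
```

The behavior is unchanged, but each small predicate can now be named, tested, and read on its own.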

I did have one problem with Roodi – the errors about rescue blocks just seemed to be incorrect. For code like the little example below it kept throwing the error even though I obviously am doing some work in the rescue code.

Roodi output: lib/client/syncer.rb:68 - Rescue block should not be empty.
begin
  socket = TCPSocket.new(server_ip,server_port)
  socket.close
  return true
rescue Errno::ECONNREFUSED
  return false
end

Dust


Dust detects unused code like unused variables, branches, and blocks. I look forward to seeing how the project progresses. Right now there doesn’t seem to be much out there on the web, and the README is pretty bare bones. Once you can pass it some files to scan, I think this will be something really useful. For now I didn’t think there was much I could actually do besides check it out. Kevin, who also helped create the very cool Heckle, does claim that code scanning is coming soon, so I look forward to doing a more detailed write-up eventually.

Flog


Flog gives feedback about the quality of your code by scoring code using the ABC metric. Using Flog to help guide refactoring, code cleanup, and testing efforts can be highly effective. It is a little easier to understand the reports after reading how Flog scores your code, and what is a good Flog score. Once you get used to working with Flog you will likely want to run it often against your whole project after making any significant changes. There are two easy ways to do this: a handy Flog Rake task, or MetricFu, which works with both Flog and Saikuro.

Running Flog against any subset of a project is easy. Here I am running it against our client libraries:

find ./lib/client/ -name \*.rb | xargs flog -n -m > flog.log

Here is some example Flog output when run against our client code.

Total score = 1364.52395469781

Client#send_tests: (64.3)
    14.3: assignment
    13.9: puts
    10.7: branch
    10.5: send
     4.7: send_quit
     3.4: message
     3.4: now
     2.0: create_queue_test_msg
     1.9: create_run_msg
     1.9: test_files
     1.8: dump
     1.7: each
     1.7: report_start
     1.7: length
     1.7: get_tests
     1.7: -
     1.7: open
     1.7: load_file
     1.6: empty?
     1.6: nil?
     1.6: use_cache
     1.6: exists?
ModClient#send_file: (32.0)
    12.4: branch
     5.4: +
     4.3: assignment
     3.9: send
     3.1: puts
     2.9: ==
     2.9: exists?
     2.9: directory?
     1.9: strftime
     1.8: to_s
     1.5: read
     1.5: create_file_msg
     1.4: info
Syncer#sync: (30.8)
    13.2: assignment
     8.6: branch
     3.6: inspect
     3.2: info
     3.0: puts
     2.8: +
     2.6: empty?
     1.7: map
     1.5: now
     1.5: length
     1.4: send_files
     1.3: max
     1.3: >
     1.3: find_changed_files
     1.3: write_sync_time
Syncer#find_changed_files: (26.2)
    15.6: assignment
     8.7: branch
     3.5: <<
     1.8: to_s
     1.7: get_relative_path
     1.7: >
     1.7: mtime
     1.6: exists?
     1.6: ==
     1.5: prune
     1.4: should_be_excluded?
     1.3: get_removed_files
     1.3: find
... and so on ...
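
The per-method totals in this report are consistent with an ABC-style magnitude: assignments, branches, and everything else (treated as calls) are summed separately, and the total is the square root of the sum of their squares. The sketch below only rechecks the arithmetic from the report above. The abc_score helper is a hypothetical name and is not part of Flog's API.

```ruby
# Line items copied from the Syncer#find_changed_files entry above.
FIND_CHANGED_FILES = {
  'assignment' => 15.6, 'branch' => 8.7, '<<' => 3.5, 'to_s' => 1.8,
  'get_relative_path' => 1.7, '>' => 1.7, 'mtime' => 1.7,
  'exists?' => 1.6, '==' => 1.6, 'prune' => 1.5,
  'should_be_excluded?' => 1.4, 'get_removed_files' => 1.3, 'find' => 1.3
}.freeze

# Hypothetical helper: combine the three buckets into one magnitude.
def abc_score(tally)
  assignments = tally.fetch('assignment', 0.0)
  branches    = tally.fetch('branch', 0.0)
  # Every other line item is counted in the "calls" bucket.
  calls = tally.reject { |name, _| %w[assignment branch].include?(name) }
               .values.sum
  Math.sqrt(assignments**2 + branches**2 + calls**2)
end

puts abc_score(FIND_CHANGED_FILES).round(1) # prints 26.2, matching the report
```

One practical consequence: because the buckets are squared, a method dominated by a single bucket (say, lots of assignments) scores higher than one with the same raw total spread evenly across all three.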

Saikuro


Saikuro is another code complexity tool. It seems to give a little less information than some of the others. It does generate nice HTML reports. Like other code complexity tools it can be helpful to discover the most complex parts of your projects for refactoring and to help focus your testing. I liked the way Flog broke things down for me into a bit more detail, but either is a useful tool and I am sure it is a matter of preference depending on what you are looking for.

Saikuro Screenshot

Heckle


Heckle is an interesting tool for mutation testing your tests. Heckle currently supports Test::Unit and RSpec, but it does have a number of issues. I had to run it on a few different files and methods before I got useful output that helped me improve my testing. The first problem was that it crashed the majority of the time when I passed it entire files. I then began passing it single methods I was curious about, which still occasionally sent Heckle into an infinite loop. This is a noted problem in Heckle, and passing -T with a timeout should solve it. In my case it was actually not an infinite loop timing error, but an error when attempting to rewrite the code, which led to a continual failure loop that wouldn’t time out. When I found a class and method that Heckle could test I got some good results: I found one badly written test case, and one case that was never tested. Let’s run through a simple Heckle example.

#install heckle
dmayer$ sudo gem install heckle

#example of the infinite loop Error Heckle run
heckle Syncer should_be_excluded? --tests test/unit/client/syncer_test.rb -v

Setting timeout at 5 seconds.
Initial tests pass. Let's rumble.

**********************************************************************
*** Syncer#should_be_excluded? loaded with 13 possible mutations
**********************************************************************
...
2 mutations remaining...
Replacing Syncer#should_be_excluded? with:

2 mutations remaining...
Replacing Syncer#should_be_excluded? with:
... loops forever ...

#Heckle run against our Client class and the process method

dmayer$ heckle Client process --tests test/unit/client/client_test.rb

Initial tests pass. Let's rumble.

**********************************************************************
*** Client#process loaded with 9 possible mutations
**********************************************************************

9 mutations remaining...
8 mutations remaining...
7 mutations remaining...
6 mutations remaining...
5 mutations remaining...
4 mutations remaining...
3 mutations remaining...
2 mutations remaining...
1 mutations remaining...

The following mutations didn't cause test failures:

--- original
+++ mutation

def process(command)

case command
when @buffer.Ready then
process_ready
- when @buffer.SetID then
+ when nil then
process_set_id(command)
when @buffer.InitProject then
process_init_project
when @buffer.Result then
process_result(command)
when @buffer.Goodbye then
kill_event_loop
when @buffer.Done then
process_done
when @buffer.Error then
process_error(command)
else
@log.error("client ignoring invalid command #{command}") if @log
end
end

--- original
+++ mutation
def process(command)
case command
when @buffer.Ready then
process_ready
when @buffer.SetID then
process_set_id(command)
when @buffer.InitProject then
process_init_project
when @buffer.Result then
process_result(command)
when @buffer.Goodbye then
kill_event_loop
when @buffer.Done then
process_done
when @buffer.Error then
process_error(command)
else
- @log.error("client ignoring invalid command #{command}") if @log
+ nil if @log
end
end

Heckle Results:

Passed : 0
Failed : 1
Thick Skin: 0

Improve the tests and try again.

#Tests added / changed to improve Heckle results

def test_process_process_loop__random_result
  Client.any_instance.expects(:start_tls).returns(true)
  client = Client.new({})
  client.stubs(:send_data)
  client.log = stub_everything
  client.log.expects(:error).with("client ignoring invalid command this is random")
  client.process("this is random")
end

def test_process_process_loop__set_id
  Client.any_instance.expects(:start_tls).returns(true)
  client = Client.new({})
  client.stubs(:send_data)
  client.log = stub_everything
  cmd = DataBuffer.new.create_set_ids_msg("4")
  client.expects(:process_set_id).with(cmd)
  client.process(cmd)
end

#A final Heckle run, showing successful results

dmayer$ heckle Client process --tests test/unit/client/client_test.rb

Initial tests pass. Let's rumble.

**********************************************************************
*** Client#process loaded with 9 possible mutations
**********************************************************************

9 mutations remaining...
8 mutations remaining...
7 mutations remaining...
6 mutations remaining...
5 mutations remaining...
4 mutations remaining...
3 mutations remaining...
2 mutations remaining...
1 mutations remaining...
No mutants survived. Cool!

Heckle Results:

Passed : 1
Failed : 0
Thick Skin: 0

All heckling was thwarted! YAY!!!
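
If mutation testing is new to you, the idea behind the session above can be sketched in a few lines of plain Ruby. This is a toy model, not how Heckle is implemented: the lambdas stand in for Client#process, the two hand-written mutants mirror the two surviving mutations shown earlier, and names like survivors, WEAK_SUITE, and STRONG_SUITE are made up for the illustration.

```ruby
# Stand-in for the method under test: dispatches on a command and
# logs anything it does not recognize.
ORIGINAL = lambda do |command, log|
  case command
  when :set_id then :process_set_id
  when :ready  then :process_ready
  else
    log << "client ignoring invalid command #{command}"
    :ignored
  end
end

# Hand-written mutants mirroring the two survivors in the Heckle run.
MUTANTS = {
  when_nil: lambda do |command, log|
    case command
    when nil    then :process_set_id # mutated branch condition
    when :ready then :process_ready
    else
      log << "client ignoring invalid command #{command}"
      :ignored
    end
  end,
  no_log: lambda do |command, _log|
    case command
    when :set_id then :process_set_id
    when :ready  then :process_ready
    else :ignored # mutated: the log call is gone
    end
  end
}.freeze

# Each "test" takes an implementation and returns true when it passes.
WEAK_SUITE = [
  ->(impl) { impl.call(:ready, []) == :process_ready }
].freeze

STRONG_SUITE = (WEAK_SUITE + [
  ->(impl) { impl.call(:set_id, []) == :process_set_id },
  lambda do |impl|
    log = []
    impl.call(:bogus, log)
    log == ['client ignoring invalid command bogus']
  end
]).freeze

# A mutant survives when every test still passes against it.
def survivors(mutants, suite)
  mutants.select { |_name, impl| suite.all? { |test| test.call(impl) } }.keys
end

p survivors(MUTANTS, WEAK_SUITE)   # both mutants survive the weak suite
p survivors(MUTANTS, STRONG_SUITE) # no mutants survive the stronger suite
```

A surviving mutant is a pointer to behavior that no test actually pins down, which is exactly what the two added tests above fix.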

rcov


rcov is a code coverage tool for Ruby. If you are doing testing you should probably be monitoring your coverage with a code coverage tool. I don't know of a better tool for code coverage than rcov. It is simple to use and generates beautiful, easy-to-read HTML charts showing the current coverage broken down by file. An easy way to make your project more stable is to occasionally spend some time increasing its coverage. I have always found it a great way to get back into a project if you have been off of it for a while. You just need to find some weak coverage points and get to work.
Rcov Screenshot

ruby-prof


ruby-prof does what every other profiler does, but it is much faster than the one built in to Ruby. It also makes it easy to output the information you are seeking to HTML pages, such as call graphs. If you are just looking for a simple write up to get started with ruby-prof I recommend the previous link. I will talk a little more about the kinds of problems I find and how I have solved them with ruby-prof.

I have used ruby-prof a number of times to isolate ways to speed up my code. I haven't used it to identify why an entire Rails application is slow (there are better tools for that, which I discuss later), but if you have a small but highly important piece of code, ruby-prof is often the best way to isolate the problem. I used ruby-prof to identify the two slowest lines of code in a spellchecker, which was then rewritten to become twice as fast.

Most recently I used it to identify where the code was spending all of its time in a loop for a file syncer. It turns out that, for thousands of files, each time through the loop we were repeatedly calling Pathname.new(path).relative_path_from(@dir_path). Putting a small cache around that call essentially eliminated all delays in our file synchronization. Below is a simple example of how a few lines of code can make all the difference in performance, and how easily ruby-prof can help you isolate the problem areas and decide where to spend your time. I think seeing the code that ruby-prof helped isolate, and the changes made to it, might be useful if you are new to profiling and performance work.

Changes in our spellchecker / recommender

#OLD Way
alteration = []
n.times { |i|
  LETTERS.each_byte { |l|
    alteration << word[0...i].strip + l.chr + word[i+1..-1].strip } }
insertion = []
(n+1).times { |i|
  LETTERS.each_byte { |l|
    insertion << word[0...i].strip + l.chr + word[i..-1].strip } }

#NEW Way
#pre-calculate the word breakups
word_starts = []
word_missing_ends = []
word_ends = []
(n+1).times do |i|
  word_starts << word[0...i]
  word_missing_ends << word[i+1..-1]
  word_ends << word[i..-1]
end

alteration = []
n.times { |i|
  alteration = alteration.concat LETTERS.collect { |l|
    word_starts[i] + l + word_missing_ends[i] } }
insertion = []
(n+1).times { |i|
  insertion = insertion.concat LETTERS.collect { |l|
    word_starts[i] + l + word_ends[i] } }
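
The snippet above depends on word, n, and LETTERS from the surrounding spellchecker, which are not shown. For anyone who wants to experiment, here is a self-contained sketch of the same idea. It assumes LETTERS is an array of the lowercase letters a to z and that the word has no surrounding whitespace (so the strip calls are not needed). The method names edits_naive and edits_precomputed are hypothetical.

```ruby
# Assumption: the alphabet is a plain array of one-character strings.
LETTERS = ('a'..'z').to_a.freeze

# Slices the word again for every letter at every position.
def edits_naive(word)
  n = word.length
  alteration = []
  n.times do |i|
    LETTERS.each { |l| alteration << word[0...i] + l + word[i + 1..-1] }
  end
  insertion = []
  (n + 1).times do |i|
    LETTERS.each { |l| insertion << word[0...i] + l + word[i..-1] }
  end
  [alteration, insertion]
end

# Slices the word once per position, then reuses the pieces per letter.
def edits_precomputed(word)
  n = word.length
  word_starts       = (0..n).map  { |i| word[0...i] }
  word_missing_ends = (0...n).map { |i| word[i + 1..-1] }
  word_ends         = (0..n).map  { |i| word[i..-1] }

  alteration = (0...n).flat_map do |i|
    LETTERS.map { |l| word_starts[i] + l + word_missing_ends[i] }
  end
  insertion = (0..n).flat_map do |i|
    LETTERS.map { |l| word_starts[i] + l + word_ends[i] }
  end
  [alteration, insertion]
end

alteration, insertion = edits_precomputed('cat')
puts alteration.include?('bat')  # a one-letter alteration of "cat"
puts insertion.include?('cart')  # a one-letter insertion into "cat"
```

Both versions produce the same candidates in the same order. The second simply moves the substring work out of the inner loop, which is where a profiler would show the time going.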

Changes in our file syncer

#OLD
 path_name = Pathname.new(path).relative_path_from(@dir_path).to_s
 #NEW
 path_name = get_relative_path(path)

  def get_relative_path(path)
    return @path_cache[path] if @path_cache.member?(path)
    retval = Pathname.new(path).relative_path_from(@dir_path).to_s
    @path_cache[path] = retval
    return retval
  end
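
The method above relies on @path_cache and @dir_path being set up elsewhere in the syncer. A minimal, self-contained sketch of the same caching idea follows. The RelativePathCache class and its misses counter are hypothetical, added only so the effect of the cache can be observed directly.

```ruby
require 'pathname'

# Hypothetical wrapper class; in the post the cache lives inside the syncer.
class RelativePathCache
  attr_reader :misses

  def initialize(dir)
    @dir_path = Pathname.new(dir)
    @path_cache = {}
    @misses = 0 # counts how often the expensive call actually runs
  end

  def get_relative_path(path)
    # Hash#fetch runs the block only when the key is missing.
    @path_cache.fetch(path) do
      @misses += 1
      @path_cache[path] =
        Pathname.new(path).relative_path_from(@dir_path).to_s
    end
  end
end

cache = RelativePathCache.new('/project')
3.times { cache.get_relative_path('/project/lib/client/syncer.rb') }
puts cache.get_relative_path('/project/lib/client/syncer.rb')
puts cache.misses
```

A cache like this grows with the number of distinct paths seen, which is fine for a sync pass over a project tree but worth keeping in mind for a long-running process.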

New Relic


New Relic is a performance monitoring tool for Rails apps. It has a great development mode that will help you track down performance issues before they even become a problem, and live monitoring so that you can find any hiccups that are slowing down the production application. The entire performance monitoring space for Ruby/Rails seems to be heating up. I guess it is easy to see why, when scaling has been such an issue for some Rails apps. Just playing around with New Relic was exciting and fun. I could quickly track down the slowest pages and our most problematic SQL calls. In this case I was testing New Relic on Seekler (an old project of ours), since I didn't think I would find much of interest on our current Devver site. Seekler had some glaring performance issues, and I think if we had had New Relic from the beginning we could have avoided many of them. Sounds like I might have a day project involving New Relic and giving Seekler as much of a performance boost as possible. New Relic turned out to be my favorite of the performance monitoring tools. For a much more detailed write-up, check out the RailsTips New Relic Review.

New Relic screenshot

TuneUp


TuneUp is another easy-to-install and easy-to-use Rails performance monitoring solution. The problem I had with TuneUp was that I couldn't get it working on my test app for these sorts of things. I tried running Seekler with TuneUp, but had no luck. I found that many people on the message boards seemed to be having various compatibility issues. I looked at the TuneUp screencast and the kind of information it gives you, and I feel like this would be equal to New Relic if it works for you. I am emailing back and forth with FiveRuns support, who have been very attentive and helpful, so if I get it working I will update this section.

Update: FiveRuns is pretty amazing with their support. I haven't got TuneUp fully working yet, but have made some progress. Some good things to know are that some plugins like safe_erb and output_compression can cause problems with TuneUp. They are aware of the issues, and actively looking into it.

Ruby Run


RubyRun provides live performance monitoring and debugging tools. I hadn't ever heard of this product before I started doing some research while writing this blog article. I am sorry to say that this was the hardest to set up, and it gave back less valuable information. I think they need a simple screencast on how to get set up and get useful information back. After getting it set up and running I could only get ugly CSV reports that didn't tell me much more than the regular Rails log files. I started reading the RubyRun Manual, but it was about as long as Moby Dick, and all I wanted was how to view simple, easy-to-read reports, which is a snap in New Relic and TuneUp. Since the site didn't mention RubyRun providing better data than New Relic or TuneUp, which were much more user friendly, I don't think I would recommend RubyRun.

UPDATE: After reading about my difficulties with RubyRun the great folks from Rubysophic got in touch with me. They offered to help me get the tool working and posted a RubyRun quick start guide to their site. I got it working in a snap thanks to an email from their dev and the amazingly simple quick start guide. I still didn't get the same depth of information that I got with New Relic, although RubyRun has a ton of settings so it is likely you can get more depth to the reports. Something worth pointing out is that RubyRun is working on Seekler, which I haven't been able to get TuneUp running on. So if you have been having problems with TuneUp or New Relic, definitely give RubyRun a look. In the end I think the other offerings are slightly more user friendly (less complex settings), and easier to explore the data (link in the feed to both reports, at least when in developer mode). That being said RubyRun offers some great information and options that the others don't and with a bit more UI tuning RubyRun would be at the top of the pack. Thanks to the helpful devs at Rubysophic for helping me to get the most out of RubyRun.

RubyRun screenshot
RubyRun second screen shot
screenshot of a different RubyRun report

That is it. I hope you learned about a new Ruby tool. So get to work, try a new tool, and get to know your code a little better than you did before.

While I was writing this article, people told me about two more tools worth mentioning. I didn't get a chance to try them out or review them, but thought I should point them out: Towlie, which helps keep your code DRY by finding redundant methods, and Source ANalysis (SAN), which is described as "a Ruby gem for analyzing the contents of source code including comment to script ratios, todo items, declared functions, classes, and much more".

Written by DanM

October 3, 2008 at 10:25 am

Ruby Code Quality Tools

Update: Devver now offers a hosted metrics service for Ruby developers which can give you useful feedback about your code. Check out Caliper, to get started with metrics for your project.

This is the third post in my series of Ruby tools articles. This time I look at Ruby code quality tools. Rubyists like Ruby because the code can look so nice, simple, and sometimes beautiful. Unfortunately, not all code is so great. In fact, the code I write often doesn’t look good. Fortunately, while a great language can help you to write great code, great tools can help as well. As code grows it is easy for code bloat, dead code, or confusing complexities to slip in. The tools I review below can help with all of these problems. I recommend finding the one or two code quality tools you like best and starting to integrate them more into your development process.

Roodi


Roodi gives you a bunch of interesting warnings about your Ruby code. We are about to release some code, so I took the opportunity to fix up anything Roodi complained about. It helped identify refactoring opportunities in both long methods and overly complex methods. The code and tests became cleaner and more granular after breaking some of the methods down. I even found and fixed one silly performance issue that was easy to see after refactoring, which improved the speed of our code. Spending some time with Roodi looks like it could improve the quality and readability of most Ruby projects with very little effort. I didn’t solve every problem, because in one case I just didn’t think the method could be simplified any further, but the majority of the suggestions were right on. Below is an example session with Roodi:

dmayer$ sudo gem install roodi
dmayer$ roodi lib/client/syncer.rb
lib/client/syncer.rb:136 - Block cyclomatic complexity is 5.  It should be 4 or less.
lib/client/syncer.rb:61 - Method name "excluded" has a cyclomatic complexity is 10.  It should be 8 or less.
lib/client/syncer.rb:101 - Method name "should_be_excluded?" has a cyclomatic complexity is 9.  It should be 8 or less.
lib/client/syncer.rb:132 - Method name "find_changed_files" has a cyclomatic complexity is 10.  It should be 8 or less.
lib/client/syncer.rb:68 - Rescue block should not be empty.
lib/client/syncer.rb:61 - Method name "excluded" has 25 lines.  It should have 20 or less.
lib/client/syncer.rb:132 - Method name "find_changed_files" has 27 lines.  It should have 20 or less.
Found 7 errors.

After Refactoring:

~/projects/gridtest/trunk dmayer$ roodi lib/client/syncer.rb
lib/client/syncer.rb:148 - Block cyclomatic complexity is 5.  It should be 4 or less.
lib/client/syncer.rb:82 - Rescue block should not be empty.
Found 2 errors.

I did have one problem with Roodi – the errors about rescue blocks just seemed to be incorrect. For code like the little example below it kept throwing the error even though I obviously am doing some work in the rescue code.

Roodi output: lib/client/syncer.rb:68 - Rescue block should not be empty.
begin
  socket = TCPSocket.new(server_ip,server_port)
  socket.close
  return true
rescue Errno::ECONNREFUSED
  return false
end

Dust


Dust detects unused code like unused variables, branches, and blocks. I look forward to seeing how the project progresses. Right now there doesn’t seem to be much out there on the web, and the README is pretty bare bones. Once you can pass it some files to scan, I think this will be something really useful. For now I didn’t think there was much I could actually do besides check it out. Kevin, who also helped create the very cool Heckle, does claim that code scanning is coming soon, so I look forward to doing a more detailed write-up eventually.

Flog


Flog gives feedback about the quality of your code by scoring code using the ABC metric. Using Flog to help guide refactoring, code cleanup, and testing efforts can be highly effective. It is a little easier to understand the reports after reading how Flog scores your code, and what is a good Flog score. Once you get used to working with Flog you will likely want to run it often against your whole project after making any significant changes. There are two easy ways to do this: a handy Flog Rake task, or MetricFu, which works with both Flog and Saikuro.

Running Flog against any subset of a project is easy. Here I am running it against our client libraries:

find ./lib/client/ -name \*.rb | xargs flog -n -m > flog.log

Here is some example Flog output when run against our client code.

Total score = 1364.52395469781

Client#send_tests: (64.3)
    14.3: assignment
    13.9: puts
    10.7: branch
    10.5: send
     4.7: send_quit
     3.4: message
     3.4: now
     2.0: create_queue_test_msg
     1.9: create_run_msg
     1.9: test_files
     1.8: dump
     1.7: each
     1.7: report_start
     1.7: length
     1.7: get_tests
     1.7: -
     1.7: open
     1.7: load_file
     1.6: empty?
     1.6: nil?
     1.6: use_cache
     1.6: exists?
ModClient#send_file: (32.0)
    12.4: branch
     5.4: +
     4.3: assignment
     3.9: send
     3.1: puts
     2.9: ==
     2.9: exists?
     2.9: directory?
     1.9: strftime
     1.8: to_s
     1.5: read
     1.5: create_file_msg
     1.4: info
Syncer#sync: (30.8)
    13.2: assignment
     8.6: branch
     3.6: inspect
     3.2: info
     3.0: puts
     2.8: +
     2.6: empty?
     1.7: map
     1.5: now
     1.5: length
     1.4: send_files
     1.3: max
     1.3: >
     1.3: find_changed_files
     1.3: write_sync_time
Syncer#find_changed_files: (26.2)
    15.6: assignment
     8.7: branch
     3.5: <<
     1.8: to_s
     1.7: get_relative_path
     1.7: >
     1.7: mtime
     1.6: exists?
     1.6: ==
     1.5: prune
     1.4: should_be_excluded?
     1.3: get_removed_files
     1.3: find
... and so on ...

Saikuro


Saikuro is another code complexity tool. It seems to give a little less information than some of the others. It does generate nice HTML reports. Like other code complexity tools it can be helpful to discover the most complex parts of your projects for refactoring and to help focus your testing. I liked the way Flog broke things down for me into a bit more detail, but either is a useful tool and I am sure it is a matter of preference depending on what you are looking for.

Saikuro Screenshot

Written by DanM

October 1, 2008 at 10:04 pm

Posted in Development, Ruby, Testing
