The Devver Blog

A Boulder startup improving the way developers work.

Improving Code using Metric_fu

Often, when people see code metrics they think, “That is interesting, but I don’t know what to do with it.” I think metrics are great, but they become far more valuable when you can use them to improve your project’s code. metric_fu provides a wealth of metric information, but if you don’t know which parts of it are actionable, it is merely interesting instead of useful.

One thing to keep in mind when looking at code metrics is that a single measurement may not tell you much. Looking at how a metric trends over time often gives more meaningful information. Showing this trending information is one of our goals with Caliper. Metrics can be a friend watching over the project, a second set of eyes on how the code is progressing that alerts you to problem areas before they get out of control. When working with code over time, it can be hard to keep everything in your head (I know I can’t). As the code base grows, it becomes difficult to keep track of all the places where duplication or complexity is building up. Addressing problem areas as code metrics reveal them keeps them from getting out of hand and makes future additions to the code easier.

I want to show how metrics can drive changes and improve a code base by working on a real project. I figured there was no better place to start than pointing metric_fu at our own website source and fixing some of the most notable problem areas. We have had our backend code under metric_fu for a while, but we hadn’t been following the metrics on our Merb code. This, along with some spiked features that ended up turning into Caliper, led to some areas getting a little out of control.

Flay Score before cleanup

When going through the metric_fu reports, the first thing I wanted to work on was making the code a bit more DRY. The team and I were starting to notice more duplication in the code than we liked. I brought up the Flay results for code duplication and found that four database models shared some of the same methods.

Flay highlighted the duplication. Since we are planning on making some changes to how we handle timestamps soon, it seemed like a good place to start cleaning up. Below are the methods that existed in all four models. A third method ‘update_time’ existed in two of the four models.

  def self.pad_num(number, max_digits = 15)
    "%%0%di" % max_digits % number.to_i
  end

  def get_time
    # ...
  end

Nearly all of our DB tables store time in a way that can be sorted with SimpleDB queries. We wanted to change our time to be stored as UTC in the ISO 8601 format. Before changing to the ISO format, it was easy to pull these methods into a helper module and include it in all the database models.

module TimeHelper

  module ClassMethods
    def pad_num(number, max_digits = 15)
      "%%0%di" % max_digits % number.to_i
    end
  end

  def get_time
    # ...
  end

  def update_time
    self.time = self.class.pad_num(
      # ...
    )
  end
end
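For `self.class.pad_num` to resolve inside `update_time`, the including class has to pick up `ClassMethods` as well as the instance methods. Here is a minimal sketch of one common way to wire that up. The `included` hook, the `Build` class, and the bodies of `get_time` and `update_time` are assumptions for illustration and not necessarily what our models look like.

```ruby
module TimeHelper
  # Ruby calls this hook when the module is included; extending the
  # including class makes pad_num available as a class method there.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Zero-pad an integer so that string comparison matches numeric order.
    def pad_num(number, max_digits = 15)
      "%%0%di" % max_digits % number.to_i
    end
  end

  # Assumed body: read the padded string back as an integer timestamp.
  def get_time
    time.to_i
  end

  # Assumed body: store the given time as a padded epoch-seconds string.
  def update_time(now = Time.now)
    self.time = self.class.pad_num(now.to_i)
  end
end

# Hypothetical model standing in for one of the four database models.
class Build
  include TimeHelper
  attr_accessor :time
end

build = Build.new
build.update_time(Time.at(1_256_682_600))
puts build.time      # zero-padded string
puts build.get_time  # integer again
```

With this shape, each model only needs `include TimeHelper`, and any change to the stored time format happens in one place.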


Besides reducing the duplication across the DB models, this also made it much easier to share another time method, update_time, which was in two of the DB models. It consolidated all the DB time logic into one file, so changing the time format to UTC ISO 8601 will be a snap. While this is a trivial example of an obvious refactoring, it is easy to see how helper methods can end up duplicated across classes. Flay comes in really handy for pointing out duplication that creeps in over time.
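To see why the zero padding matters, and why an ISO 8601 UTC string works as a replacement, remember that SimpleDB compares values as strings. The sketch below is only an illustration with made-up values: it shows unpadded numbers sorting in the wrong order, padded ones sorting correctly, and ISO 8601 timestamps sorting chronologically.

```ruby
require 'time' # adds Time#iso8601

def pad_num(number, max_digits = 15)
  "%%0%di" % max_digits % number.to_i
end

raw    = [9, 10, 100].map(&:to_s)
padded = [9, 10, 100].map { |n| pad_num(n, 5) }

p raw.sort     # lexicographic order: "10" sorts before "9"
p padded.sort  # padded strings sort in numeric order

# ISO 8601 strings in UTC have fixed-width fields ordered from most to
# least significant, so they also sort chronologically as plain strings.
earlier = Time.utc(2009, 10, 27, 22, 30).iso8601
later   = Time.utc(2009, 11, 3, 15, 53).iso8601
p [later, earlier].sort == [earlier, later]
```

The same property holds only as long as every stored value uses the same width and the same time zone, which is one reason to normalize to UTC.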

Flog gives a score showing how complex the measured code is: the higher the score, the greater the complexity. The more complex code is, the harder it is to read, and the more likely it is to have a higher defect density. After removing some duplication from the DB models, I found that our worst database model based on Flog scores was our MetricsData model. It included a single method with a very high Flog score of 149.

File                       Total score   Methods   Average score   Highest score
/lib/sdb/metrics_data.rb   327           12        27              149

The method in question was extract_data_from_yaml. It was doing too much work and was changed frequently: it extracted the data from 6 different metric tools and created a summary of the data. After a little refactoring, it went from a single method with a score of 149 to a series of smaller methods, the highest-scoring being extract_flog_data! (33.6).

The method went from a sprawling 42 lines of code to a cleaner method of 10 lines plus a collection of helper methods that look something like the code below:

  def self.extract_data_from_yaml(yml_metrics_data)
    metrics_data = Hash.new {|hash, key| hash[key] = {}}
    extract_flog_data!(metrics_data, yml_metrics_data)
    extract_flay_data!(metrics_data, yml_metrics_data)
    extract_reek_data!(metrics_data, yml_metrics_data)
    extract_roodi_data!(metrics_data, yml_metrics_data)
    extract_saikuro_data!(metrics_data, yml_metrics_data)
    extract_churn_data!(metrics_data, yml_metrics_data)
    metrics_data
  end

  def self.extract_flog_data!(metrics_data, yml_metrics_data)
    metrics_data[:flog][:description] = 'measures code complexity'
    metrics_data[:flog]["average method score"] = Devver::Maybe(yml_metrics_data)[:flog][:average].value(N_A)
    metrics_data[:flog]["total score"]   = Devver::Maybe(yml_metrics_data)[:flog][:total].value(N_A)
    metrics_data[:flog]["worst file"] = Devver::Maybe(yml_metrics_data)[:flog][:pages].first[:path].fmap {|x|}.value(N_A)
  end
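If you want to try the same shape without our Devver::Maybe helper, here is a rough sketch. It assumes `Hash#dig` as a stand-in for the Maybe chain, and `fetch_or_default` is a hypothetical helper name I made up for this example. It is not how Devver::Maybe is implemented, only a way to get similar "value or N/A" behavior.

```ruby
N_A = 'N/A'

# Stand-in for Devver::Maybe: walk the nested keys and fall back to a
# default when any step along the way is missing.
def fetch_or_default(data, *keys, default: N_A)
  value = (data || {}).dig(*keys)
  value.nil? ? default : value
end

def extract_flog_data!(metrics_data, yml_metrics_data)
  metrics_data[:flog][:description] = 'measures code complexity'
  metrics_data[:flog]['average method score'] = fetch_or_default(yml_metrics_data, :flog, :average)
  metrics_data[:flog]['total score'] = fetch_or_default(yml_metrics_data, :flog, :total)
  metrics_data[:flog]['worst file'] = fetch_or_default(yml_metrics_data, :flog, :pages, 0, :path)
end

# The default block creates the nested hash for each tool on first access,
# so the helper can assign into metrics_data[:flog] without setup.
metrics_data = Hash.new { |hash, key| hash[key] = {} }
extract_flog_data!(metrics_data, { flog: { average: 27, total: 327 } })
p metrics_data[:flog]['worst file'] # no :pages key in the input, so the default is used
```

Each extract method stays small because the missing-data handling lives in one helper instead of being repeated as nil checks.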

Churn gives you an idea of which files might be in need of refactoring. Often, if a file is changing a lot, it means that the code is doing too much and would be more stable and reliable if broken up into smaller components. Looking through our churn results, it looks like we might need another layout to accommodate some of the different styles on the site. Another thing that jumps out is that both the TestStats and Caliper controllers have fairly high churn. The Caliper controller has been growing fairly large because it has been doing double duty for user-facing features and admin features, which should be split up. TestStats is admin controller code that has also been growing in size and should be split up into more isolated cases.
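The idea behind churn is simple enough to sketch by hand: count how often each file shows up in the commit history. The snippet below is my own rough illustration, not the churn tool's implementation, and the file names in the sample log are made up.

```ruby
# Count how often each file appears in `git log --name-only` style output,
# most frequently changed files first.
def churn_counts(log_output)
  counts = Hash.new(0)
  log_output.each_line do |line|
    path = line.strip
    next if path.empty? # blank lines separate commits
    counts[path] += 1
  end
  counts.sort_by { |path, count| [-count, path] }
end

# In a real repository the input could come from:
#   `git log --name-only --pretty=format:`
sample = <<~LOG
  app/controllers/caliper.rb
  app/views/layout/application.html.erb

  app/controllers/caliper.rb
  app/controllers/test_stats.rb

  app/controllers/caliper.rb
LOG

churn_counts(sample).each { |path, count| puts "#{count}  #{path}" }
```

A raw count like this says nothing about why a file changes, so it works best as a pointer to where the other metrics deserve a closer look.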

churn results

Churn gave me an idea of where it might be worth focusing my effort. Diving into the other metrics made it clear that the Caliper controller needed some attention.

The Flog, Reek, and Roodi Scores for Caliper Controller:

File                          Total score   Methods   Average score   Highest score
/app/controllers/caliper.rb   214           14        15              42

reek before cleanup

Roodi Report
app/controllers/caliper.rb:34 - Method name "index" has a cyclomatic complexity is 14.  It should be 8 or less.
app/controllers/caliper.rb:38 - Rescue block should not be empty.
app/controllers/caliper.rb:51 - Rescue block should not be empty.
app/controllers/caliper.rb:77 - Rescue block should not be empty.
app/controllers/caliper.rb:113 - Rescue block should not be empty.
app/controllers/caliper.rb:149 - Rescue block should not be empty.
app/controllers/caliper.rb:34 - Method name "index" has 36 lines.  It should have 20 or less.

Found 7 errors.
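Most of the complaints in that report are empty rescue blocks, which silently hide failures. As a hedged illustration of the kind of fix involved, here is a before and after using hypothetical method names. This is not the actual Caliper controller code.

```ruby
require 'logger'

LOGGER = Logger.new($stderr)

# Before: every failure is swallowed silently and the method returns nil,
# which is the pattern Roodi flags as an empty rescue block.
def parse_count_quietly(raw)
  Integer(raw)
rescue
end

# After: rescue only the errors we expect, record what happened, and
# return an explicit fallback value.
def parse_count(raw, logger: LOGGER)
  Integer(raw)
rescue ArgumentError, TypeError => e
  logger.warn("could not parse count #{raw.inspect}: #{e.message}")
  0
end

p parse_count_quietly('abc')
p parse_count('12')
```

Naming the exception classes also means an unexpected error, such as a typo that raises NameError, surfaces immediately instead of disappearing.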

Roodi and Reek both tell you about design and readability problems in your code. The screenshot of our Reek ‘code smells’ in the Caliper controller should show how it had gotten out of hand. The code smells filled an entire browser page! Roodi similarly had many complaints about the Caliper controller. Flog was also showing the file was getting a bit more complex than it should be. After picking off some of the worst Roodi and Reek complaints and splitting up methods with high Flog scores, the code had become easily readable and understandable at a glance. In fact I nearly cut the Reek complaints in half for the controller.

Reek after cleanup

Refactoring one controller, which had been quickly hacked together and was growing out of control, brought it from a dizzying 203 LOC to 138 LOC. The metrics drove me to refactor long methods (52 LOC => 3 methods, the largest being 23 LOC), rename unclear variables (s => stat, p => project), and move some helper methods out of the controller into the helper class where they belong. Yes, all of these refactorings and good code designs can be done without metrics, but it is easy to overlook bad code smells when they start small. Metrics can give you an early warning that a section of code is becoming unmanageable and likely prone to higher defect rates. The smaller file was a huge improvement in terms of cyclomatic complexity, LOC, code duplication, and, more importantly, readability.

Obviously I think code metrics are cool, and that your projects can be improved by paying attention to them as part of the development lifecycle. I wrote about metric_fu so that anyone can try these metrics out on their projects. I think metric_fu is awesome, and my interest in Ruby tools is part of what drove us to build Caliper, which is really the easiest way to try out metrics for your project. Currently, you can think of it as hosted metric_fu, but we are hoping to go even further and make the metrics clearly actionable to users.

In the end, yep, this is a bit of a plug for a product I helped build, but that is really because I think code metrics can be a great tool to help anyone with their development. So submit your repo and give Caliper hosted Ruby metrics a shot. We are trying to make metrics more actionable and useful for all Ruby developers out there, so we would love to hear from you. If you have any ideas about how to improve Caliper, please contact us.

Written by DanM

October 27, 2009 at 10:30 pm

10 Responses


  1. […] reek, refactoring, ruby Interesting link: In Improving Code using Metric Fu the folks at give a little insight into how they have been using Reek and the other Ruby […]

  2. I have never tried metrics for my code, but this looks promising!

    Deepak Prasanna

    October 31, 2009 at 8:14 pm

  3. […] Improving Code using Metric_fu – The Devver Blog […]

  4. Looks great. Will it remain public service for public projects, or can involve to SaaS or maybe a product of some sort (ie Redmine plugin)?


    November 1, 2009 at 5:12 am

  5. We plan on leaving the service free for open source software. We are considering a pay service for private projects.


    November 2, 2009 at 8:26 am

  6. I've read before that when working with tools like this you want to calibrate them. Who says that your code shouldn't have a method that's 20 lines or longer? Maybe for your application this is OK. Sure it's less readable than something that has less LOC.

    The best suggestion I've heard about these tools is to start with very generous requirements. In fact, start by configuring them so that all of your code passes. Then start ratcheting down the numbers until you get to a point where you feel good about things.

    For example, you might have a method that's 35 lines long in your code. By default roodi would flag this because it's over 20 lines. So start by configuring roodi to allow 35 lines or less. Then drop it down to 30 and make all of your code pass that metric again. Then drop it to 25… and so on. When you are finished, you will have improved your code base and customized these metric tools specifically for your application.


    November 3, 2009 at 3:53 pm

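  7. […] Merely measuring whether code meets the standard is not enough; the code should also be able to be refactored later. The Devver blog shows how Caliper’s metrics were used to improve its own code base. Caliper can also be integrated with other services, for example GitHub and , via post-commit hooks. […]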

  8. […] code metrics are to actually be useful and not just interesting, there must be some clear business-relevant question that you’re trying answering. Here are a few […]

  9. […] a big fan of Caliper, as I’ve mentioned before. I’m eager to use it as a method of using metrics to put pressure on your design. We’ll be writing more about this technique along the […]

  10. […] code metrics are to actually be useful and not just interesting, there must be some clear business-relevant question that you’re trying answering. Here are a few […]
