
Comparing Ruby exceptions to additional stat() test

Profiling asset precompilation in a JRuby on Rails application showed that a large amount of time was being spent converting Java exceptions into Ruby exceptions, which is relatively slow in JRuby. Digging in with the profiler revealed patterns where exceptions were being raised (and handled) in code that reads and writes cache files and makes directories.

A common idiom in Ruby code (FileUtils, ActiveSupport, Hike) when dealing with a file or directory that may or may not exist is to go ahead and call a method that expects the file to exist (such as open), and catch the exception raised for nonexistent files.

def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

The alternative is to first test for existence with something like File.exist?, which costs an extra system call whenever the file does exist: one for the test, and a second to actually open the file.

def entries_after_stat(dir)
  if File.directory?(dir)
    Dir.entries(dir).size
  else
    0
  end
end

(As Ethan points out below, this code will still raise Errno::ENOENT if dir is deleted after the stat call but before its entries are read.)
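One way to keep the cheap existence test while still tolerating that race is to combine the two approaches. The following is only a minimal sketch; the method name entries_stat_with_rescue is hypothetical and was not part of the benchmark.

```ruby
# Hypothetical helper: test first so the common "missing" case avoids an
# exception, but still rescue ENOENT in case the directory disappears
# between the File.directory? check and the Dir.entries call.
def entries_stat_with_rescue(dir)
  return 0 unless File.directory?(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end
```

The rescue clause only costs anything when the race actually occurs, so the ordinary failure case should still take the cheaper stat path.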

A single system call is always more efficient than two. However, in the failing case, an exception object must be constructed. This is not free on MRI and is quite expensive under JRuby. The best choice depends on how common the failing case is expected to be.

A synthetic benchmark was created to demonstrate the relative differences. All measurements were done on an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ running Ubuntu 12.10 with Linux kernel 3.5.0 and an ext4 filesystem.

The benchmark gathers a list of directories by scanning /usr/lib (arbitrary; there were just under 2500). It then sums the number of entries in each directory using Dir.entries. A second scan is done after mangling the directory names into directories that do not exist to exercise the failure case.
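The original benchmark script is not reproduced here, so the following is only a rough sketch of how such a harness might look. The helper names gather_dirs, time_modes and run_benchmark, the ".does-not-exist" suffix used to mangle names, and the readability filter are all assumptions made for illustration. The two counting methods are repeated so the sketch runs on its own, and it does not attempt to clear the disk cache.

```ruby
require 'benchmark'

# Counts entries, relying on the exception for missing directories.
def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

# Counts entries, testing for the directory first.
def entries_after_stat(dir)
  File.directory?(dir) ? Dir.entries(dir).size : 0
end

# Hypothetical helper: collect readable directories beneath root.
# A glob pattern ending in "/" matches directories only.
def gather_dirs(root)
  Dir.glob(File.join(root, '**', '*/')).select { |d| File.readable?(d) }
end

# Sums the entry counts over dirs once per mode and returns the elapsed
# time of each mode in milliseconds.
def time_modes(dirs)
  [:entries_with_exception, :entries_after_stat].each_with_object({}) do |mode, times|
    seconds = Benchmark.realtime { dirs.inject(0) { |sum, d| sum + send(mode, d) } }
    times[mode] = seconds * 1000
  end
end

# Runs the success case, then the failure case using mangled names that
# are assumed not to exist on disk.
def run_benchmark(root)
  existing = gather_dirs(root)
  missing  = existing.map { |d| d.chomp('/') + '.does-not-exist' }
  { success: time_modes(existing), failure: time_modes(missing) }
end
```

Calling run_benchmark('/usr/lib') would return a nested hash of timings for the warm-cache case only. A single pass like this is noisy, so repeating the run several times and comparing would give steadier numbers.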

In one mode (stat), the code first tests if the directory exists before querying the entries. In another mode (excep), no test is made but an exception handler is installed to handle the failure case.

In total, there are four variables with two states each: success or failure case (directory exists or not), stat test or exception handler, whether or not the operating system's disk cache is cleared, and finally the version of Ruby (Ruby 1.9.3 or JRuby 1.7.0).

In the success case, it is expected that the extra stat call to test directory existence will add some overhead, and this is exactly what is observed:

Times are given in milliseconds. Lower is better.

Success Case (directory exists)

          Ruby 1.9.3       JRuby 1.7.0
          Warm    Cold     Warm    Cold
excep       70   12000      160   13000
stat        78   13000      189   13000

Failure Case (directory does not exist)

          Ruby 1.9.3       JRuby 1.7.0
          Warm    Cold     Warm    Cold
excep       90    5000      720    7100
stat         8    4900       20    5600

Update: re-ran tests on Mac with 2GHz i7, OpenJDK 1.7.0-u10-b09, warm cache only, 9100 dirs in the sample.

Success Case (directory exists)

          Ruby 1.9.3   JRuby 1.7.0
excep            263           518
stat             296           602

Failure Case (directory does not exist)

          Ruby 1.9.3   JRuby 1.7.0
excep            216          1505
stat              24           216

Creating exceptions adds significant overhead in both MRI and JRuby. In both, the additional stat test adds under 20% overhead to the success case, while handling the failure by catching an exception is roughly an order of magnitude slower than the stat test.
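To see the cost of raising and rescuing in isolation from the filesystem, a micro-benchmark along the following lines can help. This is a hedged sketch and not part of the measurements above; the method names and the use of Hash#fetch as a stand-in for a failing call are assumptions chosen for illustration.

```ruby
require 'benchmark'

# Signals a missing key by raising KeyError, then rescues it.
def lookup_with_raise(table, key)
  table.fetch(key)
rescue KeyError
  nil
end

# Signals a missing key with a plain conditional, no exception involved.
def lookup_with_test(table, key)
  table.key?(key) ? table[key] : nil
end

# Times both styles on a key that is always absent and returns the
# elapsed milliseconds for each.
def compare_lookup_cost(iterations = 100_000)
  table = { present: 1 }
  raise_s = Benchmark.realtime { iterations.times { lookup_with_raise(table, :absent) } }
  test_s  = Benchmark.realtime { iterations.times { lookup_with_test(table, :absent) } }
  { raise_ms: raise_s * 1000, test_ms: test_s * 1000 }
end
```

Running compare_lookup_cost under both MRI and JRuby should give a rough feel for how much of the failure-case time comes from exception handling rather than from the system calls themselves.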

See JRuby Performance: Exceptions are not flow control, which also makes the case that, while particularly bad in JRuby, exceptions as flow control slow performance in MRI as well. See other benchmarks showing exceptions are slow.



Re: Comparing Ruby exceptions to additional stat() test

The problem with stat-ing a file before you open it is that there may be nonzero time between the two, in which time the file may have been deleted by somebody else. If you stat a file, another process deletes it, and then you open the file, you'll get an unexpected ENOENT. If your only call is to open it, there's no time in between, and you are already handling ENOENT.

Given the slowness of converting exceptions, it's probably still worth doing the stat to avoid the exception in the vast majority of cases. But unless you are assured that your process is the only one that may be touching the file, you should still handle the corner case where there is still an ENOENT.

begin
  if File.exist?(filename)
    file = File.open(filename) # still expect this may raise ENOENT
    do_things(file)
  else
    handle_nonexistent(filename)
  end
rescue Errno::ENOENT
  handle_nonexistent(filename)
end
Avatar: Patrick

Re: Comparing Ruby exceptions to additional stat() test

Ethan,

Thanks, this is an excellent point.

I'd personally like to see a lower-level file API that returns error codes, or perhaps one that accepts an error handler as a block. I'd been working on a proof-of-concept of such an API, but it is on the far back burner at the moment.

