Ruby exceptions or additional stat() test


26 October 2012

A common idiom in Ruby code (FileUtils, ActiveSupport, Hike) when dealing with a file or directory that may or may not exist is to go ahead and call a method that expects the file to exist (such as open), and catch the exception raised for nonexistent files. This is generally a poor choice performance-wise on JRuby, but is also not necessarily ideal on MRI.

While profiling assets precompilation in a JRuby on Rails application, it was found that a large amount of time was being spent converting Java exceptions into Ruby exceptions (this happens to be relatively slow in JRuby). Digging in with the profiler revealed some patterns where exceptions were being raised (and handled) in code dealing with reading and writing cache files and making directories.

def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

The alternative is to first test for existence with something like File.exist? (File.exists? is a deprecated alias that was removed in Ruby 3.2). This costs an extra system call whenever the file does exist: one for the test, and a second to actually open the file.

def entries_after_stat(dir)
  if File.directory?(dir)
    Dir.entries(dir).size
  else
    0
  end
rescue Errno::ENOENT
  0
end

(Thanks, Ethan, for pointing out that you still need to catch the exception in case the directory is deleted after the stat call but before its entries are read.)

A single system call is always more efficient than two. However, in the failing case, an exception object must be constructed. This is not free on MRI and is quite expensive under JRuby. The best choice depends on how common the failing case is expected to be.

A synthetic benchmark was created to demonstrate the relative differences. All measurements were done on an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ running Ubuntu 12.10 with Linux kernel 3.5.0 and an ext4 filesystem.

The benchmark gathers a list of directories by scanning /usr/lib (arbitrary; there were just under 2500). It then sums the number of entries in each directory using Dir.entries. A second scan is done after mangling the directory names into directories that do not exist to exercise the failure case.

In one mode (stat), the code first tests whether the directory exists before querying the entries. In the other mode (excep), no test is made, but an exception handler is installed to handle the failure case.
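The two modes can be sketched roughly as follows. This is not the original benchmark script, only a minimal reconstruction from the description above: the method names (count_entries_excep, count_entries_stat, time_ms, run_benchmark) and the "-missing" suffix used to mangle directory names are assumptions made for illustration, and the sketch makes no attempt to clear the operating system's disk cache.

```ruby
require "benchmark"

# excep mode: call Dir.entries directly and rescue the failure.
def count_entries_excep(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

# stat mode: test first, but still rescue in case the directory
# disappears between the test and the read.
def count_entries_stat(dir)
  File.directory?(dir) ? Dir.entries(dir).size : 0
rescue Errno::ENOENT
  0
end

# Wall-clock time in milliseconds to sum the entry counts of dirs
# using the given mode (a method name passed as a symbol).
def time_ms(dirs, mode)
  Benchmark.realtime do
    dirs.inject(0) { |total, dir| total + send(mode, dir) }
  end * 1000.0
end

# Times both modes against existing and (assumed) nonexistent directories.
def run_benchmark(root)
  existing = Dir.glob(File.join(root, "**", "*/")).map { |d| d.chomp("/") }
  existing = existing.select { |d| File.readable?(d) }
  # Hypothetical mangling scheme: assumes no directory ends in "-missing".
  missing = existing.map { |d| "#{d}-missing" }

  cases = { "success" => existing, "failure" => missing }
  results = {}
  cases.each do |label, dirs|
    [:count_entries_excep, :count_entries_stat].each do |mode|
      results["#{label}/#{mode}"] = time_ms(dirs, mode)
    end
  end
  results
end
```

Calling run_benchmark("/usr/lib") returns a hash of four timings, one per combination of case and mode. A single run like this is noisy; repeating it and taking the best or median time would give steadier numbers.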

In total, there are four variables with two states each: success or failure case (directory exists or not), stat test or exception handler, whether or not the operating system’s disk cache is cleared, and finally the version of Ruby (Ruby 1.9.3 or JRuby 1.7.0).

In the success case, it is expected that the extra stat call to test directory existence will add some overhead, and this is exactly what is observed:

Times are given in milliseconds. Lower is better.

Success Case (directory exists)

          Ruby 1.9.3        JRuby 1.7.0
          Warm    Cold      Warm    Cold
excep       70   12000       160   13000
stat        78   13000       189   13000

Failure Case (directory does not exist)

          Ruby 1.9.3        JRuby 1.7.0
          Warm    Cold      Warm    Cold
excep       90    5000       720    7100
stat         8    4900        20    5600

Creating exceptions adds significant overhead in both MRI and JRuby. With a warm cache, the additional stat test adds under 20% overhead to the success case in both, while handling the failure by catching an exception is an order of magnitude slower. With a cold cache, disk access dominates and the differences are much smaller.
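The warm-cache figures can be turned into a rough rule of thumb. The sketch below copies the warm numbers from the tables above and computes the stat overhead in the success case, the exception slowdown in the failure case, and the fraction of failing calls at which the two approaches would cost the same. The break-even figure is a derived estimate, not something measured in the benchmark: it assumes cost scales linearly with the mix of successes and failures, and the method names are made up for illustration.

```ruby
# Warm-cache times in milliseconds, copied from the tables above.
WARM_MS = {
  "Ruby 1.9.3"  => { success: { excep: 70,  stat: 78  }, failure: { excep: 90,  stat: 8  } },
  "JRuby 1.7.0" => { success: { excep: 160, stat: 189 }, failure: { excep: 720, stat: 20 } }
}

# Extra cost of the stat test when the directory exists, in percent.
def stat_overhead_percent(success)
  (success[:stat] - success[:excep]) * 100.0 / success[:excep]
end

# How many times slower the exception path is when the directory is missing.
def excep_slowdown(failure)
  failure[:excep].to_f / failure[:stat]
end

# Failure fraction p at which both approaches cost the same, from
#   (1 - p) * excep_ok + p * excep_fail == (1 - p) * stat_ok + p * stat_fail
def break_even_failure_fraction(cases)
  stat_penalty = cases[:success][:stat] - cases[:success][:excep]
  excep_penalty = cases[:failure][:excep] - cases[:failure][:stat]
  stat_penalty.to_f / (stat_penalty + excep_penalty)
end

WARM_MS.each do |ruby, cases|
  printf("%-12s stat overhead %.1f%%, exception slowdown %.1fx, break-even %.1f%% failures\n",
         ruby,
         stat_overhead_percent(cases[:success]),
         excep_slowdown(cases[:failure]),
         break_even_failure_fraction(cases) * 100)
end
```

Under these assumptions the stat test pays for itself once roughly 9% of calls fail on MRI, and roughly 4% on JRuby. Below that, catching the exception is cheaper on average.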

See JRuby Performance: Exceptions are not flow control, which also makes the case that, while particularly bad in JRuby, using exceptions as flow control slows performance in MRI as well. See also other benchmarks showing exceptions are slow.

The problem with stat-ing a file before you open it is that there may be nonzero time between the two, during which the file may have been deleted by somebody else. If you stat a file, another process deletes it, and then you open the file, you’ll get an unexpected ENOENT. If your only call is to open it, there’s no time in between, and you are already handling ENOENT.

Given the slowness of converting exceptions, it’s probably still worth doing the stat to avoid the exception in the vast majority of cases. But unless you are assured that your process is the only one that may be touching the file, you should still handle the corner case where an ENOENT is raised anyway.

– Ethan