A common idiom in Ruby code (seen in FileUtils, ActiveSupport, and Hike) for dealing with a file or directory that may or may not exist is to go ahead and call a method that expects the file to exist (such as `open`) and catch the exception raised for nonexistent files. This is generally a poor choice performance-wise on JRuby, and it is not necessarily ideal on MRI either.
While profiling asset precompilation in a JRuby on Rails application, it was found that a large amount of time was being spent converting Java exceptions into Ruby exceptions (this happens to be relatively slow in JRuby). Digging in with the profiler revealed some patterns where exceptions were being raised (and handled) in code that reads and writes cache files and creates directories.
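The cache-file and directory patterns mentioned above usually take a shape like the following. This is a minimal sketch: `read_cache` and `ensure_cache_dir` are hypothetical helpers written for illustration, not code taken from Sprockets, Hike, or any other library. Each one relies on an exception to signal the common "not there" or "already there" outcome.

```ruby
require "tmpdir"

# Hypothetical helper: read a cache file, treating a missing file as a miss.
def read_cache(path)
  File.read(path)
rescue Errno::ENOENT
  nil # every cache miss pays for constructing and raising an exception
end

# Hypothetical helper: create a cache directory, treating "already exists"
# as success.
def ensure_cache_dir(dir)
  Dir.mkdir(dir)
rescue Errno::EEXIST
  nil # every call after the first one raises and rescues
end

Dir.mktmpdir do |tmp|
  cache_dir = File.join(tmp, "cache")
  ensure_cache_dir(cache_dir) # creates the directory
  ensure_cache_dir(cache_dir) # Dir.mkdir raises Errno::EEXIST, which is rescued
  p read_cache(File.join(cache_dir, "missing.cache")) # prints nil
end
```

Both helpers are correct, but in a hot loop over many cache entries the rescued exceptions become the dominant cost, especially on JRuby.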
Applied to counting a directory's entries, the idiom looks like this:

```ruby
def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end
```
The alternative is to first test for existence with something like `File.exist?`, which means a second system call in cases where the file does exist: one for the test, and a second to actually open the file.
```ruby
def entries_after_stat(dir)
  if File.directory?(dir)
    Dir.entries(dir).size
  else
    0
  end
rescue Errno::ENOENT
  0
end
```
(Thanks to Ethan for pointing out that you still need to catch the exception in case the directory is deleted after the stat call but before its entries are read.)
A single system call is always more efficient than two. However, in the failing case, an exception object must be constructed. This is not free on MRI and is quite expensive under JRuby. The best choice depends on how common the failing case is expected to be.
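To see the cost of constructing and raising an exception in isolation from any filesystem work, a rough micro-benchmark can compare a method that signals "not found" by raising against one that simply returns `nil`. This is a sketch, not the benchmark used for the tables below; the iteration count and the method names `lookup_raising` and `lookup_returning` are arbitrary choices, and absolute numbers will vary by machine and Ruby implementation.

```ruby
require "benchmark"

ITERATIONS = 100_000

# Signals "not found" by raising, the way File.read or Dir.entries do.
def lookup_raising
  raise Errno::ENOENT, "missing"
end

# Signals "not found" with an ordinary return value.
def lookup_returning
  nil
end

raising = Benchmark.realtime do
  ITERATIONS.times do
    begin
      lookup_raising
    rescue Errno::ENOENT
      nil
    end
  end
end

returning = Benchmark.realtime do
  ITERATIONS.times { lookup_returning }
end

puts format("raise/rescue: %.1f ms", raising * 1000)
puts format("return nil:   %.1f ms", returning * 1000)
```

Since no system call is involved, any gap between the two lines is pure exception overhead (object allocation plus backtrace capture).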
A synthetic benchmark was created to demonstrate the relative differences. All measurements were done on an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ running Ubuntu 12.10 with Linux kernel 3.5.0 and an ext4 filesystem.
The benchmark gathers a list of directories by scanning /usr/lib (an arbitrary choice; there were just under 2500). It then sums the number of entries in each directory using `Dir.entries`. A second scan is done after mangling the directory names into directories that do not exist, to exercise the failure case.
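The setup described above can be sketched as follows. The benchmark's source is not reproduced here, so several details are assumptions: the scan is taken to be recursive, the mangling scheme (appending `.does-not-exist`) is made up, and `collect_dirs`, `mangle`, and `total_entries` are hypothetical names. The root is a parameter so the sketch can run against a temporary directory instead of /usr/lib.

```ruby
require "tmpdir"

# Assumption: a recursive scan for directories under root.
def collect_dirs(root)
  Dir.glob(File.join(root, "**", "*")).select { |path| File.directory?(path) }
end

# Assumption: turn each path into one that almost certainly does not exist.
def mangle(dirs)
  dirs.map { |dir| "#{dir}.does-not-exist" }
end

# The exception-handling variant from earlier in the post.
def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

# Sum the entry counts ("." and ".." included, as Dir.entries returns them).
def total_entries(dirs)
  dirs.sum { |dir| entries_with_exception(dir) }
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "a"))
  Dir.mkdir(File.join(root, "a", "b"))
  dirs = collect_dirs(root)
  p total_entries(dirs)         # success case: directories exist
  p total_entries(mangle(dirs)) # failure case: every lookup hits ENOENT
end
```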
In one mode (`stat`), the code first tests whether the directory exists before querying its entries. In the other mode (`excep`), no test is made, but an exception handler is installed to handle the failure case.
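A minimal sketch of timing the two modes with Ruby's standard `Benchmark.realtime` follows. The `run_mode` helper is hypothetical, and this sketch does not attempt to control the disk cache, so it corresponds only to the warm-cache columns.

```ruby
require "benchmark"
require "tmpdir"

# excep mode: no existence test, rescue the failure.
def entries_with_exception(dir)
  Dir.entries(dir).size
rescue Errno::ENOENT
  0
end

# stat mode: test first, but still rescue in case the directory
# disappears between the test and the read.
def entries_after_stat(dir)
  File.directory?(dir) ? Dir.entries(dir).size : 0
rescue Errno::ENOENT
  0
end

# Hypothetical helper: run one mode over dirs, return [total, milliseconds].
def run_mode(dirs, mode)
  total = 0
  seconds = Benchmark.realtime do
    dirs.each { |dir| total += send(mode, dir) }
  end
  [total, seconds * 1000]
end

Dir.mktmpdir do |root|
  existing = 3.times.map { |i| File.join(root, "d#{i}").tap { |d| Dir.mkdir(d) } }
  missing = existing.map { |d| "#{d}.missing" }
  %i[entries_with_exception entries_after_stat].each do |mode|
    [existing, missing].each do |dirs|
      total, ms = run_mode(dirs, mode)
      puts format("%-24s total=%d %.3f ms", mode, total, ms)
    end
  end
end
```

With only a handful of directories the timings are noise; the real benchmark used roughly 2500.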
In total, there are four variables with two states each: success or failure case (directory exists or not), stat test or exception handler, whether or not the operating system’s disk cache is cleared, and finally the version of Ruby (Ruby 1.9.3 or JRuby 1.7.0).
In the success case, it is expected that the extra stat call to test directory existence will add some overhead, and this is exactly what is observed:
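Times are given in milliseconds. Lower is better. "Warm" and "cold" refer to the state of the operating system's disk cache.

Success Case (directory exists)

| Mode | Ruby 1.9.3 warm | Ruby 1.9.3 cold | JRuby 1.7.0 warm | JRuby 1.7.0 cold |
|-------|----|-------|-----|-------|
| excep | 70 | 12000 | 160 | 13000 |
| stat | 78 | 13000 | 189 | 13000 |

Failure Case (directory does not exist)

| Mode | Ruby 1.9.3 warm | Ruby 1.9.3 cold | JRuby 1.7.0 warm | JRuby 1.7.0 cold |
|-------|----|------|-----|------|
| excep | 90 | 5000 | 720 | 7100 |
| stat | 8 | 4900 | 20 | 5600 |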
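Creating exceptions adds significant overhead in both MRI and JRuby. In both, the additional stat test adds under 20% overhead to the success case, while with a warm disk cache, handling the failure by catching an exception is an order of magnitude slower (about 11x on MRI and 36x on JRuby). With a cold cache, disk access dominates and the gap in the failure case narrows considerably.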
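The percentages and ratios quoted above can be checked directly against the measured numbers. This small calculation only re-derives figures from the two tables; `stat_overhead_pct` and `excep_slowdown` are illustrative names.

```ruby
# Timings in ms from the two tables, as [excep, stat] pairs.
TIMINGS = {
  "Ruby 1.9.3 warm"  => { success: [70, 78],         failure: [90, 8] },
  "Ruby 1.9.3 cold"  => { success: [12_000, 13_000], failure: [5000, 4900] },
  "JRuby 1.7.0 warm" => { success: [160, 189],       failure: [720, 20] },
  "JRuby 1.7.0 cold" => { success: [13_000, 13_000], failure: [7100, 5600] },
}.freeze

# Extra cost of the stat test when the directory exists, in percent.
def stat_overhead_pct(excep_ms, stat_ms)
  (stat_ms - excep_ms) * 100.0 / excep_ms
end

# How many times slower rescuing is than testing when the directory is missing.
def excep_slowdown(excep_ms, stat_ms)
  excep_ms.to_f / stat_ms
end

TIMINGS.each do |config, t|
  puts format("%-17s stat overhead: %5.1f%%   excep slowdown: %5.1fx",
              config, stat_overhead_pct(*t[:success]), excep_slowdown(*t[:failure]))
end
```

Every stat overhead stays below 20%, while the warm-cache slowdowns come out near 11x (MRI) and exactly 36x (JRuby), and the cold-cache slowdowns stay under 1.3x.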
See "JRuby Performance: Exceptions are not flow control", which also makes the case that, while particularly bad in JRuby, using exceptions as flow control slows performance in MRI as well. See also other benchmarks showing that exceptions are slow.
The problem with stat-ing a file before you open it is that there may be nonzero time between the two, during which the file may be deleted by somebody else. If you stat a file, another process deletes it, and then you open the file, you’ll get an unexpected ENOENT. If your only call is to open it, there’s no time in between, and you are already handling ENOENT.
Given the slowness of converting exceptions, it’s probably still worth doing the stat to avoid the exception in the vast majority of cases. But unless you are assured that your process is the only one that may be touching the file, you should still handle the corner case where an ENOENT occurs anyway.
– Ethan
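The race Ethan describes can be reproduced deterministically in a single process by deleting the directory between the existence test and the read, standing in for another process. This sketch (the `race_demo` name is made up) shows why the `rescue` in `entries_after_stat` is still needed even after a successful `File.directory?` check.

```ruby
require "tmpdir"

# Returns what a check-then-read sequence observes when the directory
# is removed between the two calls.
def race_demo
  Dir.mktmpdir do |root|
    dir = File.join(root, "victim")
    Dir.mkdir(dir)

    exists = File.directory?(dir) # the "stat" step succeeds
    Dir.rmdir(dir)                # simulate another process deleting it

    begin
      Dir.entries(dir).size       # the read now fails
      [exists, :read_ok]
    rescue Errno::ENOENT
      [exists, :enoent]
    end
  end
end

p race_demo # prints [true, :enoent]
```

The check said the directory existed, yet the read still raised `Errno::ENOENT`: the stat only makes the exception rare, not impossible.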