CodeGnome Consulting, LTD

Programming - DevOps - Project Management - Information Security

Not All Hashes Are Created Equal

| Comments

In Ruby, there’s often more than one way to do something. The language contains a lot of synonyms such as Enumerable#reduce and Enumerable#inject. It also contains a number of work-alikes such as hash literals like {} and hash constructors like Hash.new.

In general, programmers can treat this cornucopia of syntax as a matter of personal preference, or choose synonyms to communicate more clearly about the intent of a given piece of code. However, there are cases where work-alikes are not identical in terms of performance. Let’s look at some of those use cases.

Readability Matters

To my eye, some syntax just looks cleaner than its alternatives. For example, I find that array.reduce Hash.new, :update is easier to look at than array.inject({}, :merge!). They both do the same thing, but I’d rather look at the first example.

However, during some recent code exploration, I found that there can be some subtleties and performance considerations that should inform the choice of Hash syntax beyond just readability.

Readability matters a great deal in code, but there are times when performance matters more. In such cases, the following benchmarks may prove useful.

Performance of Hash Creation

First of all, there is a surprising difference in the speed of creating a hash that depends on whether you use a literal or method constructor. Consider the following benchmark:

Literal vs. Method Constructors
1
2
3
4
5
6
7
8
9
#!/usr/bin/env ruby

require 'benchmark'

ITER = 1_000_000
Benchmark.bmbm do |x|
  x.report('{}')       { ITER.times do {} end  }
  x.report('Hash.new') { ITER.times {Hash.new} }
end

The benchmark data immediately below shows that Hash.new takes 380% longer to create an empty hash than using a hash literal. I’m unsure why the method constructor is so much slower, although I suspect it’s because there’s more overhead in making a method call than in letting the parser do the work with literals.

               user     system      total        real
{}         0.100000   0.000000   0.100000 (  0.097031)
Hash.new   0.380000   0.000000   0.380000 (  0.376029)

The difference in Hash construction speed is barely measurable for a one-off operation. However, if your application builds a lot of hashes, even jiffies can start to add up.

Performance of Merge vs. Update

Another area of performance differentiation can be found in Hash#merge and Hash#update, which is a synonym for the Hash#merge! method.

Hash#update vs. Hash#merge
1
2
3
4
5
6
7
8
9
10
#!/usr/bin/env ruby

require 'benchmark'

ITER = 1_000_000
Benchmark.bmbm do |x|
  array = [{:a => :b}, {:c => :d}]
  x.report(:update) { ITER.times do array.reduce({}, :update) end }
  x.report(:merge)  { ITER.times do array.reduce({}, :merge)  end }
end

Just look at the huge difference between these two methods in terms of speed:

             user     system      total        real
update   0.920000   0.000000   0.920000 (  0.924688)
merge    1.980000   0.000000   1.980000 (  1.972249)

The call to Hash#update (or Hash#merge! if you prefer) is more than twice as fast as its counterpart. If you look at the documentation, the reason for this is actually quite clear: Hash#merge returns a new hash object, while Hash#update makes changes to the current hash object in-place. In quantity, the overhead required to repeatedly create and garbage-collect all those hashes can really add up.

Again, for one-off operations the difference is miniscule. However, if you find yourself working with tree-like structures where you’re making frequent changes to a hash, the performance boost from using :update is well worth considering.

Conclusion

The Hash class is a work-horse object that underpins a lot of Ruby code. However, there are some subtleties and performance considerations that one should consider when implementing data structures that rely on hashes.

For most purposes, one should use whatever syntax is most readable, and select syntax that communicates the intent of the code most effectively. However, tight loops and large numbers of Hash objects may drive you to consider alternatives based on the performance characteristics of Ruby’s varied Hash syntax.

Comments