There's a trick to using Hash.new

by Jared Norman
published April 29, 2020

In my previous post on hashes I showed that in Ruby you can create a hash with a static default value. Here’s the example I used in that post:

dog_counts = Hash.new(0)
# => {}

[
  "doberman",
  "dachshund",
  "doberman",
  "doberman",
  "whippet",
  "labrador"
].each do |dog_seen|
  dog_counts[dog_seen] += 1
end

dog_counts
# => {"doberman"=>3, "dachshund"=>1, "whippet"=>1, "labrador"=>1}
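For anyone wondering why a single static default is safe for counting, here is a minimal sketch. The `counts` hash and the "poodle" key are made up for illustration. The point is that `+=` builds a new integer and assigns it back into the hash, so the default itself is never modified.

```ruby
counts = Hash.new(0)

# `+=` is shorthand for: counts["whippet"] = counts["whippet"] + 1
# The lookup returns the default (0), and the sum is stored under the key.
counts["whippet"] += 1

counts                  # => {"whippet"=>1}
counts.default          # => 0, the default is untouched
counts["poodle"]        # => 0, reading a missing key returns the default...
counts.key?("poodle")   # => false, ...without storing anything
```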

Grouping Dogs By Breed

This pattern works great for counting dogs, but it can get you into trouble if you’re trying to group them. Let’s look at a second example:

dogs_by_type = Hash.new([])

[
  {type: "whippet", name: "Roxie"},
  {type: "rhodesian", name: "Cessna"},
  {type: "alsatian", name: "Daeva"},
].each do |dog|
  dog_type = dog[:type]
  # The assignment here is because accessing default values doesn't set them in
  # the hash, see: https://jarednorman.ca/hash-new-with-a-block
  dogs_by_type[dog_type] = dogs_by_type[dog_type] << dog[:name]
end

Do you see the bug? Hint for Python programmers: it’s the same issue default argument values have in Python. Let’s take a look at the value of dogs_by_type:

{"whippet" => ["Roxie", "Cessna", "Daeva"],
 "rhodesian" => ["Roxie", "Cessna", "Daeva"],
 "alsatian" => ["Roxie", "Cessna", "Daeva"]}

We meant to set the default value for this hash to an empty array, but we actually set it to the one, specific array that we passed in. Every time we accessed a key that wasn’t set yet, we got that same array back and then mutated (changed) it. As a result, every key in the hash ends up pointing at the same array. Here’s an example that demonstrates this more explicitly:

my_array = [3, 2, 1]
my_hash = Hash.new(my_array)
my_array.sort!
my_hash[:some_key]
#=> [1, 2, 3]
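To make the sharing visible from the hash's side, here is a small sketch that checks object identity with `equal?`. The names `shared`, `first`, and `second` are only for illustration.

```ruby
shared = Hash.new([])

first  = shared[:a]
second = shared[:b]

first.equal?(second)   # => true, both lookups return the one default array
first << "Roxie"       # mutates that shared default in place

shared[:c]             # => ["Roxie"], a key we never touched sees the change
shared.empty?          # => true, nothing was ever stored in the hash
shared.default         # => ["Roxie"]
```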

Better Grouping

The solution is to use the block syntax from that blog post that I keep linking you to. That might look something like this:

dogs_by_type = Hash.new { [] }

[
  {type: "whippet", name: "Roxie"},
  {type: "rhodesian", name: "Cessna"},
  {type: "alsatian", name: "Daeva"},
].each do |dog|
  dog_type = dog[:type]
  dogs_by_type[dog_type] = dogs_by_type[dog_type] << dog[:name]
end

dogs_by_type
#=> {"whippet"=>["Roxie"], "rhodesian"=>["Cessna"], "alsatian"=>["Daeva"]}
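One thing worth noticing is that the explicit assignment in the loop is still required with this form. The sketch below (with a hypothetical `fresh` hash) shows what happens without it: the block hands back a brand-new array on every miss, but never stores it.

```ruby
fresh = Hash.new { [] }

# The block runs, returns a new array, we append to it, and it is thrown away.
fresh[:whippet] << "Roxie"

fresh.empty?      # => true
fresh[:whippet]   # => [], another new array; "Roxie" is gone

# Two lookups of the same missing key give two different objects.
fresh[:whippet].equal?(fresh[:whippet])   # => false
```

So with a bare `Hash.new { [] }`, the assignment in the grouping loop above is what actually stores each array under its key.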

This gets you the right result, but we can use what we learned in that other post and also assign the default value inside the block we pass in when creating the hash.

dogs_by_type = Hash.new { |hash, key| hash[key] = [] }

[
  {type: "whippet", name: "Roxie"},
  {type: "rhodesian", name: "Cessna"},
  {type: "alsatian", name: "Daeva"},
].each do |dog|
  dogs_by_type[dog[:type]] << dog[:name]
end
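Here is a quick sketch of how that version behaves, including one side effect to keep in mind. The `stored` hash, the second whippet ("Pixel"), and the "poodle" and "beagle" keys are all made up for illustration.

```ruby
stored = Hash.new { |hash, key| hash[key] = [] }

stored["whippet"] << "Roxie"
stored["whippet"] << "Pixel"   # hypothetical second whippet
stored["rhodesian"] << "Cessna"

stored   # => {"whippet"=>["Roxie", "Pixel"], "rhodesian"=>["Cessna"]}

# Side effect: merely reading a missing key now stores an empty array for it.
stored["poodle"]          # => []
stored.key?("poodle")     # => true

# `fetch` with a fallback skips the default block, so nothing is stored.
stored.fetch("beagle", [])   # => []
stored.key?("beagle")        # => false
```

Because the block stores the array itself, the loop only has to append.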

That cleans things up significantly by getting rid of that extra assignment. You could even go one step further and write the logic in a more “functional” style:

[
  {type: "whippet", name: "Roxie"},
  {type: "rhodesian", name: "Cessna"},
  {type: "alsatian", name: "Daeva"},
].group_by { |dog|
  dog[:type]
}.transform_values { |dogs|
  dogs.map { |dog|
    dog[:name]
  }
}

Now we never even create a hash with a default value!
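If the chained version is hard to follow, it may help to see the two steps separately. This is just a sketch: the variable names are mine, and the fourth dog is a hypothetical addition so that one group has more than one member.

```ruby
dogs = [
  {type: "whippet", name: "Roxie"},
  {type: "rhodesian", name: "Cessna"},
  {type: "alsatian", name: "Daeva"},
  {type: "whippet", name: "Pixel"},   # hypothetical extra dog
]

# Step 1: group the full dog hashes by their type.
grouped = dogs.group_by { |dog| dog[:type] }
grouped["whippet"].length   # => 2

# Step 2: keep the keys, but replace each group of dogs with just their names.
names_by_type = grouped.transform_values { |group| group.map { |dog| dog[:name] } }
# => {"whippet"=>["Roxie", "Pixel"], "rhodesian"=>["Cessna"], "alsatian"=>["Daeva"]}

# The result is a plain hash, so a missing key is just nil.
names_by_type["poodle"]   # => nil
```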

Conclusion

There are three things you should take away from this article:

  1. Dogs are good.
  2. Hashes can have default values in Ruby, but if you’re going to mutate those values, make sure you create them dynamically by passing a block to Hash.new.
  3. Sometimes you can lean on Ruby’s Enumerable methods and you won’t even need to make a hash with a default value. That said, the code you end up with might not communicate what you’re trying to do as clearly. It’s a judgement call.