Writing Block Methods That Classify or Collect

August 13, 2007 · Filed Under Code Blocks and Iteration, Ruby on Rails · Comment 

Problem

The basic block methods that come with the Ruby standard library aren’t enough for you. You want to define your own method that classifies the elements in an enumeration (like Enumerable#detect and Enumerable#find_all), or that does a transformation on each element in an enumeration (like Enumerable#collect).

Solution

You can usually use inject to write a method that searches or classifies an enumeration of objects. With inject you can write your own versions of methods such as detect and find_all:

	module Enumerable
	  def find_no_more_than(limit)
	    inject([]) do |a,e|
	      a << e if yield e
	      return a if a.size >= limit
	      a
	    end
	  end
	end

This code finds at most three of the even numbers in a list:

	a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
	a.find_no_more_than(3) { |x| x % 2 == 0 }         # => [2, 4, 6]

If you find yourself needing to write a method like collect, it’s probably because, for your purposes, collect itself yields elements in the wrong order. You can’t use inject, because that yields elements in the same order as collect.

You need to find or write an iterator that yields elements in the order you want. Once you’ve done that, you have two options: you can write a collect equivalent on top of the iterator method, or you can use the iterator method to build an Enumerable object and call its collect method.
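As a minimal sketch of the first option, here is a collect equivalent written on top of reverse_each. The module name CollectReverse and the method collect_reverse are hypothetical and not part of Ruby; the sketch assumes only that the receiver responds to reverse_each.

```ruby
# Hypothetical helper: a collect equivalent driven by reverse_each
# instead of each. Not part of Ruby's standard library.
module CollectReverse
  def collect_reverse
    results = []
    reverse_each { |element| results << yield(element) }
    results
  end
end

words = %w{bob loves alice}
words.extend(CollectReverse)   # add the method to this one array only
capitalized = words.collect_reverse { |x| x.capitalize }
# => ["Alice", "Loves", "Bob"]
```

Extending a single object keeps the new method out of every other array, at the cost of having to remember the extend call.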

Discussion

We discussed these block methods in more detail in the context of arrays, because arrays are the simplest and most common Enumerable data type. But almost any data structure can be enumerated, and a more complex data structure can be enumerated in more ways.

As you’ll see in “Implementing Enumerable: Write One Method, Get 22 Free”, the Enumerable methods, like detect and inject, are actually implemented in terms of each. The detect and inject methods yield to the code block every element that comes out of each. The value of the yield statement is used to determine whether the element matches some criteria.
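To illustrate, here is a small sketch of a class that defines nothing but each and mixes in Enumerable. The Countdown class is a hypothetical example, not something from the standard library.

```ruby
# Hypothetical class for illustration: it defines only each and
# mixes in Enumerable.
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  # The only iterator we write ourselves.
  def each
    @start.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(5)
first_even = countdown.detect { |n| n % 2 == 0 }     # => 4
total = countdown.inject(0) { |sum, n| sum + n }     # => 15
```

Both detect and inject see the elements in the order each yields them: 5, 4, 3, 2, 1.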

In a method like detect, the iteration may stop once it finds an element that matches. In a method like find_all, the iteration goes through all elements, collecting the ones that match.
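One way to see the difference is to count how many times each method calls its block. This is only a sketch; the counter variables exist purely for the demonstration.

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# detect stops at the first match, so the block runs only for 1 and 2.
detect_calls = 0
first_even = numbers.detect do |n|
  detect_calls += 1
  n % 2 == 0
end
# first_even   => 2
# detect_calls => 2

# find_all visits every element, so the block runs six times.
find_all_calls = 0
all_evens = numbers.find_all do |n|
  find_all_calls += 1
  n % 2 == 0
end
# all_evens      => [2, 4, 6]
# find_all_calls => 6
```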

Methods like collect work the same way, but instead of returning a subset of elements based on what the code block says, they collect the values returned by the code block in a new data structure, and return the data structure once the iteration is completed.
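A rough sketch of how a collect-like method can be written on top of each follows. The name my_collect is hypothetical, and this is not how Ruby’s built-in collect is actually implemented; it only shows the pattern.

```ruby
# Hypothetical reimplementation, for illustration only.
module Enumerable
  def my_collect
    results = []
    # Store whatever the block returns, not the element itself.
    each { |element| results << yield(element) }
    results
  end
end

lengths = %w{bob loves alice}.my_collect { |word| word.length }
# => [3, 5, 5]
```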

If you’re using a particular object and you wish its collect method used a different iterator, then you should turn the object into an Enumerator and call its collect method. But if you’re writing a class and you want to expose a new collect-like method, you’ll have to define a new method. In that case, the best solution is probably to expose a method that returns a custom Enumerator: that way, your users can use all the methods of Enumerable, not just collect.
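Here is one possible shape for that last suggestion. The Playlist class and its each_reversed and reversed methods are hypothetical names chosen for the example.

```ruby
# Hypothetical class for illustration.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Default iteration: first song to last.
  def each(&block)
    @songs.each(&block)
  end

  # Alternate iterator: last song to first.
  def each_reversed(&block)
    @songs.reverse_each(&block)
  end

  # Expose the alternate order as an Enumerator, so callers get
  # collect, detect, inject, and the rest without extra code.
  def reversed
    to_enum(:each_reversed)
  end
end

playlist = Playlist.new("intro", "verse", "outro")
upcased = playlist.reversed.collect { |song| song.upcase }
# => ["OUTRO", "VERSE", "INTRO"]
found = playlist.reversed.detect { |song| song.include?("r") }
# => "outro"
```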

Changing the Way an Object Iterates

August 13, 2007 · Filed Under Code Blocks and Iteration, Ruby on Rails · Comment 

Problem

You want to use a data structure as an Enumerable, but the object’s implementation of #each doesn’t iterate the way you want. Since all of Enumerable’s methods are based on each, this makes them all useless to you.

Solution

Here’s a concrete example: a simple array.

	array = %w{bob loves alice}
	array.collect { |x| x.capitalize }
	# => ["Bob", "Loves", "Alice"]

Suppose we want to call collect on this array, but we don’t want collect to use each: we want it to use reverse_each. Something like this hypothetical collect_reverse method:

	array.collect_reverse { |x| x.capitalize }
	# => ["Alice", "Loves", "Bob"]

Actually defining a collect_reverse method would add significant new code and only solve part of the problem. We could overwrite the array’s each implementation with a singleton method that calls reverse_each, but that’s hacky and it would surely have undesired side effects.

Fortunately, there’s an elegant solution with no side effects: wrap the object in an Enumerator. This gives you a new object that acts like the old object would if you’d swapped out its each method:

	require 'enumerator'
	reversed_array = array.to_enum(:reverse_each)
	reversed_array.collect { |x| x.capitalize }
	# => ["Alice", "Loves", "Bob"]
	reversed_array.each_with_index do |x, i|
	  puts %{#{i}=>"#{x}"}
	end
	# 0=>"alice"
	# 1=>"loves"
	# 2=>"bob"

Note that you can’t use the Enumerator for our array as though it were the actual array. Only the methods of Enumerable are supported:

	reversed_array[0]
	# NoMethodError: undefined method `[]' for #<Enumerable::Enumerator:0xb7c2cc8c>
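If you do need array-style access, one option is to convert the Enumerator back into an array with to_a, which is itself an Enumerable method. A short sketch:

```ruby
array = %w{bob loves alice}
reversed_array = array.to_enum(:reverse_each)

# The Enumerator has no [] method of its own.
supports_index = reversed_array.respond_to?(:[])   # => false

# to_a runs the iteration once and gives back a real array.
reversed_copy = reversed_array.to_a                # => ["alice", "loves", "bob"]
first_word = reversed_copy[0]                      # => "alice"
```

The resulting array is a snapshot: later changes to the original array will not show up in it.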

Discussion

Whenever you’re tempted to reimplement one of the methods of Enumerable, try using an Enumerator instead. It’s like modifying an object’s each method, but it doesn’t affect the original object.

This can save you a lot of work. Suppose you have a tree data structure that provides three different iteration styles: each_prefix, each_postfix, and each_infix. Rather than implementing the methods of Enumerable for all three iteration styles, you can let each_prefix be the default implementation of each, and call tree.to_enum(:each_postfix) or tree.to_enum(:each_infix) if you need an Enumerable that acts differently.
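The tree scenario might look something like the following. The Tree class here is a hypothetical, minimal binary tree written for this example; only the three iterator names come from the text above.

```ruby
# Hypothetical binary tree for illustration.
class Tree
  include Enumerable

  attr_reader :value, :left, :right

  def initialize(value, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  # Node first, then children.
  def each_prefix(&block)
    block.call(value)
    left.each_prefix(&block) if left
    right.each_prefix(&block) if right
  end

  # Left child, then node, then right child.
  def each_infix(&block)
    left.each_infix(&block) if left
    block.call(value)
    right.each_infix(&block) if right
  end

  # Children first, then node.
  def each_postfix(&block)
    left.each_postfix(&block) if left
    right.each_postfix(&block) if right
    block.call(value)
  end

  # each_prefix is the default iteration order.
  alias_method :each, :each_prefix
end

#     2
#    / \
#   1   3
tree = Tree.new(2, Tree.new(1), Tree.new(3))

prefix  = tree.collect { |v| v * 10 }                         # => [20, 10, 30]
infix   = tree.to_enum(:each_infix).collect { |v| v * 10 }    # => [10, 20, 30]
postfix = tree.to_enum(:each_postfix).collect { |v| v * 10 }  # => [10, 30, 20]
```

Only each is wired into Enumerable; the other two orders get the full set of Enumerable methods through to_enum.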

A single underlying object can have multiple Enumerable objects. Here’s a second Enumerable for our simple array, in which each acts like each_with_index does for the original array:

	array_with_index = array.enum_with_index
	array_with_index.each do |x, i|
	  puts %{#{i}=>"#{x}"}
	end
	# 0=>"bob"
	# 1=>"loves"
	# 2=>"alice"
	array_with_index.each_with_index do |x, i|
	  puts %{#{i}=>#{x.inspect}}
	end
	# 0=>["bob", 0]
	# 1=>["loves", 1]
	# 2=>["alice", 2]

When you require 'enumerator', Enumerable sprouts two extra enumeration methods, each_cons and each_slice. These make it easy to iterate over a data structure in chunks. An example is the best way to show what they do:

	sentence = %w{Well, now I've seen everything!}
	two_word_window = sentence.to_enum(:each_cons, 2)
	two_word_window.each { |x| puts x.inspect }
	# ["Well,", "now"]
	# ["now", "I've"]
	# ["I've", "seen"]
	# ["seen", "everything!"]
	two_words_at_a_time = sentence.to_enum(:each_slice, 2)
	two_words_at_a_time.each { |x| puts x.inspect }
	# ["Well,", "now"]
	# ["I've", "seen"]
	# ["everything!"]

Note how any arguments passed into to_enum are passed along as arguments to the iteration method itself.
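The same forwarding works for any iteration method that takes arguments. As a small sketch using two built-in Integer iterators, upto and step:

```ruby
# Extra arguments to to_enum are forwarded to the named method.
one_to_four = 1.to_enum(:upto, 4)       # iterates via 1.upto(4)
every_third = 1.to_enum(:step, 10, 3)   # iterates via 1.step(10, 3)

counted = one_to_four.to_a                      # => [1, 2, 3, 4]
squares = every_third.collect { |n| n * n }     # => [1, 16, 49, 100]
```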

In Ruby 1.9, the Enumerator class (Enumerable::Enumerator in Ruby 1.8) is part of the Ruby core; you don’t need the require statement. Also, each_cons and each_slice are built-in methods of Enumerable.
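In those later versions, calling each_cons or each_slice without a block also returns an Enumerator directly, so the explicit to_enum call is optional. A brief sketch, assuming Ruby 1.9 or newer:

```ruby
sentence = %w{Well, now I've seen everything!}

# No require and no to_enum: the block-less call returns an Enumerator.
pairs = sentence.each_cons(2).to_a
# => [["Well,", "now"], ["now", "I've"], ["I've", "seen"], ["seen", "everything!"]]

chunks = sentence.each_slice(2).collect { |words| words.join(" ") }
# => ["Well, now", "I've seen", "everything!"]
```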
