made of stone: Collecting things

I was helping someone out on Friday night (such an exciting life I lead) ... I have no idea of his programming background but at one point we needed to iterate through a collection. His hands started typing out i = 0 and then within the loop accessing the elements via an index. And I said ' ... you don't need a counter ... this is Ruby!'.

I've mentioned before that my favourite piece of Rails isn't part of Rails at all. In fact, it didn't even originate in Ruby. It's the Smalltalk-style collections.


people.each do | person |
  puts person.name
end

and


ages = people.collect do | person |
  person.age
end

are directly descended from


people do: [ :person |
  Transcript show: person name
].

and


ages := people collect: [ :person |
  person age
].

I think the similarity is pretty clear.

But why is this so good? Why is it probably my favourite piece of code ever? Most design decisions "smell" right to me ... it's only after a lot of analysis that I come to realise the arguments underlying why it smells right. Years after my first experience with Smalltalk I'm beginning to understand why I love it so.

The equivalent in Delphi or another 'old-school' hybrid OO language would be:


var I: Integer;
begin
  for I:= 0 to People.Count - 1 do
    Writeln((People[I] as TPerson).Name);
end;

Ignoring the typecast (a necessary artifact of a statically typed collection that only knows it holds objects, with no duck typing to fall back on), how does this compare to the Ruby and Smalltalk examples above?

First of all, we need to define an index. Not a big deal but it's still an obstacle getting in the way of what we want to do (and it's the obliteration of those tiny obstacles that makes Rails so great).

Next we have to count the number of items in the collection. We don't actually care how many items there are, we just want the data written out, but we still need to ask the collection for this piece of its private state.

We also need to understand that this collection starts its index at zero and ends at count minus one. These are implementation details of the collection but we have to know them to make use of it. Any (classic) VB or REALbasic programmer knows all about getting this one wrong.

And then we access the actual object through an index, effectively dictating that random access is the only access. Even if the implementation is a linked list, the current position is thrown away and we re-access at I + 1 on the next iteration.

So the Delphi code relies upon us knowing private implementation details about our collection (the count and the beginning index) and means that the collection cannot optimise its own access, as everything is done through random access.
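To make the linked-list point concrete, here is a rough sketch in Ruby. The LinkedList class, its Node struct and the steps counter are all hypothetical, invented purely for this illustration; they are not taken from any library. The sketch counts how many node-to-node hops each style of loop costs.

```ruby
# Hypothetical singly linked list, written only to illustrate the cost argument.
class LinkedList
  Node = Struct.new(:value, :next_node)

  attr_reader :steps # number of node-to-node hops performed so far

  def initialize(values)
    @head = nil
    @steps = 0
    values.reverse_each { |v| @head = Node.new(v, @head) }
  end

  # Random access: has to walk from the head on every call.
  def [](index)
    node = @head
    index.times do
      node = node.next_node
      @steps += 1
    end
    node.value
  end

  # The caller has to ask for this before it can build an index loop.
  def count
    n = 0
    node = @head
    while node
      n += 1
      node = node.next_node
    end
    n
  end

  # Internal iteration: a single pass, the list keeps track of its own position.
  def each
    node = @head
    while node
      yield node.value
      node = node.next_node
      @steps += 1
    end
  end
end

names = %w[Ann Bob Cat Dan]

# Delphi-style loop: counter, count and indexed access.
indexed = LinkedList.new(names)
by_index = []
i = 0
size = indexed.count
while i < size
  by_index << indexed[i]
  i += 1
end

# Smalltalk/Ruby-style loop: the collection drives the iteration.
walked = LinkedList.new(names)
by_each = []
walked.each { |name| by_each << name }

puts "index loop: #{indexed.steps} hops, each: #{walked.steps} hops"
# prints "index loop: 6 hops, each: 4 hops"
```

Both loops visit the same four names, but the indexed version restarts from the head each time, so its cost grows with the square of the list length while each stays linear.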

What about the equivalent in C#?


foreach (Person p in people) {
  System.Console.WriteLine(p.name);
}

That's better. Who cares if we are starting at index zero or index one? Who cares how many items there are in this collection? In fact, apart from the Smalltalk/Ruby, this is my favourite version of the code (even though it lacks duck typing and blocks).

The downside here is that we have had to add a new piece of syntax to the language - foreach - as Java programmers discovered moving from 1.4 to 1.5, later branded Java 5 (I gave up at this point). That pollutes the language and adds more reserved words. I also worried that if we designed our own fantastic, new, optimised collection, foreach would know nothing about its internal optimised access methods; in practice C#'s foreach asks the collection for an enumerator through GetEnumerator, so a custom collection can still supply its own traversal.
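For comparison, Ruby has a loop keyword of its own, for ... in, and it is worth seeing how it treats a home-made collection. The sketch below assumes a hypothetical Countdown class (not from any library) that offers nothing but an each method; the each_calls counter exists only to show who is driving the iteration.

```ruby
# Hypothetical collection: all it promises is an each method.
class Countdown
  attr_reader :each_calls # how many times each has been invoked

  def initialize(from)
    @from = from
    @each_calls = 0
  end

  def each
    @each_calls += 1
    @from.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(3)
seen = []

# The for keyword does not know anything about Countdown;
# it simply calls countdown.each and runs the body for every yielded value.
for n in countdown
  seen << n
end

p seen                 # prints [3, 2, 1]
p countdown.each_calls # prints 1
```

So in Ruby at least, the language-level loop is a thin layer over the collection's own each, and any optimised traversal inside each is what the loop ends up using.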

Speaking of Java, how would you do the same in 1.4?


Iterator i = people.iterator();
while (i.hasNext()) {
  System.out.println(((Person)i.next()).getName());
}

This puts the duty of handling iteration where it belongs - it's an implementation detail of the collection itself. No knowledge about the number of elements is required. No knowledge of how the indexes are structured is required. And if we designed our super-fast collection we could also design our own implementation of Iterator to ensure that super-fast access is maintained (as it happens, Java 5's for-each loop does just ask the collection for an Iterator, through the Iterable interface).
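The same external-iterator style is available in Ruby when you want it, which makes the contrast easy to see side by side. This is only a sketch: the Person struct and the sample data are made up for the example, while each (called without a block), Enumerator#next and Kernel#loop are standard Ruby.

```ruby
# Person and the sample data are assumptions made up for this example.
Person = Struct.new(:name, :age)
people = [Person.new("Ann", 31), Person.new("Bob", 27)]

# Calling each without a block hands back an Enumerator,
# Ruby's counterpart to the Java Iterator above.
iterator = people.each

names = []
loop do
  # next raises StopIteration once the collection is exhausted;
  # loop rescues that exception and ends quietly.
  names << iterator.next.name
end

p names # prints ["Ann", "Bob"]
```

As in the Java version, the caller never asks for a count or an index, but the iteration still lives outside the collection in a separate object.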

My problem with the Java code is that it looks ugly to me. The iteration is divorced from the collection rather than being intrinsic to it. This is a direct result of Java not having blocks, which would, in effect, allow us to use the Visitor pattern with our collection. Speaking of which, I also think the Smalltalk looks better than the Ruby, but that's because Smalltalk treats a block as just another object, where Ruby, by default, treats it as a syntactic artefact that must be yielded to.
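The block-versus-object distinction can be sketched in a few lines of Ruby. The two method names below are hypothetical, chosen for the example; the mechanisms they use (yield, the &block parameter, lambda) are standard Ruby. The second style is the closest Ruby gets to Smalltalk's "a block is just another object".

```ruby
# Style 1: the block is implicit and can only be yielded to.
def each_twice_yield(items)
  items.each do |item|
    yield item
    yield item
  end
end

# Style 2: &block captures the same block as a Proc object
# that can be called, stored or returned like any other value.
def each_twice_proc(items, &block)
  items.each { |item| 2.times { block.call(item) } }
  block
end

yielded = []
each_twice_yield([1, 2]) { |n| yielded << n }

called = []
captured = each_twice_proc([1, 2]) { |n| called << n * 10 }

p yielded        # prints [1, 1, 2, 2]
p called         # prints [10, 10, 20, 20]
p captured.class # prints Proc

# A lambda is an object from the start and can be handed to collect with &.
shout = lambda { |s| s.upcase }
p %w[a b].collect(&shout) # prints ["A", "B"]
```

Smalltalk needs no such conversion step because the block literal is already an object when it is passed as an argument.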

So Smalltalk (and hence Ruby) collections keep private state private, keep iteration as an intrinsic function internal to the collection, do not pollute the language namespace (Smalltalk has just six reserved words) and allow painless implementation optimisation. The resulting code is compact, uses blocks and just looks good (IMO). And, given that after your base Object your collections are the most important classes in your library, that is why they smell fantastic.
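As a closing illustration, here is roughly what writing your own collection looks like in Ruby. Team and Person are hypothetical names invented for this sketch; Enumerable, enum_for, collect and select are standard Ruby. The class only defines each, and everything else comes from mixing in Enumerable.

```ruby
# Hypothetical collection, written only for illustration.
class Team
  include Enumerable # supplies collect, select, sort_by and friends on top of each

  def initialize
    @members = [] # private state: callers never see this array or its indexes
  end

  def add(person)
    @members << person
    self
  end

  # The one method the collection has to provide. How it walks its
  # members is its own business and can be changed without touching callers.
  def each(&block)
    return enum_for(:each) unless block
    @members.each(&block)
    self
  end
end

Person = Struct.new(:name, :age)

team = Team.new
team.add(Person.new("Ann", 31)).add(Person.new("Bob", 27))

ages = team.collect { |person| person.age }
over_thirty = team.select { |person| person.age > 30 }.collect { |person| person.name }

p ages        # prints [31, 27]
p over_thirty # prints ["Ann"]
```

If Team later swapped its array for a tree or a linked list, only each would change; the calls to collect and select would stay exactly as they are.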


19 November, 2006
