Iterables and Iterators: a lazy approach to collections

For a long time I knew that lazily evaluated sequences existed. It was a thing present in a lot of programming languages, or their standard libraries, yet my brain still struggled with them. Mapping items one at a time was intuitive to me, but how did filtering work when you’re only looking at one item at a time? This post comes after I spent time wondering how these basic operations worked and then wondering how I’d implement them myself. In fact, that’s exactly what I’ve done here. This post follows the same learning I did, starting from what these objects really are and the building up to solving a problem.

I’m going to be using Javascript’s iterators in this post, but there are equivalents in other major modern programming languages. C# has IEnumerator, Java has Iterator, and functional languages will often have some form of lazily evaluated sequence built-in.

📖

Stephen Toub’s post on the Microsoft blog about how async/await really works in C# explains how IEnumerable was one way to manage async logic prior to the introduction of Task. Be prepared for a long and highly technical read if you do click that link.

⚠️

The code in this post is here to illustrate how these features work. Please don’t use the code I’ve written in your projects. Use someone else’s properly maintained library, you’ll thank yourself later.

Arrays

Arrays and lists are pretty standard collections in any programming language, and often make their appearances in early computer science courses. In Javascript we only have an array type, but has methods for it to be used in the way a list would in other languages. Array are objects that represent an ordered sequence of values, and those values are laid out in order in memory.

Setting the scene

As an example problem to set the scene for the real topic of this post, let’s imagine we want an array of 10 numbers that are not divisible by 5, then multiply by two. This is a fairly simple, if not contrived, problem that can be solved with a filter() and a map().

tsx
const array = range(1, 100);                     // Let's pretend this function exists already and returns an array from 1 to 100
const filtered = array.filter(n => n % 5 !== 0); // Remove numbers divisible by 5
const mapped = filtered.map(n => n * 2);         // Multiply all number by 2
const sliced = mapped.slice(0, 10);              // Keep only the first 10 numbers
console.log(sliced);
// [2, 4, 6, 8, 12, 14, 16, 18, 22, 24]

In order to try represent what the computer is doing, I’ve written out a table of the values created at each stage.

tsx
[1, 2, 3, 4, 5,  6,  7,  8,  9,  10, 11, 12, 13, 14, /* ... */  98,  99, 100]; // range(1, 100);
[1, 2, 3, 4,     6,  7,  8,  9,      11, 12, 13, 14, /* ... */  98,  99     ]; // array.filter(n => n % 5 !== 0)
[2, 4, 6, 8,    12, 14, 16, 18,      22, 24, 26, 28, /* ... */ 196, 198     ]; // filtered.map(n => n * 2);
[2, 4, 6, 8,    12, 14, 16, 18,      22, 24                                 ]; // mapped.slice(0, 10);

As each function in the chain must complete before the next one can start, and those functions return the computed arrays, every single value in this table (including the ones skipped by the /* ... */s) had to be calculated. If we look at the filter and map stages, we see almost all of the values (from the initial values of 13 all the way up to 100) have no bearing on the final result. All of that calculation work wasn’t necessary for the final solution, but had to be done anyway because of the data structure we’ve chosen to use.

What we need is a way to represent operations without actually executing them up-front, and then being able to do work on only the values we care about. It turns out we do have that, and they’re the topic of this very post.

Iterators and Iterables

Iterators are an extremely simple construct. They abstract away the exact how the data in a collection is stored in favour of the absolute simplest interface possible:

tsx
interface Iterator<T> {
  next(): IteratorResult<T>;
}
interface IteratorResult<T> {
  done: boolean;
  value: T; // Note: not required if `done` is true
}

👩‍💻

Sections with an arrow, like the one below, are expandable in case you want to read the code.

A really bad way to get an Iterator from an array

tsx
// Please use `array[Symbol.iterator]()` instead
function getIterator<T>(array: T[]): Iterator<T> {
  let currentIndex = 0;
	function next() {
		if (currentIndex >= array.length) {
			return { done: true };
    }
		const result = { done: false, value: array[currentIndex] };
		currentIndex++;
		return result;
  }
  return { next };
}
const iterator = getIterator([1, 2, 3]);
iterator.next(); // { done: false, value: 1 }
iterator.next(); // { done: false, value: 2 }
iterator.next(); // { done: false, value: 3 }
iterator.next(); // { done: true }

You’ll notice that the Iterator itself doesn’t contain any of the collection data itself. It’s really up to whatever created the Iterator to provide the next value whenever next() is called. You’ll also notice that this is not quite a linked list: there are no nodes here, just a single object that produces values.

Usually you find these wrapped up in another layer called Iterable. The Iterable interface exists to make it easy to provide separate Iterators based on the same underlying data structure to avoid conflicts between different sets of iteration.

tsx
interface Iterable<T> {
  [Symbol.iterator](): Iterator<T>
}

Symbol.iterator is a well-known Symbol, effectively a global constant that the Javascript engine can use for internal operations. Relevant to us here, it means that anything implementing the Iterable interface can be used in constructs like the for ... of loop.

From these interfaces it’s fairly easy to build up equivalents to the built-in array operations.

An implementation of map() using Iterables

tsx
function map<T, U>(iterable: Iterable<T>, mapper: (value: T) => U): Iterable<U> {
  // The Iterable object
  return {
    [Symbol.iterator]() {
      // Get a new iterator from the given Iterable.
      // For most implementations, this will start from the beginning of the collection.
      const iterator = iterable[Symbol.iterator]();
      function next() {
        const { value, done } = iterator.next();
        if (done) {
          return { done: true };
        }
        const newValue = mapper(value);
        return { done: false, value: newValue };
      }
      return { next };
    }
  }
}
const iterable = map([1, 2, 3], (n) => n * 2);
const iterator = iterable[Symbol.iterator]();
iterator.next(); // { done: false, value: 1 }
iterator.next(); // { done: false, value: 2 }
iterator.next(); // { done: false, value: 3 }
iterator.next(); // { done: true }

Generator functions

Generator functions are a way of defining and creating Iterators whose values require more complex logic. They also reduce a bit of the boilerplate required when implementing new Iterables. Calling a generator function returns a Generator object, which is both an Iterable and an Iterator (getting an iterator from a generator returns itself). Practically, this means that you can only iterate over the sequence a single time — if you try to traverse the sequence a second time it’ll already be finished.

Generator functions do come with a bit of a learning bump when trying to understand them, and that comes in the form of the yield keyword. When the returned generator’s .next() function is called, the generator function is executed up to the point a yield statement is reached. The value that gets yielded ends up as the value in the IteratorResult. The next time the .next() function is called, execution of the generator function resumes from where it left off and continues until the next yield. If the execution reaches a return statement, or reaches the end of the function, then the iteration is considered done.

Re-implementing getIterator for an array using a generator function

tsx
// Notice the * to denote this is a generator function
function* getGenerator<T>(array: T[]) {
  for (let i = 0; i < array.length; i++) {
    yield array[i];
  }
}
const generator = getGenerator([1, 2, 3]);
generator.next(); // { done: false, value: 1 }
generator.next(); // { done: false, value: 2 }
generator.next(); // { done: false, value: 3 }
generator.next(); // { done: true }

An implementation of map() using a generator function

tsx
function* map<T, U>(iterable: Iterable<T>, mapper: (value: T) => U) {
  const iterator = iterable[Symbol.iterator]();
  while (true) {
    const { value, done } = iterator.next();
    if (done) {
      break;
    }
    yield mapper(value);
  }
}

Solving the original problem with iterators

It’s cool that we have another way of thinking about iteration and writing code to work with sequences, but what purpose does this bring?

The major benefit is that Iterables are lazy sequences. In this model you’re just pulling out one value at a time. There is no need to calculate all of the values up-front, which is especially important if you have an expensive computation as part of your chain.

tsx
function* somethingExpensive<T>() {
  // The for loops are just to waste time
  for (let i = 0; i < 10_000_000; i++) {}
  yield 0;
  for (let i = 0; i < 10_000_000; i++) {}
  yield 1;
  for (let i = 0; i < 10_000_000; i++) {}
  yield 2;
}
const generator = somethingExpensive(); // Finishes quickly
generator.next(); // { done: false, value: 0 }, but slowly
generator.next(); // { done: false, value: 1 }, but slowly
generator.next(); // { done: false, value: 2 }, but slowly
generator.next(); // { done: true }

Note that the slowness was only present when .next() was called, and not up-front. In these situations it’s often useful to have a limit on how long your sequence really is, which you can easily do by limiting the number of iterations inside the loop of the generator function.

The second reason lazy sequences are cool is that you can generate infinite sequences. There’s a while loop in the middle of the upcoming integerSequence() function with no break or return, yet when calling .next() the generator still faithfully produces the next value of the sequence. Usually you don’t want every single possible value in existence, but you can limit the length to what you really need with that same take() function above.

tsx
function* integerSequence() {
  let n = 0;
  while (true) {
    yield n;
    n++;
  }
}
function *take<T>(iterable: Iterable<T>, n: number) {
  const iterator = iterable[Symbol.iterator]();
  // Because of this loop, the generator function will only yield at most `n` values
  for (let i = 0; i < n; i++) {
    const { value, done } = iterator.next();
    if (done) {
      break;
    }
    yield value;
  }
}
const shortSequence = take(integerSequence(), 2);
shortSequence.next(); // { done: false, value: 0 }
shortSequence.next(); // { done: false, value: 1 }
shortSequence.next(); // { done: true }

In fact, this is the point where we can circle all the way back around to the problem given at the start. The original array-based implementation was slow because it needed to .filter() every value in the array before it could go on to .map() every value in the array. Here we’ve just shown a system that only does calculations on the numbers it actually cares about. Let’s extend these ideas to filtering.

A filter implementation

tsx
function* filter<T, U>(iterable: Iterable<T>, predicate: (value: T) => boolean) {
  const iterator = iterable[Symbol.iterator]();
  while (true) {
    const { value, done } = iterator.next();
    if (done) {
      break;
    }
    // The magic is here:
    // If the predicate returns false then the loop continues without yielding,
    // which then means it grabs the next value from the iterator to test.
    if (predicate(value)) {
      yield value;
    }
  }
}

We now have all of the pieces to rewrite the original problem from the start of this post, but using Iterables this time. Using the functions defined above, let’s write it again.

tsx
// Run it yourself: https://jsfiddle.net/o4a9jn5L/
const seq = integerSequence();
const filtered = filter(seq, n => n % 5 !== 0);
const mapped = map(filtered, n => n * 2);
const taken = take(mapped, 10);
// At this point, no actual calculation has been done.
// It's only when we call `.next()` that any values are generated.
// We can greedily get as many values as we can with `Array.from()`.
console.log(Array.from(taken));
// [2, 4, 6, 8, 12, 14, 16, 18, 22, 24]

If you’re following along at home, add some logging in the callbacks to see how the order of operations differs from the array version. Instead of all of the filter()s happening then all of the map()s, the Iterable versions of filter() and map() are called as each value in the sequence gets evaluated.

But why?

It’s just another way of thinking about collections, and it’s suited for certain problems. I’m not saying you should go change all of your logic to use lazy sequences by default, that’s a path to madness. For me, knowing about different ways of solving problems is valuable as it provides me with ways to explore problem solving. From that exploration and experience I gain an intuition of when certain concepts are useful.

Honestly, this mindset goes for a lot of computer science, and even technology as a whole. Use what works, but don’t get too stuck in your ways. If a problem is harder than it should be, there’s usually a good reason why. Stepping back, reevaluating, and changing your approach often leads to a better outcome than just trying to slog through it all.

What’s next?

There is another layer often added on top of this: transducers. They’re a way of combining a set of individual processing steps into one, and interest in them was reinvigorated when Clojure added them. Transducers were originally going to be in scope of this blog post, but it ended up being just a little bit too much. I may cover them in a future post if I find an interesting way to present them.

Stuart Thomson

Iterables and Iterators: a lazy approach to collections

Arrays

Setting the scene

Iterators and Iterables

Generator functions

Solving the original problem with iterators

But why?

What’s next?

CSS spritesheets for the modern age

How I’ve been hosting my projects in 2022

Blog