The Iterator Protocol: How “For Loops” Work in Python
We’re interviewing for a job and our interviewer has asked us to remove all for loops from a block of code. They then mentioned something about iterators and cackled maniacally while rapping their fingers on the table. We’re nervous and frustrated about being assigned this ridiculous task, but we’ll try our best.
To understand how to loop without a for loop, we’ll need to discover what makes for loops tick.
We’re about to learn how for loops work in Python. Along the way we’ll need to learn about iterables, iterators, and the iterator protocol. Let’s loop. ➿
Looping with indexes: a failed attempt
We might initially try to remove our for loops by using a traditional looping idiom from the world of C: looping with indexes.
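A minimal sketch of that idiom, assuming a short list named colors (an illustrative name). Each item is also collected in a list called seen so the result is easy to inspect:

```python
colors = ["red", "green", "blue"]  # illustrative data

seen = []
i = 0
while i < len(colors):
    print(colors[i])        # look the item up by its index
    seen.append(colors[i])
    i += 1
```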
This works on lists, but it fails on sets:
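As a sketch of the failure, the same index-based loop is pointed at a set below. The name fruits and the try/except wrapper are illustrative additions so the snippet can run to completion and report the error:

```python
fruits = {"lemon", "apple", "orange"}  # a set has a length but no indexes

i = 0
try:
    while i < len(fruits):
        print(fruits[i])  # sets don't support indexing, so this raises TypeError
        i += 1
except TypeError as error:
    failure = str(error)
    print("TypeError:", failure)
```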
This approach only works on sequences, which are data types that have indexes from 0 to one less than their length. Lists, strings, and tuples are sequences. Dictionaries, sets, and many other iterables are not sequences.
We’ve been instructed to implement a looping construct that works on all iterables, not just sequences.
Iterables: what are they?
In the Python world, an iterable is any object that you can loop over with a for loop.
Iterables are not always indexable, they don’t always have lengths, and they’re not always finite.
Here’s an infinite iterable which provides every multiple of 5 as you loop over it:
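One way to build such an iterable, as a sketch, is with count from the standard library's itertools module. The variable name multiples_of_five is an illustrative choice:

```python
from itertools import count

# count(step=5) starts at 0 and keeps adding 5: 0, 5, 10, 15, ... without end
multiples_of_five = count(step=5)
```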
When we were using for loops, we could have looped over the beginning of this iterable like this:
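A sketch of such a loop, assuming the infinite iterable is built with itertools.count and that we stop once the numbers pass 100 (both are illustrative choices):

```python
from itertools import count

multiples_of_five = count(step=5)

for n in multiples_of_five:
    if n > 100:
        break  # without this, the loop would never end
    print(n)
```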
If we removed the break condition from that for loop, it would go on printing forever.
So iterables can be infinitely long, which means that we can’t always convert an iterable to a list (or any other sequence) before we loop over it. We need to somehow ask our iterable for its items one at a time, the same way our for loop does.
Iterables & Iterators
Okay, we’ve defined iterables, but how do iterables actually work in Python?
All iterables can be passed to the built-in iter function to get an iterator from them.
That’s an interesting fact but… what’s an iterator?
Iterators have exactly one job: return the “next” item in our iterable. They’re sort of like tally counters, but they don’t have a reset button and instead of giving the next number they give the next item in our iterable.
You can get an iterator from any iterable:
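For instance, here is a small sketch with a set, a tuple, and a string. The variable names are illustrative:

```python
numbers_iterator = iter({1, 2, 3})  # from a set
pair_iterator = iter((4, 5))        # from a tuple
text_iterator = iter("hi")          # from a string

# Printing an iterator shows an opaque object rather than its items;
# the exact text depends on the Python implementation.
print(numbers_iterator)
```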
And iterators can be passed to next to get their next item:
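A minimal sketch, assuming a three-item list. It also shows what happens once the items run out:

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

first = next(iterator)   # 1
second = next(iterator)  # 2
third = next(iterator)   # 3

try:
    next(iterator)       # nothing left to give
except StopIteration:
    exhausted = True
    print("no more items")
```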
So iterators can be passed to the built-in next function to get the next item from them, and if there is no next item (because we reached the end), a StopIteration exception will be raised.
Iterators are also iterables
So calling iter on an iterable gives us an iterator. And calling next on an iterator gives us the next item or raises a StopIteration exception if there aren’t any more items.
There’s actually a bit more to it than that though. You can pass iterators to the built-in iter function to get themselves back. That means that iterators are also iterables.
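A quick sketch of that behavior, using illustrative names. Calling iter on an iterator hands back the very same object, while calling iter on a list produces a separate iterator object:

```python
numbers = [1, 2, 3]
iterator1 = iter(numbers)
iterator2 = iter(iterator1)

same_object = iterator1 is iterator2            # True: the iterator returned itself
list_returns_itself = iter(numbers) is numbers  # False: a list is not its own iterator
```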
That fact leads to some interesting consequences that we don’t have time to go into right now. We’ll save that discussion for a future learning adventure…
The Iterator Protocol
The iterator protocol is a fancy term meaning “how iterables actually work in Python”.
Let’s redefine iterables from Python’s perspective.
Iterables:
1. Can be passed to the iter function to get an iterator for them.
2. There is no 2. That’s really all that’s needed to be an iterable.
Iterators:
- Can be passed to the next function, which gives their next item or raises StopIteration
- Return themselves when passed to the iter function.
The converse of these statements should also hold true. Which means:
- Anything that can be passed to iter without an error is an iterable.
- Anything that can be passed to next without an error (except for StopIteration) is an iterator.
- Anything that returns itself when passed to iter is an iterator.
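These rules can be turned into two small checks. This is only a sketch: the helper names is_iterable and is_iterator are hypothetical, and is_iterator tests just the "returns itself" rule without confirming that next works on the object.

```python
def is_iterable(obj):
    """Return True if iter() accepts obj without raising TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def is_iterator(obj):
    """Return True if obj is iterable and iter(obj) hands back obj itself."""
    return is_iterable(obj) and iter(obj) is obj
```

With these, is_iterable([1, 2]) is True and is_iterable(5) is False, while is_iterator([1, 2]) is False and is_iterator(iter([1, 2])) is True.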
Looping with iterators
With what we’ve learned about iterables and iterators, we should now be able to recreate a for loop without actually using a for loop.
This while loop manually loops over some iterable, printing out each item as it goes:
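A sketch of such a loop, wrapped in a function whose name, print_each, is an illustrative choice:

```python
def print_each(iterable):
    iterator = iter(iterable)      # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # ask the iterator for its next item
        except StopIteration:
            break                  # the iterator is exhausted, so stop looping
        else:
            print(item)
```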
We can call this function with any iterable and it will loop over it:
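For example, assuming the manual loop lives in a function called print_each (a hypothetical name, defined again here so the snippet runs on its own), it accepts a set, a string, and an iterator alike:

```python
def print_each(iterable):
    # Manual loop built only from iter, next, and StopIteration
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        print(item)


print_each({1, 2, 3})      # a set: no indexes, and no guaranteed order
print_each("ab")           # a string: prints each character on its own line
print_each(iter([4, 5]))   # an iterator works too, since iterators are iterables
```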
The above function is essentially the same as this one which uses a for loop:
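A sketch of that for loop version, again under the illustrative name print_each:

```python
def print_each(iterable):
    for item in iterable:
        print(item)
```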
This for loop is automatically doing what we were doing manually: calling iter to get an iterator and then calling next over and over until a StopIteration exception is raised.
The iterator protocol is used by for loops, tuple unpacking, and all built-in functions that work on generic iterables. Using the iterator protocol (either manually or automatically) is the only universal way to loop over any iterable in Python.
For loops: more complex than they seem
We’re now ready to complete the very silly task our interviewer assigned to us. We’ll remove all for loops from our code by manually using iter and next to loop over iterables. What did we learn in exploring this task?
Everything you can loop over is an iterable. Looping over iterables works by getting an iterator from an iterable and then repeatedly asking the iterator for the next item.
The way iterators and iterables work is called the iterator protocol. List comprehensions, tuple unpacking, for loops, and all other forms of iteration rely on the iterator protocol.
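A small sketch of that claim: each construct below is handed a bare iterator, which has no indexes and no length, so all it can do is ask for the next item. The variable names are illustrative.

```python
numbers = [10, 20, 30]

# Tuple unpacking pulls items from the iterator one at a time.
first, second, third = iter(numbers)

# A list comprehension does the same.
doubled = [n * 2 for n in iter(numbers)]

# So do built-ins that accept any iterable, such as sum and max.
total = sum(iter(numbers))
largest = max(iter(numbers))
```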
I’ll explore iterators more in future articles. For now, know that iterators are hiding behind the scenes of all iteration in Python.