After this presentation, you will …
-
be able to recognize certain loop constructs as candidate for refactoring,
-
know how to convert these constructs into more maintainable Pythonic code,
by refactoring them into elegant pipelines of generators.
-
be more acquainted with the standard library
itertools
module and themore-itertools
package
-
So what are the goals of this presentation?
-
Well, after this presentation you will be able to recognize certain loop constructs as candidate for refactoring
-
You will know how to refactor these constructs into more maintainable, elegant, Pythonic code
-
And you will have had your first acquaintance with the standard library’s
itertools
module and the wonderfulmore-itertools
package
Jan-Hein Bührman <[email protected]>
@janheinb
-
Java vs Python: Basic Python for Java Developers on Real Python
-
Witnessed Python’s first baby steps very close at cwi.nl
-
Various developer, management, and developer roles
-
(Co-)Founder of Ordina Pythoneers,
-
Practice within ordina.nl, dedicated to Python
-
Participating Sponsor of the PSF
-
-
Nowadays a happy Python developer™️
-
But first, let me introduce myself: My name is Jan-Hein Bührman
-
Author of the Basic Python for Java Developers tutorial on Real Python
-
I witnessed the first baby steps of Python very close more than 30 years ago, when I was working as a programmer for the Dutch national research institute for mathematics and computer science
-
While I was working mainly as a C and C++ developer, I always kept an eye on Python’s developments and occasionally used it
-
My employer is Ordina, a Benelux based IT service provider
-
When I was a Java unit manager, I came up with the idea of starting a special Python practice;
-
This practice focuses on Python and software artisanship
-
I think it is quite special to a have a dedicated Python practice within a "wide-spectrum" IT service provider company
-
The Ordina Pythoneers are Participating Sponsor of the PSF
-
-
The last couple of years, I do what I like most: Python programming!
-
Family of similar functions
-
involving loops
-
containing fragments of similar code
-
-
Recognition of patterns of loop constructs ← refactoring
-
Generators Functions
-
itertools
(module) andmore_itertools
(package)
In this presentation…
-
I will introduce you to a family of similar functions,
-
involving loops,
-
containing fragments of similar code
-
And I will show you how you can identify distinct responsibilities within seemingly entangled code
-
I will touch upon the topic of Generator Functions
-
How these Generator Functions will enable you to Refactor this entangled code into separate functions
-
And finally I will introduce you to the utility generators and functions from the
itertools
module and themore_itertools
package
-
Suppose your Product Owner requires a function that returns a list of all Fibonacci number less than n
-
Who knows what a Fibonacci number is? ✋
-
So I’m going to tell you a fictional story, not a true story, about a Python team that will develop functionality for a Product Owner
-
And this Product Owner is representing the Fibonacci Sequence Fan Club
-
And the PO would like you to make a function returning a list of all Fibonacci Numbers less than a certain value (N)
-
✋ So, first of all, who knows what a Fibonacci number is?
In mathematics, the Fibonacci numbers […] form a sequence, the Fibonacci sequence, in which each number is the sum of the two preceding ones. The sequence commonly starts from 0 and 1, […]
Starting from 0 and 1, the next few values in the sequence are:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …
Fibonacci number
The definition from Wikipedia states:
-
In mathematics, the Fibonacci numbers […] form a sequence, the Fibonacci sequence, in which each number is the sum of the two preceding ones
-
So this is the essence, you start with two numbers, usually zero and one,
-
and the next value is simply the addition of the previous two…
-
So in this example, zero is the first Fibonacci number, one the second Fibonacci number, and the next one, the third, is the sum of the previous two numbers, zero and one;
-
zero plus one makes one, so one is the third value of the sequence.
-
The next value is the sum of one and one, which is two, and
-
The subsequent value is the sum of one and two, making three,
-
Two plus three makes five; three plus five makes eight, and so on, and so on… and that’s how the sequence is being built up
So let’s implement that in Python.
link:./src/fibonacci/before.py[role=include]
A bit of Googling might bring you to Python’s own documentation site, which contains an implementation of exactly this functionality. This implementation has another name, it contains type annotations and a docstring, but it’s functionally equivalent
-
First, you create an initially emtpy result list, and subsequently, two variables are initialized to the two first Fibonacci numbers
-
You can see the variables
a
andb
as Fibonacci Registers -
Register
a
holds the "current" Fibonacci number, and registerb
holds the "next" Fibonacci number
-
-
You keep executing the loop body, as long as the "current" Fibonacci number is less than the provided threshold value
-
Within the loop, you append the "current" Fibonacci number to the result list
-
Next, you calculate the new values of the "current" and "next" Fibonacci numbers in one go, using Python’s elegant feature named sequence unpacking. So the new current Fibonacci number becomes the old next Fibonacci number, and the new next Fibonacci number, is the sum of the old two numbers.
-
Finally, you return the result list
-
Happy with the delivered result, the Product Owner asks for more functionality:
Can you also make a function that returns the n'th Fibonacci number (counting from zero)?
More Features
-
So your team demonstrates the functionality and the Product Owner is happy with the result.
-
The PO now asks us to make a function that returns the N'th Fibonacci number, zero based,
-
So Fibonacci number zero is the first Fibonacci number, Fibonacci number one is the second, number two the third, and so on, and so on.
-
[He is just facilitating us programmers to make this zero based 😊]
-
So let’s code this new feature…
link:./src/fibonacci/before.py[role=include]
-
As you can see, the code is similar to the previous function, but there are also differences: …
-
There is no list initialization; the function returns just one value
-
You use the
range()
function to execute the requested number of times the loop body, but you don’t need the counter value itself -
And within the loop body, the new pair of Fibonacci numbers is calculated again.
-
After the loop finishes, you return the "then current" Fibonacci number
-
[show example with zero iterations, and with one iteration]
Now that we’ve got functions for …
Fibonacci numbers up to N,
the N’th Fibonacci number,
…, can you also make functions that return …
the first N Fibonacci numbers, and
the smallest Fibonacci number greater than or equal to N?
Even More features
-
Now the Product Owner really gets enthusiastic and requests for two more Fibonacci related functions, …:
-
a function that returns the first N Fibonacci numbers, and
-
a function that returns the smallest Fibonacci number that is greater than or equal to N.
link:./src/fibonacci/before_compressed.py[role=include] |
link:./src/fibonacci/before_compressed.py[role=include] |
link:./src/fibonacci/before_compressed.py[role=include] |
link:./src/fibonacci/before_compressed.py[role=include] |
Do you see a pattern? ✋
So you see all four implementations collected here together in one slide
|
|
|
|
-
✋ We’re seeing a pattern here, don’t we?
-
[next slide]
def some_fib_related_func(some_parameter):
# (SOMETIMES: INITIALIZE THE LIST)
a, b = 0, 1 # (1)
while "some condition or otherwise a for-loop": # (2)
# (SOMETIMES: APPEND SOMETHING TO THE LIST)
a, b = b, a + b # (3)
return a # (OR SOMETIMES: RETURN THE LIST) (4)
-
The variables
a
andb
are initialized to their initial values -
Some
while
loop orfor
loop is set up -
The variables
a
andb
are assigned their new respective values -
The desired calculated value, or a list of these values, is returned
The pattern: …
-
It starts with that sometimes, the calculated values need to be collected in some container, as
list
for example; this container needs to be initialized first. -
Then you see the familiar initialization of the two Fibonacci registers,
a
andb
. -
Subsequently, some sort of loop is set up, counting, or with an end-condition.
-
Within the loop-body, sometimes the "current" value is added to the container, and
-
the new values of the two Fibonacci registers are calculated.
-
Finally, the "current" Fibonacci value, or the built-up container is returned.
Now that you see that the same code appears in different functions, you start looking for ways to get rid of the apparent redundancy of code.
(because you want to keep your code DRY[1])
❓ Refactoring ✋
-
I don’t know how you experience it, but I always get this itchy feeling if I see repetitions of code; you would like to keep your code DRY.
-
So you start looking for ways how to refactor the code, extracting the common part into a function or method
-
✋ So who knows what refactoring is?
-
Restructuring existing computer code
-
Keeping external behaviour the same
-
Less complexity
-
Improves …
-
… design and structure
-
… code readability
-
… maintainability
-
Code Refactoring
— Wikipedia
-
So what’s code refactoring about?
-
You restructure your code without changing its external behavior
-
You usually do this to improve the design, structure, and implementation
(but you make sure that the original functionality of the code remains the same). -
Your code becomes more readable, less complex, better maintainable
(while trying to extract common parts of the code into a function: …)
-
The code is entangled with control flow constructs —
while
loops orfor
loops -
Experiment: combine the requested functions❓
-
(by using a "mode" flag or an
enum
designating the mode)
-
-
So it seems impossible to factor those common parts out and put them in a separate, distinct function
-
The code is mixed up with control flow constructs,
while
loop orfor
loop -
… ¿Maybe you could try to combine the functions all in one golden "super" function?
-
Let’s start with trying the first two requested features, and we will use a "mode" flag (a boolean) to designate which of the two features is requested…
Warning
|
Put on your Peril-Sensitive Sunglasses[2] |
link:./src/fibonacci/first_two_combined.py[role=include]
-
Warning: what you see here, really hurts the eye; so put on your Peril-Sensitive Sunglasses…
[next click]
-
As you can see, it starts already with the ugly return type annotation, and as the author of
Black
, Łukasz Langa, told us in his excellent PyCon US keynote: "ugly type annotations are an indication of ugly code" -
You should not take this contrived example too seriously; it merely demonstrates that the resulting code is not quite maintainable and this isn’t the way to go…
(take a step back: think in terms of responsibilities)
-
The common part between all functions:
-
Producing Fibonacci numbers
-
-
The mutually differing parts between all functions:
-
Consuming (further processing) the Fibonacci numbers:
-
collecting them in a container, or
-
waiting for a loop condition to occur, signalling it’s time to return the last calculated "current" Fibonacci number
-
-
how to separate the common part from the mutually differing parts?
-
The first step is to look at your code in a more conceptual way…,
-
… from a larger distance, a larger flying altitude
-
And then you realize that there is a data producing part in your code, … the code that is calculating and providing the Fibonacci numbers
-
And a data consuming part, … the code that either collect these values in a container, or is checking some condition that identifies the moment to return the "current" Fibonacci number.
-
The data producing part is the part that you can isolate.
-
It looks like a normal function, but it contains
yield
expression(s) -
When called, it returns a generator iterator, implementing the iterator protocol
-
Each plain
yield
…-
produces one item
-
saves the state of the generator function
-
passes control back to the calling code
-
-
When a new value is requested, the generator function resumes its execution from where it left off
-
The iterator can be used in a
for
loop, for example.
-
Enter Generator Functions
-
You just make a function, however you create a
yield
expression for each value that you would like to give back to the "caller" (the "consuming" party)
-
So let’s make a very basic generator function
link:./src/example_generator_function.py[role=include]
for
loop>>> for i in gen2():
... print(i)
...
[==before yield 0==]
0
[==before yield 1==]
1
[==before return==]
>>> itr = gen2()
>>> next(itr)
[==before yield 0==]
0
>>> next(itr)
[==before yield 1==]
1
>>> next(itr)
[==before return==]
Traceback (most recent call last):
...
StopIteration
[click to show definition]
-
You see here an example of a basic generator function, mimicing the behavior of the
range()
function with an argument of2
-
When called, it produces a generator iterator, that implements the iterator protocol
-
So if you call
next()
on it, it gives you the first value (0
). -
another
next()
gives the second value (1
) -
When you request another value, it raises a
StopIteration
exception, signaling that the iterator has no more value to give -
When used in a
for
loop, it also behaves just like any other iterable
So how can you code a Fibonacci generator function?
link:./src/fibonacci/refactored.py[role=include]
-
So this is how the Fibonacci Number Generator looks like
-
[shortly explain the code of
fib_gen()
] -
What you see here, is in essence the "canonical" code representation of the Fibonacci sequence (when coded in an imperative programming style).
-
So how would you use this generator?
-
[next slide]
Print the first 8 Fibonacci numbers using fib_gen()
:
>>> for _, fib in zip(range(8), fib_gen()):
... print(f"{fib:2d}")
...
0
1
1
2
3
5
8
13
So now you can refactor the requested Fibonacci functions, making use of this Fibonacci number generator.
-
[Explain that
zip()
stops as soon as one of the iterators is exhausted.] -
Making the (by the Product Owner requested) 4 Fibonacci functions, has now become relatively easy.
-
But we can go even one step further …
-
[next slide]
The next step: make use of the building blocks provided by:
The [itertools
] module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" making it possible to construct specialized tools succinctly and efficiently in pure Python.
itertools
-
… because Python’s standard library provides a wonderful module, delivering this functionality without needing to code any loop constructs yourself, the
itertools
module [read out the quote] -
what is called the "iterator algebra", is what I would call "a pipeline of iterators"
-
more-itertools
implements the recipes listed in theitertools
documentation and provides additional building blocks.
link:./src/fibonacci/refactored.py[role=include]
itertools.
takewhile
(
predicate
,
iterable
)
:
-
Make an iterator that returns elements from the iterable as long as the predicate is true.
-
[First explain
takewhile
]: Thetakewhile
"filter" fetches the elements one by one, calls the predicate function with fetched element as the single parameter, and passes through each element as long as the predicate returns a truthy value, and stops otherwise. -
[explain how the function uses these building blocks]
-
Subsequently, you collect the resulting truncated Fibonacci sequence in a list.
link:./src/fibonacci/refactored.py[role=include]
itertools.
islice
(
iterable
,
stop
)
:
itertools.
islice
(
iterable
,
start
,
stop
[,
step
])
:
-
Make an iterator that returns selected elements from the iterable, just like regular slicing, but without support for negative values for start, stop, or step.
more_itertools.
one
(
iterable
,
too_short
=ValueError,
too_long
=ValueError
)
:
-
Return the first item from iterable, which is expected to contain only that item. Raise an exception if iterable is empty or has more than one item.
-
[First explain
islice
andone
]: Theitertools
'islice
generator selects elements from the iterable, like slicing would do from a sequence. Either you specify only the stop value, or you can specify start, stop, and optionally the step value -
And the
more_itertools
'one
function ensures that there is exactly one element in the iterable, and returns that element. You can specify additional keyword arguments to specify your own exceptions, other than the defaultValueError
-
[explain how the function uses these building blocks]
link:./src/fibonacci/refactored.py[role=include]
Now making use of the itertools.
islice
(
iterable
,
stop
)
variant,
only providing the stop
argument
-
In this case, you use the
itertools
'islice
"selection generator" with only the `stop` argument. -
And you use the
list
constructor again to collect the elements in a list. -
[explain how the function uses these building blocks]
link:./src/fibonacci/refactored.py[role=include]
itertools.
dropwhile
(
predicate
,
iterable
)
:
-
Make an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element.
more_itertools.
first
(
iterable
[,
default
])
:
-
Return the first item of iterable, or a default if iterable is empty.
-
[First explain
dropwhile
andfirst
] Theitertools
'dropwhile
generator filters away items from the passediterable
. For each item from the iterable, it calls the provided predicate function with the item as its single argument, and as long as this function provides a "truthy" value, the element is discarded. Once the predicate function returns a "falsy" value, it starts passing through the remaining items. -
The
more_itertools
'first
function just returns the first value of the passediterable
. -
[explain how the function uses these building blocks]
-
Check out the documentation of
itertools
andmore_itertools
-
A valuable collection of utilities to have in your toolbox
Just an Example:
-
Using
more_itertools.
chunked()
(orichunked()
) to break an iterable into smaller parts. -
For bulk data - can be used to insert manageable sized "batches" of values into a database, for example.
-
I encourage you to read through the documentation of the
itertools
module and themore_itertools
package. It might probably save you some work if you have a good idea about the functionality it offers. -
It often makes your code more expressive as well
-
There is a lot more to discover from these utilities. A final example is the
chunked()
orichunked()
function, that partitions your generated data into chunks; this can be convenient for loading large amounts of data into a database, for example, keeping the bulk uploads into manageable amounts.
-
Recognize the "pattern" of certain loop constructs as candidate for refactoring,
-
Introduce generator functions to extract the "data-producing" part
-
itertools
andmore_itertools
might save a lot of coding
-
You’ve heard about the following:…
-
[see slides]
- Fibonacci number
- Code refactoring
- Generator Function
itertools
more_itertools
All the presented code has been verified using pytest-bdd
, see
github.com/jhbuhrman/refactor-into-generators
❓