Skip to content

Latest commit

 

History

History
871 lines (709 loc) · 25.8 KB

refactor-into-generators.adoc

File metadata and controls

871 lines (709 loc) · 25.8 KB

When To Refactor Your Code Into Generators And How

Goals

After this presentation, you will …​

  • be able to recognize certain loop constructs as candidate for refactoring,

  • know how to convert these constructs into more maintainable Pythonic code,

    by refactoring them into elegant pipelines of generators.

  • be more acquainted with the standard library itertools module and the more-itertools package

  • So what are the goals of this presentation?

  • Well, after this presentation you will be able to recognize certain loop constructs as candidate for refactoring

  • You will know how to refactor these constructs into more maintainable, elegant, Pythonic code

  • And you will have had your first acquaintance with the standard library’s itertools module and the wonderful more-itertools package

Hello! 🙂

Jan-Hein Bührman <[email protected]>
@janheinb

  • But first, let me introduce myself: My name is Jan-Hein Bührman

  • Author of the Basic Python for Java Developers tutorial on Real Python

  • I witnessed the first baby steps of Python very close more than 30 years ago, when I was working as a programmer for the Dutch national research institute for mathematics and computer science

  • While I was working mainly as a C and C++ developer, I always kept an eye on Python’s developments and occasionally used it

  • My employer is Ordina, a Benelux based IT service provider

    • When I was a Java unit manager, I came up with the idea of starting a special Python practice;

    • This practice focuses on Python and software artisanship

    • I think it is quite special to a have a dedicated Python practice within a "wide-spectrum" IT service provider company

    • The Ordina Pythoneers are Participating Sponsor of the PSF

  • The last couple of years, I do what I like most: Python programming!

Topics

  • Family of similar functions

    • involving loops

    • containing fragments of similar code

  • Recognition of patterns of loop constructs ← refactoring

  • Generators Functions

  • itertools (module) and more_itertools (package)

In this presentation…​

  • I will introduce you to a family of similar functions,

  • involving loops,

  • containing fragments of similar code

  • And I will show you how you can identify distinct responsibilities within seemingly entangled code

  • I will touch upon the topic of Generator Functions

  • How these Generator Functions will enable you to Refactor this entangled code into separate functions

  • And finally I will introduce you to the utility generators and functions from the itertools module and the more_itertools package

Variations on Fibonacci Number Selections

(a fictional story)
  • Suppose your Product Owner requires a function that returns a list of all Fibonacci number less than n

  • Who knows what a Fibonacci number is? ✋

  • So I’m going to tell you a fictional story, not a true story, about a Python team that will develop functionality for a Product Owner

  • And this Product Owner is representing the Fibonacci Sequence Fan Club

  • And the PO would like you to make a function returning a list of all Fibonacci Numbers less than a certain value (N)

  • ✋ So, first of all, who knows what a Fibonacci number is?

Fibonacci Number (definition)

In mathematics, the Fibonacci numbers […​] form a sequence, the Fibonacci sequence, in which each number is the sum of the two preceding ones. The sequence commonly starts from 0 and 1, […​]

Starting from 0 and 1, the next few values in the sequence are:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …​

— Wikipedia
Fibonacci number

The definition from Wikipedia states:

  • In mathematics, the Fibonacci numbers […​] form a sequence, the Fibonacci sequence, in which each number is the sum of the two preceding ones

  • So this is the essence, you start with two numbers, usually zero and one,

  • and the next value is simply the addition of the previous two…​

  • So in this example, zero is the first Fibonacci number, one the second Fibonacci number, and the next one, the third, is the sum of the previous two numbers, zero and one;

  • zero plus one makes one, so one is the third value of the sequence.

  • The next value is the sum of one and one, which is two, and

  • The subsequent value is the sum of one and two, making three,

  • Two plus three makes five; three plus five makes eight, and so on, and so on…​ and that’s how the sequence is being built up

So let’s implement that in Python.

Fibs Less Than N (Python)

link:./src/fibonacci/before.py[role=include]

A bit of Googling might bring you to Python’s own documentation site, which contains an implementation of exactly this functionality. This implementation has another name, it contains type annotations and a docstring, but it’s functionally equivalent

  • First, you create an initially emtpy result list, and subsequently, two variables are initialized to the two first Fibonacci numbers

    • You can see the variables a and b as Fibonacci Registers

    • Register a holds the "current" Fibonacci number, and register b holds the "next" Fibonacci number

  • You keep executing the loop body, as long as the "current" Fibonacci number is less than the provided threshold value

  • Within the loop, you append the "current" Fibonacci number to the result list

  • Next, you calculate the new values of the "current" and "next" Fibonacci numbers in one go, using Python’s elegant feature named sequence unpacking. So the new current Fibonacci number becomes the old next Fibonacci number, and the new next Fibonacci number, is the sum of the old two numbers.

  • Finally, you return the result list

Product Owner is happy and wants more

  • Happy with the delivered result, the Product Owner asks for more functionality:

Can you also make a function that returns the n'th Fibonacci number (counting from zero)?

— Happy Product Owner
More Features
  • So your team demonstrates the functionality and the Product Owner is happy with the result.

  • The PO now asks us to make a function that returns the N'th Fibonacci number, zero based,

  • So Fibonacci number zero is the first Fibonacci number, Fibonacci number one is the second, number two the third, and so on, and so on.

  • [He is just facilitating us programmers to make this zero based 😊]

  • So let’s code this new feature…​

The N’th Fibonacci number (Python)

link:./src/fibonacci/before.py[role=include]
  • As you can see, the code is similar to the previous function, but there are also differences: …​

  • There is no list initialization; the function returns just one value

  • You use the range() function to execute the requested number of times the loop body, but you don’t need the counter value itself

  • And within the loop body, the new pair of Fibonacci numbers is calculated again.

  • After the loop finishes, you return the "then current" Fibonacci number

  • [show example with zero iterations, and with one iteration]

Now the Product Owner really gets enthusiastic!

Now that we’ve got functions for …​

  • Fibonacci numbers up to N,

  • the N’th Fibonacci number,

    …​, can you also make functions that return …​

  • the first N Fibonacci numbers, and

  • the smallest Fibonacci number greater than or equal to N?

— Very Enthusiastic Product Owner
Even More features
  • Now the Product Owner really gets enthusiastic and requests for two more Fibonacci related functions, …​:

  • a function that returns the first N Fibonacci numbers, and

  • a function that returns the smallest Fibonacci number that is greater than or equal to N.

The First N Fibs (Python)

link:./src/fibonacci/before.py[role=include]

Smallest Fib Greater/Equal N (Python)

link:./src/fibonacci/before.py[role=include]

All (shortened) functions next to each other

link:./src/fibonacci/before_compressed.py[role=include]
link:./src/fibonacci/before_compressed.py[role=include]
link:./src/fibonacci/before_compressed.py[role=include]
link:./src/fibonacci/before_compressed.py[role=include]

 
Do you see a pattern? ✋

So you see all four implementations collected here together in one slide

fib_list_to

first_n_fibs

fib_ordinal

smallest_fib_greater_equal

  • ✋ We’re seeing a pattern here, don’t we?

  • [next slide]

The pattern

def some_fib_related_func(some_parameter):
    # (SOMETIMES: INITIALIZE THE LIST)
    a, b = 0, 1  # (1)
    while "some condition or otherwise a for-loop":  # (2)
        # (SOMETIMES: APPEND SOMETHING TO THE LIST)
        a, b = b, a + b  # (3)
    return a  # (OR SOMETIMES: RETURN THE LIST) (4)
  1. The variables a and b are initialized to their initial values

  2. Some while loop or for loop is set up

  3. The variables a and b are assigned their new respective values

  4. The desired calculated value, or a list of these values, is returned

The pattern: …​

  • It starts with that sometimes, the calculated values need to be collected in some container, as list for example; this container needs to be initialized first.

  • Then you see the familiar initialization of the two Fibonacci registers, a and b.

  • Subsequently, some sort of loop is set up, counting, or with an end-condition.

  • Within the loop-body, sometimes the "current" value is added to the container, and

  • the new values of the two Fibonacci registers are calculated.

  • Finally, the "current" Fibonacci value, or the built-up container is returned.

Now that you see that the same code appears in different functions, you start looking for ways to get rid of the apparent redundancy of code.

Time to Start Refactoring Your Code

(because you want to keep your code DRY[1])  
 

❓ Refactoring ✋

  • I don’t know how you experience it, but I always get this itchy feeling if I see repetitions of code; you would like to keep your code DRY.

  • So you start looking for ways how to refactor the code, extracting the common part into a function or method
     

  • ✋ So who knows what refactoring is?

Code Refactoring

  • Restructuring existing computer code

  • Keeping external behaviour the same

  • Less complexity

  • Improves …​

    • …​ design and structure

    • …​ code readability

    • …​ maintainability


Code Refactoring
​ — Wikipedia

  • So what’s code refactoring about?

  • You restructure your code without changing its external behavior

  • You usually do this to improve the design, structure, and implementation
    (but you make sure that the original functionality of the code remains the same).

  • Your code becomes more readable, less complex, better maintainable

Extracting Common Parts Not Doable?

(while trying to extract common parts of the code into a function: …​)

  • The code is entangled with control flow constructs — while loops or for loops

  • Experiment: combine the requested functions❓

    • (by using a "mode" flag or an enum designating the mode)

  • So it seems impossible to factor those common parts out and put them in a separate, distinct function

  • The code is mixed up with control flow constructs, while loop or for loop

  • …​ ¿Maybe you could try to combine the functions all in one golden "super" function?

  • Let’s start with trying the first two requested features, and we will use a "mode" flag (a boolean) to designate which of the two features is requested…​

Experiment: Combine The First Two Functions

Warning
Put on your Peril-Sensitive Sunglasses[2]
link:./src/fibonacci/first_two_combined.py[role=include]
  • Warning: what you see here, really hurts the eye; so put on your Peril-Sensitive Sunglasses…​

[next click]

  • As you can see, it starts already with the ugly return type annotation, and as the author of Black, Łukasz Langa, told us in his excellent PyCon US keynote: "ugly type annotations are an indication of ugly code"

  • You should not take this contrived example too seriously; it merely demonstrates that the resulting code is not quite maintainable and this isn’t the way to go…​

Refactoring The Right Way: Single Responsibility

(take a step back: think in terms of responsibilities)

  • The common part between all functions:

    • Producing Fibonacci numbers

  • The mutually differing parts between all functions:

    • Consuming (further processing) the Fibonacci numbers:

      • collecting them in a container, or

      • waiting for a loop condition to occur, signalling it’s time to return the last calculated "current" Fibonacci number

how to separate the common part from the mutually differing parts?

  • The first step is to look at your code in a more conceptual way…​,

  • …​ from a larger distance, a larger flying altitude

  • And then you realize that there is a data producing part in your code, …​ the code that is calculating and providing the Fibonacci numbers

  • And a data consuming part, …​ the code that either collect these values in a container, or is checking some condition that identifies the moment to return the "current" Fibonacci number.

  • The data producing part is the part that you can isolate.

Context Switch: Generator Functions

  • It looks like a normal function, but it contains yield expression(s)

  • When called, it returns a generator iterator, implementing the iterator protocol

  • Each plain yield …​

    • produces one item

    • saves the state of the generator function

    • passes control back to the calling code

  • When a new value is requested, the generator function resumes its execution from where it left off

  • The iterator can be used in a for loop, for example.

  • Enter Generator Functions

  • You just make a function, however you create a yield expression for each value that you would like to give back to the "caller" (the "consuming" party)
     

  • So let’s make a very basic generator function

An Example Generator Function

Definition
link:./src/example_generator_function.py[role=include]
Usage in a for loop
>>> for i in gen2():
...     print(i)
...
 [==before yield 0==]
0
 [==before yield 1==]
1
 [==before return==]
Usage
>>> itr = gen2()
>>> next(itr)
 [==before yield 0==]
0
>>> next(itr)
 [==before yield 1==]
1
>>> next(itr)
 [==before return==]
Traceback (most recent call last):
  ...
StopIteration

[click to show definition]

  • You see here an example of a basic generator function, mimicing the behavior of the range() function with an argument of 2

  • When called, it produces a generator iterator, that implements the iterator protocol

  • So if you call next() on it, it gives you the first value (0).

  • another next() gives the second value (1)

  • When you request another value, it raises a StopIteration exception, signaling that the iterator has no more value to give

  • When used in a for loop, it also behaves just like any other iterable

So how can you code a Fibonacci generator function?

The Fibonacci Number Generator

link:./src/fibonacci/refactored.py[role=include]
  • So this is how the Fibonacci Number Generator looks like

  • [shortly explain the code of fib_gen()]

  • What you see here, is in essence the "canonical" code representation of the Fibonacci sequence (when coded in an imperative programming style).

  • So how would you use this generator?

  • [next slide]

Example usage of the Fibonacci generator

Print the first 8 Fibonacci numbers using fib_gen():

>>> for _, fib in zip(range(8), fib_gen()):
...     print(f"{fib:2d}")
...
 0
 1
 1
 2
 3
 5
 8
13

 

So now you can refactor the requested Fibonacci functions, making use of this Fibonacci number generator.

  • [Explain that zip() stops as soon as one of the iterators is exhausted.]

  • Making the (by the Product Owner requested) 4 Fibonacci functions, has now become relatively easy.

  • But we can go even one step further …​

  • [next slide]

Are We Set? Problem Solved?

The next step: make use of the building blocks provided by:

  • itertools[3] from the standard library, and

  • the more-itertools[4] external package.

The [itertools] module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" making it possible to construct specialized tools succinctly and efficiently in pure Python.
— Python documentation, library
itertools
  • …​ because Python’s standard library provides a wonderful module, delivering this functionality without needing to code any loop constructs yourself, the itertools module [read out the quote]

  • what is called the "iterator algebra", is what I would call "a pipeline of iterators"

  • more-itertools implements the recipes listed in the itertools documentation and provides additional building blocks.

Fibs Less Than N (Python, Refactored)

link:./src/fibonacci/refactored.py[role=include]

 

itertools.takewhile(predicate, iterable):

Make an iterator that returns elements from the iterable as long as the predicate is true.

  • [First explain takewhile]: The takewhile "filter" fetches the elements one by one, calls the predicate function with fetched element as the single parameter, and passes through each element as long as the predicate returns a truthy value, and stops otherwise.

  • [explain how the function uses these building blocks]

  • Subsequently, you collect the resulting truncated Fibonacci sequence in a list.

The N’th Fibonacci number (Python, Refactored)

link:./src/fibonacci/refactored.py[role=include]

itertools.islice(iterable, stop):
itertools.islice(iterable, start, stop[, step]):

Make an iterator that returns selected elements from the iterable, just like regular slicing, but without support for negative values for start, stop, or step.

more_itertools.one(
    
iterable, too_short=ValueError, too_long=ValueError
):

Return the first item from iterable, which is expected to contain only that item. Raise an exception if iterable is empty or has more than one item.

  • [First explain islice and one]: The itertools' islice generator selects elements from the iterable, like slicing would do from a sequence. Either you specify only the stop value, or you can specify start, stop, and optionally the step value

  • And the more_itertools' one function ensures that there is exactly one element in the iterable, and returns that element. You can specify additional keyword arguments to specify your own exceptions, other than the default ValueError

  • [explain how the function uses these building blocks]

The First N Fibs (Python, Refactored)

link:./src/fibonacci/refactored.py[role=include]


Now making use of the itertools.islice(iterable, stop) variant,
only providing the stop argument

  • In this case, you use the itertools' islice "selection generator" with only the `stop` argument.

  • And you use the list constructor again to collect the elements in a list.

  • [explain how the function uses these building blocks]

The Smallest Fib Greater/Equal N (Python, Refactored)

link:./src/fibonacci/refactored.py[role=include]

itertools.dropwhile(predicate, iterable):

Make an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element.

more_itertools.first(iterable[, default]):

Return the first item of iterable, or a default if iterable is empty.

  • [First explain dropwhile and first] The itertools' dropwhile generator filters away items from the passed iterable. For each item from the iterable, it calls the provided predicate function with the item as its single argument, and as long as this function provides a "truthy" value, the element is discarded. Once the predicate function returns a "falsy" value, it starts passing through the remaining items.

  • The more_itertools' first function just returns the first value of the passed iterable.

  • [explain how the function uses these building blocks]

A Lot of Useful Itertools

  • Check out the documentation of itertools and more_itertools

  • A valuable collection of utilities to have in your toolbox

Just an Example:

  • Using more_itertools.chunked() (or ichunked()) to break an iterable into smaller parts.

  • For bulk data - can be used to insert manageable sized "batches" of values into a database, for example.

  • I encourage you to read through the documentation of the itertools module and the more_itertools package. It might probably save you some work if you have a good idea about the functionality it offers.

  • It often makes your code more expressive as well

  • There is a lot more to discover from these utilities. A final example is the chunked() or ichunked() function, that partitions your generated data into chunks; this can be convenient for loading large amounts of data into a database, for example, keeping the bulk uploads into manageable amounts.

Recap

  • Recognize the "pattern" of certain loop constructs as candidate for refactoring,

  • Introduce generator functions to extract the "data-producing" part

  • itertools and more_itertools might save a lot of coding

  • You’ve heard about the following:…​

  • [see slides]

Thank You

All the presented code has been verified using pytest-bdd, see github.com/jhbuhrman/refactor-into-generators


Questions