Lazy Evaluation
Lazy evaluation is a strategy implemented in some programming languages whereby certain objects are not produced until they are needed; this strategy is often used in conjunction with functions that produce collections of objects. This can have important performance implications, both for memory management and function run time. If you only need to iterate through a sequence of objects that can be generated and acted upon one at a time, then you do not need to explicitly construct a big container like a list, only to process the elements one-by-one.
Generators
One general approach to this within Python involves generators, objects that act a bit like functions that return a value, but which remember their internal state in order to return a succession of values upon successive calls to the generator. So instead of having a function that returns a list, for example, one can use a generator that returns each element of the sequence in turn when needed. Generators use the keyword yield
to output a value, instead of the function-based keyword return
.
Successive values from a generator can be accessed either by iterating over the generator (e.g., in a for loop) or via successive calls to the built-in function next()
. If a generator only produces a finite sequence of values (i.e., it does not keep returning values forever), it raises a StopIteration
exception once the end of the finite sequence is reached. If a fully-formed list or tuple containing the generated elements is desired, the appropriate constructor can be used to create that object.
In addition to providing an ability to define arbitrary generators (via the yield
keyword), Python also provides specialized iterable types that use lazy evaluation to produce a sequence of objects, although this behavior differs between Python 2 and Python 3, with Python 3 adopting lazy evaluation techniques much more aggressively.
range
The Python function range()
is used to produce a sequence of integers, starting from a prescribed start and ending one integer short of a prescribed stop, with optionally a step size between successive integers (default step = 1). In Python 2, range()
returned a list of integers: for example, range(0,10)
returned the list [0,1,2,3,4,5,6,7,8,9]
. In Python 3, the range function returns an object of type range
, which can be iterated over to yield the same sequence of integers, e.g.,
The for loop stops once a StopIteration exception is raised by the range object. The advantage of this is that the full list is never created, resulting in a smaller memory footprint for the interpreter. The for loop above, when executed in Python 2, generates a temporary list when range(0,10) is called, only to delete the temporary list once the iteration is complete. In Python 2, there was a range()
function — which created a list of integers — and an xrange()
function, which produced the same sequence through lazy evaluation, whereas in Python 3 there is only the lazy range()
function. For small sequences, the overhead and extra memory usage of temporary creation and deletion might be negligible, but it can be substantial for very large range sequences. If an explicit list from a range object is in fact desired, one can simply call the list constructor:
zip
A similar example is the zip()
function, which is used to "zip" two or more iterables together, that is, to produce a sequence of tuples where the i-th element of the tuple comes from the i-th iterable. In Python 2, zip() returned a list of tuples, whereas in Python 3, it returns a zip object that can be iterated over to produce a sequence of tuples.
open
The Python command to open a file returns a generator, which means that it doesn't read the whole file at once, by default. It reads parts as you request them from the generator. When you execute the following,
every time through the loop, the variable line
is assigned to the next line in the file produced by the file generator f
.
Experiment with range, zip, and open to make sure you know how to manipulate the objects produced by those commands. As an iterable, a range object supports membership testing via the in
keyword; run some timing tests to explore such testing as a function of the number of integers encompassed by the range.