Serialization
Rather than requiring you to write and debug code to save complicated data types, Python provides a standard module called pickle. This module can take almost any Python object (even some forms of Python code!) and convert it to a byte string (a bytes object in Python 3); this process is called pickling. (More generally, the process is known as serializing.) Reconstructing the object from this byte representation is called unpickling. Between pickling and unpickling, the bytes representing the object may be stored in a file or variable, or sent over a network connection to some other machine.
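As a minimal sketch of the in-memory case, pickle.dumps() and pickle.loads() convert an object to a bytes value held in a variable and back again, without touching a file. The variable names data and y are chosen here only for illustration.

```python
import pickle

x = [1234, 'hello\n', 2.345]

data = pickle.dumps(x)   # serialize x to a bytes object held in memory
y = pickle.loads(data)   # rebuild an equal but independent copy of x

print(type(data).__name__)   # bytes
print(y == x, y is x)        # True False
```

The same bytes could equally be written to a file or sent over a socket before being passed to pickle.loads().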
If you have an object x, and a file object f that’s been opened for writing, you can pickle the object in just one line of code. The object will then be stored in the file. When f is re-opened later, a clone of x can be retrieved in just one more line of code.
Example: Pickling an object to a file and unpickling it
>>> x = [1234, 'hello\n', 2.345]
>>> x
[1234, 'hello\n', 2.345]
>>> f = open('test3.pkl','wb')
>>> import pickle
>>> pickle.dump(x,f)
>>> f.close()
>>> print('Now we are going to unpickle the file')
Now we are going to unpickle the file
>>> f = open('test3.pkl','rb')
>>> y = pickle.load(f)
>>> f.close()
>>> y
[1234, 'hello\n', 2.345]
>>> y == x
True
As you can see, pickle successfully stored and retrieved all the information about x from file object f. By unpickling, we recovered the proper types of the data elements within x. You can also pickle custom objects (i.e., instances of new classes), but the source file defining those classes must be on your import path so that pickle.load() knows how to reconstruct those datatypes. In Python 3, the file modes used for dumping and loading must be binary (i.e., 'wb' and 'rb'), although in Python 2 that is not a requirement. If you open the file in text mode in Python 3, you will see an error message like: TypeError: write() argument must be str, not bytes.
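The binary-mode requirement can be checked with a short script. This is only a sketch: it writes to a temporary directory (an assumption made here so that nothing is left behind) and uses with blocks so that each file is closed automatically.

```python
import os
import pickle
import tempfile

x = [1234, 'hello\n', 2.345]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test3.pkl')

    # Binary modes: 'wb' to dump, 'rb' to load.
    with open(path, 'wb') as f:
        pickle.dump(x, f)
    with open(path, 'rb') as f:
        y = pickle.load(f)

    # Text mode 'w' fails because pickle writes bytes, not str.
    try:
        with open(path, 'w') as f:
            pickle.dump(x, f)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = str(exc)

print(y == x)
print(text_mode_error)
```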
Incidentally, the pickle functions that we used above are not automatically available from the Python command line. We must import the pickle module first. We’ll have more to say about modules later.
Dill
Dill extends pickle's ability to serialize objects. It can handle everything that pickle can handle, as well as some objects that pickle fails to serialize. For example, dill can pickle more complex objects, such as lambda functions, and even the current Python interpreter session. Dill is a third-party Python package and can be installed from the Python Package Index (PyPI), for example with pip install dill.
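The difference can be seen with a lambda function. The sketch below first shows pickle refusing to serialize it, then tries the same round trip with dill. Because dill is a third-party package, the import is guarded and the dill step is skipped if it is not installed; the name square is purely illustrative.

```python
import pickle

square = lambda n: n * n

# The standard pickle module cannot serialize a lambda.
try:
    pickle.dumps(square)
    pickle_failure = None
except (pickle.PicklingError, AttributeError) as exc:
    pickle_failure = type(exc).__name__

# dill is optional here: skip this step if it is not installed.
try:
    import dill
except ImportError:
    dill = None

if dill is not None:
    restored = dill.loads(dill.dumps(square))
    dill_result = restored(3)
else:
    dill_result = None

print(pickle_failure)
print(dill_result)
```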
Example: Pickling the current Python interpreter session
>>> import math
>>> def random_func(x):
...     return 4 + math.cos(x)
...
>>> random_func(1)
4.54030230586814
>>> math
<module 'math' from '.../base/lib/python3.12/lib-dynload/math.cpython-312-darwin.so'>
>>> import dill
>>> dill.dump_module('example.pkl')
In another Python interpreter session:
>>> import dill
>>> dill.load_module('example.pkl')
>>> random_func(1)
4.54030230586814
>>> random_func
<function random_func at 0x104fabc40>
>>> math
<module 'math' from '.../base/lib/python3.12/lib-dynload/math.cpython-312-darwin.so'>
With dill, your functions, objects, and currently loaded modules can all be serialized.
Caution with serialization in Python
Serialization with pickle should not be used for long-term data storage, as there are some risks involved. First, there is no guarantee that a pickle file can be loaded successfully in a different Python environment or with different package versions. Suppose you want to transfer data from one computer to another: pickling might succeed on your local machine during testing, yet unpickling might fail on another machine with a different environment. This does not mean serialization should be avoided at all costs. It is better suited to short-term storage of items that need to be communicated to other Python processes running in the same Python environment. For example, the multiprocessing package uses pickle to transfer data between processes.
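One reason for this fragility is that a pickle stores only a reference to an object's class (its module and name), not the class definition itself. The sketch below simulates a change of environment: it writes a small module to a temporary directory, pickles an instance, and then makes the module unimportable before unpickling again. The module name shapes_demo and the class Point are hypothetical names invented for this illustration.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module and class, used only for this illustration.
MODULE_NAME = "shapes_demo"
MODULE_SOURCE = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write the class definition to its own source file and make it importable.
    Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    shapes = importlib.import_module(MODULE_NAME)

    data = pickle.dumps(shapes.Point(1, 2))

    # While the module is importable, unpickling works.
    restored = pickle.loads(data)
    restored_xy = (restored.x, restored.y)

    # Simulate a different environment: the module can no longer be imported.
    sys.path.remove(tmp)
    del sys.modules[MODULE_NAME]
    importlib.invalidate_caches()

    try:
        pickle.loads(data)
        load_error = None
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        load_error = type(exc).__name__

print(restored_xy)
print(load_error)
```

The pickled bytes contain the text shapes_demo and Point, which is all that pickle has to go on when it tries to rebuild the object.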
Second, unpickling files of unknown content is a security risk. It is hard to determine the contents of a pickle file without loading it into Python, and loading a malicious pickle file can lead to arbitrary code execution. However, if you created the pickle file yourself, there is little risk involved. As with any file downloaded from the internet, a reasonable amount of precaution should be taken.
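To make the risk concrete, the following sketch builds a deliberately harmless "malicious" pickle whose loading calls eval() on a string chosen by the file's author. It then shows one mitigation described in the pickle documentation: subclassing pickle.Unpickler and overriding find_class() to allow only a short list of names. The class names Payload and RestrictedUnpickler and the contents of the allow-list are assumptions made for this example, and such a filter reduces the risk rather than eliminating it.

```python
import builtins
import io
import pickle


class Payload:
    """Hypothetical stand-in for an object crafted by an attacker."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Plain loading runs eval() with the string stored in the pickle.
result = pickle.loads(malicious_bytes)  # the value 42, not a Payload instance


class RestrictedUnpickler(pickle.Unpickler):
    """Only allow a short list of harmless builtins to be looked up."""

    SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

    def find_class(self, module, name):
        if module == "builtins" and name in self.SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


try:
    RestrictedUnpickler(io.BytesIO(malicious_bytes)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(result)
print(blocked)
```

A real attacker would substitute a far more damaging call for the arithmetic expression, which is why pickle data from untrusted sources should not be loaded at all.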