
Exercise

In this exercise we make use of lists, loops, functions, file I/O, and classes. In each case we produce Python files, which you can run as:

$ python python_file.py

Or, if you want to include the shebang (#!/usr/bin/env python) at the top of your files and give them executable permission (chmod +x), you can run them like this:

$ ./python_file.py

Note that we have included certain comments for pedagogical and explanatory purposes. These are not comments that one might include in a production file. Additionally, for reasons of space we have not included docstrings: triple-quoted strings that explain what code elements such as functions do. They should always be present in production code.
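
As a brief illustration of what a docstring looks like, here is a minimal sketch. The function name scale and its behavior are hypothetical and are not part of the exercise files; they serve only to show where the triple-quoted string goes.

```python
def scale(vector, factor):
    """Return a new list with each component of vector multiplied by factor."""
    return [factor * component for component in vector]

# The docstring is stored on the function object and is what help(scale) displays
print(scale.__doc__)
print(scale([1, 2], 3))
```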

Now, create the file io_dot_product.py using the source code below. After you have written and saved the file, you can run it using the instructions above.

import sys

# We can use a \ to break the prompt string over two lines
vector1string = input("Enter the components of the \
first vector separated by whitespace: ")
vector2string = input("Enter the components of the \
second vector separated by whitespace: ")

# Make lists which act as the vectors
vector1 = vector1string.split()
vector2 = vector2string.split()

# Check that vectors are of same length and set dimension
dimension = len(vector1)
if dimension != len(vector2):
    print("The two vectors must be the same length")
    #Stops execution with exit code of 1
    #which by convention is a failure code
    sys.exit(1)

dot_product = 0
# Loop through vectors, convert their components to floats
# and calculate dot product, else print
# explanatory error message and exit
for x in range(dimension):
    try:
        vector1[x] = float(vector1[x])
        vector2[x] = float(vector2[x])
        dot_product += vector1[x] * vector2[x]
    #If it can't be cast, we print an error message and exit
    except ValueError as e:
        print(e)
        print("exiting...")
        sys.exit(1)

print("Dot product is " + str(dot_product))

You should play around with different values, including vectors of unequal lengths and with non-numeric components.

In this short code, we have accomplished all of the following:

taken input from the keyboard,
made lists from it to act as our vectors,
checked they’re the same length,
converted the components to numbers, and
calculated and printed the dot product.

Python typically offers many ways to perform any given task, with an aesthetic preference for pursuing the most “pythonic” method. But it should be fairly clear what our script is doing. We are taking a try...except approach to evaluating whether the vector components are numbers: we try to convert each string component to a float and use the float to replace its string version, knowing that if the conversion fails, a ValueError exception will be raised; in that case, we simply print the error message and exit the program. This is a pythonic way to check whether we have a string representation of a number, and an example of the EAFP ("easier to ask forgiveness than permission") coding style.
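
To make the contrast concrete, the sketch below compares the EAFP approach with a "look before you leap" (LBYL) check. The helper names to_float_eafp and to_float_lbyl are hypothetical, and the LBYL test is deliberately naive; it is meant only to show how hard it is to anticipate every valid number format in advance.

```python
def to_float_eafp(text):
    # EAFP: attempt the conversion and handle the failure if it occurs
    try:
        return float(text)
    except ValueError:
        return None

def to_float_lbyl(text):
    # LBYL: inspect the string first. This naive check accepts only digits
    # with at most one decimal point, so it rejects valid input such as
    # "-2.5" or "1e3"
    if text.replace(".", "", 1).isdigit():
        return float(text)
    return None

for sample in ["3.5", "-2.5", "1e3", "abc"]:
    print(sample, to_float_eafp(sample), to_float_lbyl(sample))
```

Letting float decide what counts as a number avoids having to duplicate its parsing rules in our own code.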

Calculating a dot product is something which might be used by multiple parts of a program and is a natural choice for inclusion in a simple function. We might even want to write two functions, one to calculate the dot product of two vectors and another to calculate the modulus of a single vector. And we would want to include error-checking in these functions, ideally returning an error to the calling code if validation fails, so that the calling code can handle the errors (for simplicity, the version below still prints a message and exits). This leads us to produce our next code version, which you should store in a file named vector2scalar.py.

import sys
from math import sqrt

def dot_product(vector1, vector2):
    dimension = len(vector1)
    if dimension != len(vector2):
        print ("The two vectors must be the same length")
        sys.exit(1)
    dot_product = 0
    for x in range(dimension):
        try:
            vector1[x] = float(vector1[x])
            vector2[x] = float(vector2[x])
            dot_product += vector1[x] * vector2[x]
        #If it can't be cast, print Exception message and exit
        except ValueError as e:
            print(e)
            print("Exiting...")
            sys.exit(1)
    return dot_product

def modulus(vector1):
    return sqrt(dot_product(vector1,vector1))

#Checking whether the file is being executed
#from the command line or whether functions
#are being called from other code
if __name__ == "__main__":
    vector1string = input("Enter the components of the \
first vector separated by whitespace: ")
    vector2string = input("Enter the components of the \
second vector separated by whitespace: ")
    vector1 = vector1string.split()
    vector2 = vector2string.split()
    #String concatenation can be split over lines
    print("The dot product of the vectors is "
          + str(dot_product(vector1, vector2)))
    print("The modulus of the first vector is "
          + str(modulus(vector1)))
    print("The modulus of the second vector is "
          + str(modulus(vector2)))

You should test that it behaves the same as the previous example. The code is more reusable now that the important activities are in functions. Also, the use of the if __name__ == "__main__" conditional helps make the code testable. The conditional will be True, and the test will be run, if the file is executed from the command line, as with $ python vector2scalar.py. This is a good thing to do while one is in the process of writing the code. But the conditional will be False, and the test will be skipped, if the file has been loaded by other code via the import vector2scalar statement, so that its functions can be called from there. This might be what we ultimately want to happen when our completed code becomes a component of a larger software product, although the if __name__ == "__main__" conditional also enables us to use this code in a standalone program.
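
The functions in vector2scalar.py stop the whole program with sys.exit when validation fails, which is inconvenient for code that imports them. As a sketch of the alternative, in which the error is passed back to the calling code, the variant below raises ValueError instead. This is an assumed redesign for illustration, not the version used in the exercise.

```python
from math import sqrt

def dot_product(vector1, vector2):
    # Raise instead of exiting, so the caller decides how to respond
    if len(vector1) != len(vector2):
        raise ValueError("The two vectors must be the same length")
    total = 0.0
    for x in range(len(vector1)):
        # float() itself raises ValueError if a component is not numeric
        total += float(vector1[x]) * float(vector2[x])
    return total

def modulus(vector):
    return sqrt(dot_product(vector, vector))

# The calling code handles the error in whatever way suits it
try:
    print(dot_product(["1", "2"], ["3", "x"]))
except ValueError as e:
    print("Could not compute dot product:", e)

print(dot_product(["1", "2"], ["3", "4"]))
```

Unlike the original, this variant also leaves the input lists unmodified, since the converted values are not written back into them.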

Note that we have imported the function sqrt from the math module in order to calculate square roots. Because we import it by name and separately, we can reference it directly rather than with a qualifier, as we have to do with sys.exit. Alternatively, we could have used the exponentiation operator ** to compute a square root (e.g., as a**0.5). Using the sqrt function makes the code a bit more readable, but it appears that the exponentiation operator is somewhat faster than the sqrt function, though results can vary with the Python version and with how the measurement is set up. If you're curious, you can explore the timing of different options (sqrt, a**0.5, numpy.sqrt, pow) using the timeit module or the %timeit magic function in ipython.
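
A minimal timing sketch using the standard timeit module might look like the following. The value of a and the repetition count are arbitrary choices, and numpy.sqrt is left out so that the example needs only the standard library. The operand is held in a variable because an expression written with literal constants may be evaluated once at compile time, which would distort the comparison.

```python
import timeit

# The setup string runs once; each statement is then timed over many repetitions
setup = "from math import sqrt; a = 2.0"
statements = ["sqrt(a)", "a**0.5", "pow(a, 0.5)"]

timings = {}
for stmt in statements:
    timings[stmt] = timeit.timeit(stmt, setup=setup, number=200_000)
    print(f"{stmt:12s} {timings[stmt]:.4f} s")
```

The absolute numbers depend on the machine and interpreter, so only the relative ordering on your own system is meaningful.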

Our final exercise illustrates several more useful tricks, including managing input and output from files and the command line, designing a class and creating instances of it, and pickling and unpickling. Create a new directory for these files and include in it an empty file called __init__.py, which you can create this way:

touch __init__.py

This file won’t be strictly necessary for us at this point, but having it is good practice, as its presence marks the directory as a Python package, so that the modules in it can be imported from code outside the directory. The next step is to produce a file which we’ll call vector.py:

import sys
from math import sqrt

class Vector(list):

    def get_modulus(self):
        square_sum = 0
        for element in self:
            square_sum += element**2
        return sqrt(square_sum)

    def __init__(self, iterable):
        list.__init__(self, iterable)
        try:
            for x in range(len(self)):
                self[x] = float(self[x])
        except ValueError as e:
            print("Failed to create vector")
            print(e)
            print("Exiting...")
            sys.exit(1)

This file defines a class called Vector (although it’s not necessary for it to have the same name as the containing file) as an inheritor of the list type. For the constructor, we call the standard list initializer, which acts on any iterable object, and then convert all the elements to float; if a conversion fails, we catch the resulting ValueError, print an error message, and exit as before. Any instance of our Vector class has all the functionality of a list, plus a get_modulus method (called as vectorName.get_modulus()) which performs the expected calculation on the instance.
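
The sketch below shows what inheriting from list provides in practice. To keep it self-contained, it uses a condensed restatement of the class rather than importing vector.py; as an assumption made for brevity, a failed conversion here simply propagates the ValueError instead of exiting.

```python
from math import sqrt

class Vector(list):
    # Condensed restatement of the class above, for a self-contained demo
    def __init__(self, iterable):
        list.__init__(self, (float(item) for item in iterable))

    def get_modulus(self):
        return sqrt(sum(element**2 for element in self))

v = Vector("3 4".split())
print(v)                # printed like an ordinary list of floats
print(len(v), v[0])     # len() and indexing are inherited from list
print(v.get_modulus())  # sqrt(3**2 + 4**2)
v.append(12.0)          # list methods still work, with no float conversion applied
print(v.get_modulus())  # sqrt(3**2 + 4**2 + 12**2)
```

Note that only the constructor converts elements to float; values added later through inherited methods such as append are stored as given.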

Next, we create dot_product.py in the same folder using the code below:

import sys

def dot_product(vector1, vector2):
    dimension = len(vector1)
    if dimension != len(vector2):
        print ("the two vectors must be the same length!")
        sys.exit(1)
    dot_product = 0
    for x in range(dimension):
        dot_product += vector1[x] * vector2[x]
    return dot_product
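
As an aside, the index-based loop above can be written more compactly with the built-in zip and sum functions. The following is a sketch of that alternative, not a replacement required by the exercise; raising ValueError on a length mismatch is an assumption made here in place of the sys.exit call.

```python
def dot_product(vector1, vector2):
    if len(vector1) != len(vector2):
        raise ValueError("the two vectors must be the same length")
    # zip pairs up corresponding components; sum adds the products
    return sum(a * b for a, b in zip(vector1, vector2))

print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The explicit length check is still needed, because zip silently stops at the end of the shorter sequence.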

Finally, we write the script file which will call the others; store it in a file named vector_test.py:

import vector
import dot_product
import sys
import pickle
import os

if __name__ == '__main__':
    inputFile = ""
    if len(sys.argv) > 1:
        #Any other command line input ignored
        inputFile = sys.argv[1]
    else:
        vectorAstring = input("enter the components of" +
            " the first vector separated by whitespace: ")
        vectorBstring = input("enter the components of" +
            " the second vector separated by whitespace: ")
        vectorA = vector.Vector(vectorAstring.split())
        vectorB = vector.Vector(vectorBstring.split())
        #We'll write them as a pickled tuple
        #to a file in our current working directory
        pickle_file = os.path.join(os.getcwd(),
                                   "my_pickle_file.pkl")
        f = open(pickle_file, 'wb')
        pickle.dump((vectorA, vectorB), f)
        f.close()
        inputFile = pickle_file
    #The with statement closes the file for us when the block ends
    with open(inputFile, 'rb') as f:
        vector1, vector2 = pickle.load(f)
    print("The dot product of these two vectors is: " +
           str(dot_product.dot_product(vector1,vector2)))
    print("The modulus of vector1 is: " +
           str(vector1.get_modulus()))
    print("The modulus of vector2 is: " +
           str(vector2.get_modulus()))

This script illustrates taking input from the command line, writing to and reading from files, pickling, and the use of user-defined classes. If a file location is specified at the command line, the script sets that file as the input source; otherwise, it queries the user for the components of two vectors, creates Vector instances based on the keyboard input, and writes a file which becomes the input source. Either way, the file must contain a pickle of a tuple of the form (vectorA, vectorB). The script then reads the pickled vector tuple from the file, calculates the dot product of the pair, and writes out the modulus of each. The first time through, omit the command-line argument so you can enter components through the keyboard.

$ python vector_test.py

Then run the code again, but this time give the pickle file as a command-line argument. If you are running in the same directory as the python files, it would look like this:

$ python vector_test.py my_pickle_file.pkl

You should get the same results both times!
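
The pickling round trip at the heart of the script can be seen in isolation in the sketch below. It uses plain lists rather than Vector instances and writes to a temporary directory; the file name demo_vectors.pkl is an arbitrary choice for the demonstration.

```python
import os
import pickle
import tempfile

vectors = ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

# A temporary directory keeps the demo from leaving files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "demo_vectors.pkl")
    # Pickle files are binary, hence the 'wb' and 'rb' modes
    with open(pickle_path, "wb") as f:
        pickle.dump(vectors, f)
    with open(pickle_path, "rb") as f:
        restored = pickle.load(f)

print(restored == vectors)
```

Because unpickling can execute arbitrary code, pickle.load should only be used on files from a trusted source.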

The Python standard library provides sophisticated functionality for parsing command-line arguments, including both the argparse and getopt modules. For simple cases, sys.argv is a plain list consisting of the name of the code file in position 0 followed by the command-line arguments, so that the argument list is sys.argv[1:]. But any complex command-line input should be parsed with the more specialized modules mentioned above.
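
As a rough sketch of how argparse could handle the optional file argument used by vector_test.py, consider the following. The argument name input_file and the --verbose flag are hypothetical additions for illustration and are not part of the exercise script.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Compute the dot product and moduli of two pickled vectors.")
# nargs="?" makes the positional argument optional
parser.add_argument("input_file", nargs="?", default=None,
                    help="pickle file containing a tuple of two vectors")
parser.add_argument("--verbose", action="store_true",
                    help="print extra information")

# Passing a list to parse_args stands in for sys.argv[1:] in this demo
args = parser.parse_args(["my_pickle_file.pkl", "--verbose"])
print(args.input_file, args.verbose)

no_args = parser.parse_args([])
print(no_args.input_file, no_args.verbose)
```

In a real script, calling parser.parse_args() with no arguments reads from sys.argv automatically and also provides a generated --help message.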

The intent of this exercise has been pedagogical, to illustrate many common and useful things you can do in Python. But if you're more interested in manipulating vectors, and in computing their dot products and other properties, you are better off using NumPy rather than reinventing the wheel as we have done here.
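
For comparison, here is a minimal sketch of the same calculations in NumPy, assuming the numpy package is installed. The example values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(np.dot(a, b))       # dot product of the two vectors
print(np.linalg.norm(a))  # modulus (Euclidean norm) of the first vector
```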
