ColumnDataSources

To provide data to plotting routines, Bokeh introduces a datatype named ColumnDataSource (CDS). For many applications, a CDS is conceptually equivalent to a data table, or a DataFrame in Pandas, with equal-length data columns associated with column labels. In this sense, a CDS is also mostly equivalent to a Python dictionary that stores lists or NumPy arrays of data keyed by labels, with the caveat that all the data arrays must be the same length.

On the previous page, we made some simple plots by passing either two lists of numbers or two NumPy arrays to various plotting methods. Internally, Bokeh converted these separate lists and arrays into a CDS with two columns, which it then used for plotting. In fact, if you passed in two lists of different lengths, as discussed on the previous page, the warning message would include the phrase "ColumnDataSource's columns must be of the same length", even though the example code itself made no reference to a CDS. For simple datasets, it is often easier to pass in lists or arrays of data without explicitly constructing a CDS, but for more complicated data and applications, working with a CDS directly is preferable. We'll show a few examples of ColumnDataSources in action below.
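
A minimal sketch of building a CDS from a dictionary and plotting from it might look like the following. The sine-wave data and the variable names are illustrative assumptions rather than the tutorial's own listing, and `scatter` (which draws circle markers by default) stands in for `circle`, since recent Bokeh releases favor it for size-based markers.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Illustrative data (an assumption, not the tutorial's dataset)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# Build the CDS from a dictionary; the keys become the column names
source = ColumnDataSource(data={"x": x, "y": y})

p = figure(title="Plotting from a ColumnDataSource")

# Refer to columns by name and pass the CDS through the `source` argument
p.line("x", "y", source=source)
p.scatter("x", "y", size=6, source=source)

# Call bokeh.plotting.show(p) to render the figure in a browser or notebook
```

Because both glyphs read from the same `source`, any later change to its columns is reflected in the line and the markers together.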

In the code example above, we created a ColumnDataSource instance named source from a Python dictionary containing the x and y data arrays, labeled by the keys 'x' and 'y', respectively. Then, when we called the circle and line methods for plotting, we passed source as an argument and selected the columns to plot by specifying their names, 'x' and 'y'.

Once we've constructed a CDS, we can update the data stored in the source rather easily, by setting the appropriate fields in the source's data attribute, as shown below. This is especially useful if a dataset is being updated, for example, in a time series animation or in response to user inputs.
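
As a rough sketch of such an update, the snippet below first replaces one column in place and then swaps in an entirely new set of columns. The data and names are again illustrative assumptions; the second form is shown because assigning a whole new dictionary keeps all columns the same length when the number of rows changes.

```python
import numpy as np
from bokeh.models import ColumnDataSource

# Illustrative starting data (an assumption)
x = np.linspace(0, 2 * np.pi, 50)
source = ColumnDataSource(data={"x": x, "y": np.sin(x)})

# Replace a single column in place; the new array keeps the same length
source.data["y"] = np.cos(x)

# Replace every column at once by assigning a new dictionary, so that
# all columns change length together
new_x = np.linspace(0, 4 * np.pi, 100)
source.data = {"x": new_x, "y": np.cos(new_x)}
```

Any glyphs already drawn from `source` pick up the new values without being recreated, which is what makes this pattern convenient for animations and interactive updates.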

ColumnDataSources from Pandas DataFrames

While we created a CDS with data in a dictionary above, we can also do so using data in a Pandas DataFrame. DataFrames are very useful data structures for manipulating tabular data. In the code example below, a sample dataset is loaded into a DataFrame, which is subsequently used to populate a CDS. The sample data is the widely used automobile mpg dataset that lists information on a variety of different car models. Once the CDS is created, it is easy to generate plots of different columns against each other. If we plot the weight of each car model vs. the miles per gallon (mpg) for that model, we see an inverse relationship: lighter cars generally get better gas mileage.
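
The sketch below illustrates the DataFrame route under a simplifying assumption: instead of loading the full mpg dataset, it types in a few columns from the five rows shown in the table on this page, so it runs without any data files. As before, `scatter` is used for the markers, and the variable names are illustrative.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hand-typed stand-in for the mpg dataset: a few columns of its first five rows
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0],
    "hp": [130, 165, 150, 150, 140],
    "weight": [3504, 3693, 3436, 3433, 3449],
    "mfr": ["chevrolet", "buick", "plymouth", "amc", "ford"],
})

# Passing the DataFrame directly creates one CDS column per DataFrame column
source = ColumnDataSource(df)

p = figure(x_axis_label="weight", y_axis_label="mpg")

# Plot any pair of columns against each other by name
p.scatter("weight", "mpg", size=8, source=source)

# Call bokeh.plotting.show(p) to render the figure in a browser or notebook
```

With only five rows the inverse trend is not really visible; the full dataset is needed to see it clearly.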

The head (first 5 rows) of the mpg DataFrame

    mpg  cyl  displ   hp  weight  accel  yr         origin                       name        mfr
0  18.0    8  307.0  130    3504   12.0  70  North America  chevrolet chevelle malibu  chevrolet
1  15.0    8  350.0  165    3693   11.5  70  North America          buick skylark 320      buick
2  18.0    8  318.0  150    3436   11.0  70  North America         plymouth satellite   plymouth
3  16.0    8  304.0  150    3433   12.0  70  North America              amc rebel sst        amc
4  17.0    8  302.0  140    3449   10.5  70  North America                ford torino       ford
 
© Cornell University | Center for Advanced Computing