Many datasets are not available as static files, but are instead delivered through an Application Programming Interface (API), which provides data from a source via programmatic function calls.

Accessing Twitter data through the Twitter API requires a Twitter account that is authenticated to use the API. This is a straightforward process, which we briefly walk through in the next steps, but you do not need to go through these steps yourself if you do not want to set up a Twitter account. In later pages, we will provide some downloaded Twitter data that you can use in conjunction with the tutorial, without requiring an account for the API.

Get Authenticated for the Twitter API

The first step is to authenticate your account, which you do by following the steps at the link below:

https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

Create a Twitter Application

You must create a Twitter application to store your authentication keys and tokens. Follow the steps at the link below:

https://developer.twitter.com/en/docs/basics/apps

Once you have created your application and have your authentication keys and tokens, you will need to install the tweepy package in Python. For information about installing packages into your Python environment using tools such as pip or conda, consult our page on Python Distributions. If you're using Anaconda Python, for example, you could use the conda package manager with a command like conda install -c conda-forge tweepy.
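As a quick sanity check, you can confirm from within Python that tweepy is importable. This is a minimal sketch; it assumes only that the installation step above succeeded:

import tweepy

# If this import succeeds, tweepy is installed in the active environment;
# the version number is useful when consulting the tweepy documentation.
print(tweepy.__version__)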

Using the Twitter Streaming API

Most users will probably have already collected their Twitter data and simply want to analyze it, but we will walk through a simple example of collecting streaming tweet data using a filter, such as one that matches a specific hashtag.

The code below is a simple example of how to collect tweets with the hashtag #climatechange, print the tweet data, and save it to a file. You can either access the code through the accompanying GitHub repository, copy and paste the code below, or download the Python script directly from the repository.


"""
Created on Sat Oct 27 08:59:27 2018

    @author: XSEDE
"""

import re
import datetime
from __future__ import absolute_import, print_function

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Go to https://apps.twitter.com and create an app.
# The consumer key and secret will be generated for you after
consumer_key="your_consumer_key"
consumer_secret="your_consumer_secret"

# After the step above, you will be redirected to your app's page.
# Create an access token under the the "Your access token" section
access_token="your_access_token"
access_token_secret="your_access_token_secret"

# Create variables to use for output file and tweet filter
hashtag = "climatechange"
date_time_temp = str(datetime.datetime.now())

# Replace all characters except letters and numbers with "_" for filename
current_date_time = re.sub('[^a-zA-Z0-9]','_', date_time_temp)
file_out = open(hashtag + "_" + current_date_time + ".json", 'a')

# Define the Stream Listener
class StdOutListener(StreamListener):
    """ A listener handles tweets that are received from the stream.
    This is a basic listener that just prints received tweets to stdout.

    """
    def on_data(self, data):
        print(data)           # Print output to console
        file_out.write(data)  # Write output to file
        return True           # Returning True keeps the stream connected

    def on_error(self, status):
        print(status)         # Print the HTTP error status code

# Run the main program and collect tweets matching the filter term
if __name__ == '__main__':
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, listener)
    try:
        # Track the term defined above; this matches the word as well as
        # the hashtag (e.g., both "climatechange" and "#climatechange")
        stream.filter(track=[hashtag])
    finally:
        # Close the output file even if the stream is interrupted
        file_out.close()

Filled in with valid API credentials, the above script will save the Twitter data to a text file in JSON (JavaScript Object Notation) format. For the remaining lessons, we will be using a dataset collected with the Twitter Streaming API on the hashtag #climatechange. We will discuss this dataset in greater detail in subsequent lessons.
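Each tweet in the output file is written as a single JSON object on its own line, so the file can be read back with Python's built-in json module. Below is a minimal sketch; the filename is hypothetical and should be replaced with the file produced by the script above, and "user" and "text" are standard fields in the tweet objects returned by the Streaming API:

import json

# Hypothetical filename; substitute the output file created by the script above
filename = "climatechange_tweets.json"

with open(filename) as f:
    for line in f:
        line = line.strip()
        if not line:  # the stream writes blank keep-alive lines; skip them
            continue
        tweet = json.loads(line)
        # Each tweet is a nested dictionary; print the author and the tweet text
        print(tweet["user"]["screen_name"], ":", tweet["text"])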
