You are probably already familiar with the Twitter social network, which is built on a messaging system that lets people send and receive short messages, optionally with attached images, along with a limited set of metadata. It has been used effectively as a near real-time communications tool during numerous global events. Twitter also provides an API that allows anyone to collect large amounts of data and perform a wide range of analyses to better understand these networks. This openness can also be a weakness: it leaves the platform vulnerable to ‘bots’ and ‘state-backed’ accounts that are used to spread disinformation.
The Twitter dataset we will be using was created using the Twitter Streaming API and the hashtag '#climatechange'. It is a particularly interesting dataset because it was collected from November 20 to December 5, 2018, a period during which the U.S. government released its latest findings on climate change. We will learn more about the dataset in a subsequent section. Below is the raw JSON for a single example tweet:
{"created_at":"Tue Nov 27 00:19:11 +0000 2018","id":1067211197412388864,"id_str":"1067211197412388864","text":"RT @DocsEnvAus: @susanprescott88 paediatrician representing the @TheRACP - #ClimateChange already affecting the physical and mental health\u2026","source":"\u003ca href=\"https:\/\/mobile.twitter.com\" rel=\"nofollow\"\u003eTwitter Lite\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2479088160,"id_str":"2479088160","name":"Marie Coleman","screen_name":"MarieCo92176893","location":"Canberra","url":"http:\/\/www.nfaw.org","description":"feminist, social policy analyst. All comments personal views","translator_type":"none","protected":false,"verified":false,"followers_count":1874,"friends_count":260,"listed_count":83,"favourites_count":103790,"statuses_count":101995,"created_at":"Tue May 06 01:55:58 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"40203A","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/794101726374600704\/vykECRS2_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/794101726374600704\/vykECRS2_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2479088160\/1478164006","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":
null,"retweeted_status":{"created_at":"Mon Nov 26 23:03:25 +0000 2018","id":1067192127124164612,"id_str":"1067192127124164612","text":"@susanprescott88 paediatrician representing the @TheRACP - #ClimateChange already affecting the physical and mental\u2026 https:\/\/t.co\/AzaqkwAoj3","display_text_range":[0,140],"source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":1213283748,"in_reply_to_user_id_str":"1213283748","in_reply_to_screen_name":"susanprescott88","user":{"id":1542665564,"id_str":"1542665564","name":"DrsForTheEnvironment","screen_name":"DocsEnvAus","location":"Australia","url":"http:\/\/www.dea.org.au","description":"Non-profit organisation dedicated to improving the environment for human health","translator_type":"none","protected":false,"verified":false,"followers_count":3422,"friends_count":1734,"listed_count":145,"favourites_count":2705,"statuses_count":8596,"created_at":"Mon Jun 24 06:45:28 +0000 
2013","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/838641283375616000\/wLo35xnp_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/838641283375616000\/wLo35xnp_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1542665564\/1503892819","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"@susanprescott88 paediatrician representing the @TheRACP - #ClimateChange already affecting the physical and mental health of children. 
The #health of our children should be our priority - we urgently need #climateaction #NoTimeForGames @ama_media @RNBreakfast @DrGCrisp https:\/\/t.co\/8RQwPCHt8m","display_text_range":[0,270],"entities":{"hashtags":[{"text":"ClimateChange","indices":[59,73]},{"text":"health","indices":[140,147]},{"text":"climateaction","indices":[206,220]},{"text":"NoTimeForGames","indices":[221,236]}],"urls":[],"user_mentions":[{"screen_name":"susanprescott88","name":"Susan Prescott MDPhD","id":1213283748,"id_str":"1213283748","indices":[0,16]},{"screen_name":"TheRACP","name":"The RACP","id":1117895137,"id_str":"1117895137","indices":[48,56]},{"screen_name":"ama_media","name":"AMA Media","id":59024550,"id_str":"59024550","indices":[237,247]},{"screen_name":"RNBreakfast","name":"RN Breakfast","id":20138772,"id_str":"20138772","indices":[248,260]},{"screen_name":"DrGCrisp","name":"George Crisp","id":157568648,"id_str":"157568648","indices":[261,270]}],"symbols":[],"media":[{"id":1067192113324904449,"id_str":"1067192113324904449","indices":[271,294],"media_url":"http:\/\/pbs.twimg.com\/media\/Ds9tkqXUUAEk6Rk.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Ds9tkqXUUAEk6Rk.jpg","url":"https:\/\/t.co\/8RQwPCHt8m","display_url":"pic.twitter.com\/8RQwPCHt8m","expanded_url":"https:\/\/twitter.com\/DocsEnvAus\/status\/1067192127124164612\/photo\/1","type":"photo","sizes":{"thumb":{"w":150,"h":150,"resize":"crop"},"small":{"w":680,"h":383,"resize":"fit"},"medium":{"w":1200,"h":675,"resize":"fit"},"large":{"w":2048,"h":1152,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":1067192113324904449,"id_str":"1067192113324904449","indices":[271,294],"media_url":"http:\/\/pbs.twimg.com\/media\/Ds9tkqXUUAEk6Rk.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/Ds9tkqXUUAEk6Rk.jpg","url":"https:\/\/t.co\/8RQwPCHt8m","display_url":"pic.twitter.com\/8RQwPCHt8m","expanded_url":"https:\/\/twitter.com\/DocsEnvAus\/status\/1067192127124164612\/photo\/1","type":"photo","sizes":{"thumb":{"w":
150,"h":150,"resize":"crop"},"small":{"w":680,"h":383,"resize":"fit"},"medium":{"w":1200,"h":675,"resize":"fit"},"large":{"w":2048,"h":1152,"resize":"fit"}}}]}},"quote_count":0,"reply_count":1,"retweet_count":5,"favorite_count":4,"entities":{"hashtags":[{"text":"ClimateChange","indices":[59,73]}],"urls":[{"url":"https:\/\/t.co\/AzaqkwAoj3","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/1067192127124164612","display_url":"twitter.com\/i\/web\/status\/1\u2026","indices":[117,140]}],"user_mentions":[{"screen_name":"susanprescott88","name":"Susan Prescott MDPhD","id":1213283748,"id_str":"1213283748","indices":[0,16]},{"screen_name":"TheRACP","name":"The RACP","id":1117895137,"id_str":"1117895137","indices":[48,56]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"ClimateChange","indices":[75,89]}],"urls":[],"user_mentions":[{"screen_name":"DocsEnvAus","name":"DrsForTheEnvironment","id":1542665564,"id_str":"1542665564","indices":[3,14]},{"screen_name":"susanprescott88","name":"Susan Prescott MDPhD","id":1213283748,"id_str":"1213283748","indices":[16,32]},{"screen_name":"TheRACP","name":"The RACP","id":1117895137,"id_str":"1117895137","indices":[64,72]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1543277951945"}
All tweets share this general structure, but the details vary depending on, for example, whether the tweet is an original tweet or a retweet. Retweets include a "retweeted_status" field, which contains all of the information about the original tweet that was retweeted. Tweets also differ in other ways, such as whether they are geolocated with the latitude and longitude of the user's location.
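To make these differences concrete, here is a minimal Python sketch that inspects a parsed tweet. The field names "retweeted_status", "extended_tweet", "full_text" and "coordinates" appear in the example tweet above; the helper names (`is_retweet`, `full_text`, `lat_lon`) and the two heavily trimmed sample tweets are hypothetical. The sketch also assumes that a non-null "coordinates" field is a GeoJSON point listing longitude before latitude, which is worth checking against your own data.

```python
import json

# Hypothetical, heavily trimmed tweets; real records carry many more fields.
original = json.loads(
    '{"id_str": "1", "text": "Short original tweet #ClimateChange", "coordinates": null}'
)
retweet = json.loads("""
{
  "id_str": "2",
  "text": "RT @someone: A long tweet that is",
  "coordinates": {"type": "Point", "coordinates": [149.13, -35.28]},
  "retweeted_status": {
    "id_str": "3",
    "text": "A long tweet that is",
    "truncated": true,
    "extended_tweet": {"full_text": "A long tweet that is shown here in full"}
  }
}
""")


def is_retweet(tweet):
    """A retweet embeds the original tweet under "retweeted_status"."""
    return "retweeted_status" in tweet


def full_text(tweet):
    """Return the untruncated text, preferring the embedded original tweet."""
    source = tweet.get("retweeted_status", tweet)
    extended = source.get("extended_tweet")
    if extended:
        return extended["full_text"]
    return source["text"]


def lat_lon(tweet):
    """Return (latitude, longitude), or None if the tweet is not geolocated."""
    point = tweet.get("coordinates")
    if point is None:
        return None
    # Assumption: GeoJSON order, longitude first.
    lon, lat = point["coordinates"]
    return lat, lon


for tweet in (original, retweet):
    print(tweet["id_str"], is_retweet(tweet), full_text(tweet), lat_lon(tweet))
```

Reading the text from the embedded original is a design choice: the "text" field of a retweet starts with "RT @..." and is often cut short, as in the example tweet above.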
We will not be using the Twitter Search API in these lessons, but it is useful to be aware of its purpose. The main difference between the two APIs is that the Search API is typically used to collect past tweets, whereas the Streaming API is used to collect current tweets in near real time.
Twitter's standard Search API returns at most 100 tweets per request and rate-limits requests in 15-minute windows. If the tweet count you request exceeds the number of new tweets posted during a 15-minute window, the results can include duplicates of tweets already returned in an earlier window, and these must be filtered out prior to analysis.
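Filtering out such duplicates can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson's own code: it assumes the tweets are already parsed into dictionaries and uses the "id_str" field, which is present in the example tweet above, as the unique key. The function name `deduplicate` and the two sample batches are hypothetical.

```python
def deduplicate(tweets):
    """Keep the first occurrence of each tweet, identified by its id_str."""
    seen = set()
    unique = []
    for tweet in tweets:
        tweet_id = tweet["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            unique.append(tweet)
    return unique


# Hypothetical batches from two consecutive 15-minute windows;
# tweet "102" was returned in both.
window_1 = [{"id_str": "101", "text": "a"}, {"id_str": "102", "text": "b"}]
window_2 = [{"id_str": "102", "text": "b"}, {"id_str": "103", "text": "c"}]

unique_tweets = deduplicate(window_1 + window_2)
print([t["id_str"] for t in unique_tweets])  # ['101', '102', '103']
```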
Hashtag search queries can only reach back roughly 1-2 weeks, so the Search API is mainly used to collect recent tweets and can be useful for filling in gaps in recent tweet activity.
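Before filling a gap, you first have to find it. The sketch below is one possible way to do that, under stated assumptions: it parses "created_at" values in the format seen in the example tweet above and reports any pair of consecutive tweets that are further apart than a chosen threshold. The names `parse_created_at` and `find_gaps`, the two-hour threshold, and the third timestamp are hypothetical; the first two timestamps are taken from the example tweet.

```python
from datetime import datetime, timedelta

# Format of the "created_at" field, e.g. "Tue Nov 27 00:19:11 +0000 2018".
# Note: %a and %b expect English day and month names (the default C locale).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a "created_at" string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


def find_gaps(created_at_values, max_gap):
    """Return (earlier, later) pairs of consecutive tweets more than max_gap apart."""
    times = sorted(parse_created_at(v) for v in created_at_values)
    return [
        (earlier, later)
        for earlier, later in zip(times, times[1:])
        if later - earlier > max_gap
    ]


timestamps = [
    "Tue Nov 27 00:19:11 +0000 2018",  # retweet in the example above
    "Mon Nov 26 23:03:25 +0000 2018",  # original tweet in the example above
    "Tue Nov 27 06:00:00 +0000 2018",  # hypothetical later tweet
]

gaps = find_gaps(timestamps, max_gap=timedelta(hours=2))
for earlier, later in gaps:
    print("No tweets between", earlier.isoformat(), "and", later.isoformat())
```

A gap found this way is only worth querying if it falls within the period the Search API can still reach.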
You can also search a Twitter user's timeline and collect up to the most recent 3,200 tweets, regardless of how far back in time they were posted. More information about the Twitter Search API is available online.