# Better Data:
https://www.kaggle.com/datasets/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows

# Data:
### Russian Troll Tweets
Great stuff in here for targets: https://github.com/fivethirtyeight/russian-troll-tweets/

Dictionary:

| Header | Definition |
| --- | --- |
| `external_author_id` | An author account ID from Twitter |
| `author` | The handle sending the tweet |
| `content` | The text of the tweet |
| `region` | A region classification, as [determined by Social Studio](https://help.salesforce.com/articleView?id=000199367&type=1) |
| `language` | The language of the tweet |
| `publish_date` | The date and time the tweet was sent |
| `harvested_date` | The date and time the tweet was collected by Social Studio |
| `following` | The number of accounts the handle was following at the time of the tweet |
| `followers` | The number of followers the handle had at the time of the tweet |
| `updates` | The number of “update actions” on the account that authored the tweet, including tweets, retweets and likes |
| `post_type` | Indicates if the tweet was a retweet or a quote-tweet |
| `account_type` | Specific account theme, as coded by Linvill and Warren |
| `retweet` | A binary indicator of whether or not the tweet is a retweet |
| `account_category` | General account theme, as coded by Linvill and Warren |
| `new_june_2018` | A binary indicator of whether the handle was newly listed in June 2018 |
| `alt_external_id` | Reconstruction of author account ID from Twitter, derived from `article_url` variable and the first list provided to Congress |
| `tweet_id` | Unique ID assigned by Twitter to each status update, derived from `article_url` |
| `article_url` | Link to original tweet. Now redirects to "Account Suspended" page |
| `tco1_step1` | First redirect for the first http(s)://t.co/ link in a tweet, if it exists |
| `tco2_step1` | First redirect for the second http(s)://t.co/ link in a tweet, if it exists |
| `tco3_step1` | First redirect for the third http(s)://t.co/ link in a tweet, if it exists |

### now what?
Precise:
- Find an interesting date range from the target data
- Archive everything in that range: https://github.com/twintproject/twint
- Check for duplicates between target and archive

Lazy:
- Grab any other collection of tweets
- Check for duplicates
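
A rough sketch of the date-range and duplicate-check steps, using only the Python standard library. Everything here is an assumption for illustration: the helper names (`load_rows`, `date_range`, `normalize`, `find_duplicates`) are hypothetical, the `publish_date` layout is assumed to look like `10/1/2017 19:58`, and the archive rows are assumed to have been mapped to the same `tweet_id` and `content` column names as the target data. Matching on normalized text is one possible notion of "duplicate", not the only one.

```python
import csv
from datetime import datetime

# Assumed layout of `publish_date` (e.g. "10/1/2017 19:58"); adjust if the CSV differs.
DATE_FORMAT = "%m/%d/%Y %H:%M"


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def date_range(rows, date_format=DATE_FORMAT):
    """Return (earliest, latest) publish_date found in the target rows."""
    stamps = [datetime.strptime(r["publish_date"], date_format) for r in rows]
    return min(stamps), max(stamps)


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def find_duplicates(target_rows, archive_rows):
    """Return archive rows matching a target row by tweet_id or by normalized content."""
    target_ids = {r["tweet_id"] for r in target_rows if r.get("tweet_id")}
    target_texts = {normalize(r["content"]) for r in target_rows}
    return [
        r
        for r in archive_rows
        if r.get("tweet_id") in target_ids or normalize(r["content"]) in target_texts
    ]
```

With the target CSV loaded via `load_rows`, `date_range` gives the window to archive, and `find_duplicates` can then be run against whatever was collected. The same function covers the lazy route, since it only needs two lists of rows.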