Fetching Twitter conversations via DMs

Fetching Twitter direct messages (DMs) can be quite tricky, especially when the conversation runs to more than 200 messages. This post shows how to export your entire conversation with a user (say X) on Twitter to a CSV file. Note that the conversation here consists of direct messages, not tweets.

The most challenging part is getting more than 200 messages. The Twitter API returns only the latest 200 messages: the latest 200 received by you and the latest 200 sent by you. So no matter how many times you call the API, it always returns the same set of 200 messages. What can be done if we need to extract more than 200 messages? (The trick follows 😉 )
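To make the limit concrete, here is a toy simulation. It does not talk to Twitter at all: FakeMailbox, fetch_latest and destroy are hypothetical names I made up to mimic an endpoint that only ever shows the newest 200 messages. It is a sketch of the idea, not the real API.

```python
# Toy simulation: FakeMailbox, fetch_latest and destroy are hypothetical
# stand-ins, not Tweepy or Twitter API calls.
class FakeMailbox:
    """Holds message ids; only the newest `cap` are ever visible."""

    def __init__(self, message_ids, cap=200):
        self.messages = sorted(message_ids, reverse=True)  # newest first
        self.cap = cap

    def fetch_latest(self):
        # Mimics an endpoint that always returns the newest `cap` messages.
        return self.messages[:self.cap]

    def destroy(self, message_id):
        self.messages.remove(message_id)


def drain(mailbox):
    """Fetch a batch, delete it, and repeat until nothing is left."""
    collected = []
    while True:
        batch = mailbox.fetch_latest()
        if not batch:
            break
        collected.extend(batch)
        for message_id in batch:
            mailbox.destroy(message_id)
    return collected


mailbox = FakeMailbox(range(1, 451))      # 450 messages, ids 1..450
first = mailbox.fetch_latest()
second = mailbox.fetch_latest()
same_without_delete = (first == second)   # repeated calls see the same 200
everything = drain(mailbox)               # deleting exposes the older ones
```

Calling fetch_latest twice gives the same batch, while the fetch-then-delete loop in drain eventually reaches all 450 messages.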

Well, the first thing is to have a Twitter account and register your app at https://apps.twitter.com/ . We have already covered a basic tutorial on this before. I have used the Python package Tweepy, which is a Python wrapper around the Twitter API calls.
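Once the app is registered you get four credentials. One optional way to keep them out of the script is to read them from environment variables. This is only a sketch: the variable names and the helper load_twitter_credentials are my own assumptions, not something Twitter or Tweepy prescribes.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_KEYS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with a plain dict standing in for the real environment.
example_env = {key: "dummy-" + key.lower() for key in REQUIRED_KEYS}
credentials = load_twitter_credentials(example_env)
```

The four values can then be handed to tweepy.OAuthHandler and set_access_token in place of hard-coded strings.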

api.direct_messages() and api.sent_direct_messages() are the two functions which return the messages sent to you and sent by you respectively. As we are trying to fetch the conversation with one particular user (user X), after getting the messages from these functions we filter them by the username we are looking for. To get the next set of messages, we need to delete the ones we just fetched (this is the trick). We should delete them whether or not they belong to user X. It is only when we delete these messages that the API returns the next, older set.
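The filtering step on its own can be sketched without touching the API. Here a namedtuple called Message stands in for the objects Tweepy returns, with only the attributes this post relies on; split_batch is a hypothetical helper and the sample data is made up.

```python
from collections import namedtuple

# Stand-in for the objects Tweepy returns; only the attributes used in
# this post are modelled here.
Message = namedtuple(
    "Message",
    "id text created_at sender_screen_name recipient_screen_name",
)


def split_batch(received, sent, username):
    """Return (rows to keep, ids of every message in the batch)."""
    rows = []
    for msg in received:
        if msg.sender_screen_name == username:
            rows.append((msg.created_at, username, msg.text))
    for msg in sent:
        if msg.recipient_screen_name == username:
            rows.append((msg.created_at, "You", msg.text))
    rows.sort()  # tuples start with the timestamp, so this is chronological
    # Every message gets deleted, whether or not it belongs to user X.
    ids_to_delete = [msg.id for msg in received] + [msg.id for msg in sent]
    return rows, ids_to_delete


received = [
    Message(3, "hi!", "2017-01-01 10:02", "user_x", "me"),
    Message(4, "spam", "2017-01-01 10:03", "someone_else", "me"),
]
sent = [Message(2, "hello", "2017-01-01 10:01", "me", "user_x")]
rows, ids_to_delete = split_batch(received, sent, "user_x")
```

Only the two messages exchanged with user_x end up in rows, but all three ids land in ids_to_delete.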

This delete removes the messages permanently from your account and there is no way to get them back, so please be cautious while deleting them. Also, this may be obvious, but just to be clear: although we keep only the messages exchanged with user X, all the other messages in the fetched batch get deleted as well.
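Because the delete cannot be undone, one precaution (my suggestion, not part of the original script) is to dump every fetched message to a backup CSV before deleting anything. A minimal sketch, with plain dicts standing in for the fetched messages and backup_messages as a hypothetical helper:

```python
import csv
import io

FIELDS = ["id", "created_at", "sender", "recipient", "text"]


def backup_messages(messages, handle):
    """Write every message in the batch to `handle` as CSV rows."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for msg in messages:
        writer.writerow({field: msg[field] for field in FIELDS})
    return len(messages)


# Plain dicts stand in for the fetched messages in this sketch.
batch = [
    {"id": 1, "created_at": "2017-01-01 10:00", "sender": "user_x",
     "recipient": "me", "text": "hello"},
    {"id": 2, "created_at": "2017-01-01 10:05", "sender": "user_y",
     "recipient": "me", "text": "unrelated, but worth keeping"},
]
buffer = io.StringIO()   # swap for open("backup.csv", "w", newline="")
written = backup_messages(batch, buffer)
backup_text = buffer.getvalue()
```

This way the messages from other users, which the script does not keep, are still saved somewhere before they disappear from the account.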

Here is a snippet of the code. I am not responsible for any goof-ups in your account. 😀 😉

import tweepy
import pandas as pd


def setupTwitterAutorization():
    # Replace the placeholders with the keys of your registered app
    api_key = "api_key"
    api_secret = "api_secret"
    access_token = "access_token"
    access_token_secret = "access_token_secret"

    auth = tweepy.OAuthHandler(api_key, api_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api

# Extract direct messages which are received and sent

def GetDirectMessagesInOrder(username):

    receivedMsgs = api.direct_messages(count=200)
    sentMsgs = api.sent_direct_messages(count=200)
    if len(receivedMsgs) == 0 and len(sentMsgs) == 0:
        # Nothing left in either mailbox
        return None

    # Now extract the conversation only between the two users which we
    # are interested in
    receiveText = []; receiveTime = []; saidBy = []; tweetId = []

    for eachRDM in receivedMsgs:
        if eachRDM.sender_screen_name == username:
            receiveText.append(eachRDM.text)
            receiveTime.append(eachRDM.created_at)
            tweetId.append(eachRDM.id)
            saidBy.append(username)
        else:
            # Not part of the conversation: delete it right away
            api.destroy_direct_message(eachRDM.id)

    for eachSDM in sentMsgs:
        if eachSDM.recipient_screen_name == username:
            receiveText.append(eachSDM.text)
            receiveTime.append(eachSDM.created_at)
            tweetId.append(eachSDM.id)
            saidBy.append('You')
        else:
            api.destroy_direct_message(eachSDM.id)

    abc = pd.DataFrame({'Tweet Id': tweetId, 'Timestamp': receiveTime,
                        'Text': receiveText, 'Said By': saidBy})
    # Sort by timestamp so you could get an exact conversation
    if abc.shape[0] > 0:
        abc.sort_values('Timestamp', inplace=True)
    return abc


def DeleteRetrievedMessages(tweetIDs):
    for msgId in tweetIDs:
        try:
            api.destroy_direct_message(msgId)
        except tweepy.TweepError:
            # Skip a message that could not be deleted and carry on
            pass


if __name__ == "__main__":
    completeList = []
    # Setting up twitter authorization
    api = setupTwitterAutorization()
    while True:
        # Get one batch of msgs
        newDF = GetDirectMessagesInOrder('twitter_handle')
        if newDF is None:
            # Both mailboxes are empty, so we are done
            break
        if newDF.shape[0] > 0:
            completeList.append(newDF)
            # Delete those msgs
            DeleteRetrievedMessages(newDF['Tweet Id'])

    # pd.concat fails on an empty list, so write only if we found something
    if completeList:
        df = pd.concat(completeList)
        df.sort_values('Timestamp', inplace=True)
        df.to_csv(r"F:\Final.csv", encoding='utf-8')

The CSV that is written is sorted by time, so reading it feels like following the conversation in its original flow. You can modify the script to suit your own requirements.
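As one example of such a modification, here is a small sketch that turns the CSV back into a readable transcript. It relies only on the column names the script writes ('Timestamp', 'Said By', 'Text'); format_conversation is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def format_conversation(handle):
    """Return one 'timestamp | speaker: text' line per CSV row."""
    reader = csv.DictReader(handle)
    rows = sorted(reader, key=lambda row: row["Timestamp"])
    return ["{} | {}: {}".format(r["Timestamp"], r["Said By"], r["Text"])
            for r in rows]


# Made-up sample using the same column names the script writes.
sample = io.StringIO(
    ",Said By,Text,Timestamp,Tweet Id\n"
    "1,You,hello,2017-01-01 10:01:00,2\n"
    "0,user_x,hi!,2017-01-01 10:02:00,3\n"
)
transcript = format_conversation(sample)
```

To use it on the real file, pass an open file handle for the CSV instead of the in-memory sample.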
