
Troubleshooting A Collector

Alexander O. Smith edited this page Jul 6, 2019 · 1 revision

Auditing Running Collections

Check if the Twitter scraper script is running

In a terminal, run the following command:

ps -ef | grep python

You should see something that looks like the following:

root      6773     1  0 Jun11 ?        00:00:00 sudo python3 main.py <scraper name>.txt
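The same check can be scripted. The following is a minimal sketch, assuming the collector is launched as `main.py` as in the sample line above. The function names `find_scraper_processes` and `list_running_scrapers` are hypothetical and are not part of the collector.

```python
import subprocess
from typing import List


def find_scraper_processes(ps_output: str, script_name: str = "main.py") -> List[str]:
    """Return the lines of `ps -ef` output that mention the scraper script."""
    matches = []
    for line in ps_output.splitlines():
        # Ignore any grep process, which would otherwise match its own pattern.
        if script_name in line and "grep" not in line:
            matches.append(line.strip())
    return matches


def list_running_scrapers(script_name: str = "main.py") -> List[str]:
    """Run `ps -ef` and filter its output for the scraper script."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return find_scraper_processes(result.stdout, script_name)
```

An empty list from `list_running_scrapers()` suggests the scraper is not running. A non-empty list only shows that the process exists, not that it is collecting.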

The quickest way to see if your collection has tweets is to check Mongo.

Check MongoDB

  1. Open the mongo shell from the command line.

    mongo admin

  2. Depending on how Mongo security is configured, you may need to log in.

    db.auth('username','password')

  3. Switch to your project's database and count the tweets.

    show dbs

    use your_project_name

    db.TW_cand.count()

NOTE: this last line is slightly different from the STACK command, which is db.tweets.count().

  4. Take note of the number that prints. If it is more than 0, tweets have been collected, but this does not mean the script is currently scraping. Come back later, after enough time has passed that new tweets should have arrived, run through steps 1-3 again, and check whether the count has increased.
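The before-and-after comparison can be sketched as a small helper. This is an illustration under stated assumptions, not the collector's own tooling: `collection_is_growing` is a hypothetical name, the ten-minute default wait is arbitrary, and the count is supplied as a callable so the helper does not depend on any particular Mongo driver.

```python
import time
from typing import Callable


def collection_is_growing(
    count_fn: Callable[[], int],
    wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Take two counts separated by a wait and report whether the count rose."""
    before = count_fn()
    sleep(wait_seconds)
    after = count_fn()
    print(f"before={before} after={after} new={after - before}")
    return after > before


# Hypothetical wiring with pymongo (assumed installed, not imported here):
#   from pymongo import MongoClient
#   coll = MongoClient()["your_project_name"]["TW_cand"]
#   collection_is_growing(lambda: coll.count_documents({}))
```

A result of `False` does not prove that the scraper has stopped. The handles being collected may simply not have tweeted during the wait, so choose a wait long enough that new tweets are expected.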

Checking Output File for Handle-Specific Issues:

In a terminal, go to the following location:

cd /home/bits/twitter-scraper-mongo

NOTE: this location might be slightly different on your server. Find the outfile you named for your project and look at its last lines.

tail -100 your-out-file.txt

You should see the last 100 lines of the outfile, which list what was collected most recently. If collection is running without error, you will see something like the following:

COLLECTING FOR: GovBillWeld
2019-07-05 19:20:33.542752

Successfully authenticated with Twitter.
Collecting GovBillWeld's timeline
Collecting for GovBillWeld

TOTAL Tweets Collected: 439

Now inserting...
Insertion completed

This tells us that 439 tweets were collected for the handle GovBillWeld at the given date and time, and that they were successfully inserted into Mongo. Occasionally you may find errors here. Most often, these errors are caused by an account being deleted, banned by Twitter, or switched to a protected account. To check for this in the example above, go to https://twitter.com/GovBillWeld and see if the page shows an error. If it does not, then something else is wrong with the handle or the collector.
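For long outfiles, the per-handle blocks can be summarized with a short script. This is a minimal sketch that assumes the outfile follows the format of the sample above (`COLLECTING FOR:`, `TOTAL Tweets Collected:`, `Insertion completed`). The format of error messages is not documented here, so the sketch only flags handles whose block never reports a completed insertion. The function names are hypothetical.

```python
import re
from typing import Dict, List


def summarize_outfile(text: str) -> List[Dict[str, object]]:
    """Split outfile text into per-handle blocks and record what each reports."""
    summaries: List[Dict[str, object]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("COLLECTING FOR:"):
            # A new block starts; remember the handle after the colon.
            handle = line.split(":", 1)[1].strip()
            current = {"handle": handle, "tweets": None, "inserted": False}
            summaries.append(current)
        elif current is not None:
            match = re.match(r"TOTAL Tweets Collected:\s*(\d+)", line)
            if match:
                current["tweets"] = int(match.group(1))
            elif line == "Insertion completed":
                current["inserted"] = True
    return summaries


def handles_needing_attention(text: str) -> List[str]:
    """Return handles whose block never reached 'Insertion completed'."""
    return [str(s["handle"]) for s in summarize_outfile(text) if not s["inserted"]]
```

Because the final block in a live outfile may still be in progress, a handle that appears only at the very end of the list is not necessarily failing.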

The last entry in this output file also shows when the log was last written to. If it seems to have been a while since then, run the following:

ls -lat

This lists all files in the current directory, including hidden ones, with their attributes (-la), sorted by modification time with the newest first (-t). The timestamp shows when each file was last written to. If the outfile has not been written to within the past few days, there is likely a problem with the scraper. We suggest checking your Twitter app credentials as a starting place.
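The staleness check can also be automated. This is a rough sketch, assuming that "a few days" means three. That threshold, and the names `days_since_modified` and `is_stale`, are illustrative choices rather than anything defined by the collector.

```python
import os
import time
from typing import Optional


def days_since_modified(path: str, now: Optional[float] = None) -> float:
    """Return how many days have passed since the file was last modified."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) / 86400.0


def is_stale(path: str, max_days: float = 3.0, now: Optional[float] = None) -> bool:
    """Report whether the file has gone unmodified for longer than max_days."""
    return days_since_modified(path, now) > max_days
```

For example, `is_stale("your-out-file.txt")` returns `True` once the outfile has gone more than three days without a write.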