Troubleshooting A Collector
At the command line, run the following:
ps -ef | grep python
You should see something that looks like the following:
root 6773 1 0 Jun11 ? 00:00:00 sudo python3 main.py <scraper name>.txt
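If you check this often, the same test can be scripted. The sketch below is only an illustration: the function names are made up here, it assumes the collector is started as `main.py` as in the line above, and the sample text (including `example.txt`) is invented for the demo. It filters `ps -ef` output down to collector processes and drops the `grep` process itself, which otherwise also shows up in the listing.

```python
import subprocess


def find_collector_lines(ps_output, script="main.py"):
    """Return the `ps -ef` lines that look like a running collector."""
    matches = []
    for line in ps_output.splitlines():
        # A grep for the script name also contains the name, so skip it.
        if script in line and "grep" not in line:
            matches.append(line.strip())
    return matches


def running_collectors(script="main.py"):
    """Run `ps -ef` on this machine and filter its output."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return find_collector_lines(result.stdout, script)


# Invented sample output, used only to demonstrate the filter.
sample = (
    "root 6773 1 0 Jun11 ? 00:00:00 sudo python3 main.py example.txt\n"
    "bits 9001 8999 0 10:02 pts/0 00:00:00 grep main.py\n"
)
print(find_collector_lines(sample))
```

An empty list from `running_collectors()` would suggest that no collector process is currently running.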
The quickest way to see if your collection has tweets is to check Mongo.
- Check Mongo by opening the mongo shell at the command line.
mongo admin
- Depending on your Mongo security settings, you may need to log in.
db.auth('username','password')
- Switch to your project's database and count the tweets.
show dbs
use your_project_name
db.TW_cand.count()
NOTE: this last line is slightly different from the STACK command, which is db.tweets.count()
- Take note of the number that prints. If it is more than 0, tweets have been collected; however, this does not mean the script is currently scraping. Come back later, after enough time has passed that there should be more tweets in the collection. Run through the first three steps again and check whether the count has increased.
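The "count, wait, count again" check above can be sketched in Python. This is a rough illustration, not part of the collector: `has_grown` is a hypothetical helper, and the wiring to Mongo shown in the comments assumes that `pymongo` is installed and that the collection is named `TW_cand` as in the steps above.

```python
import time


def has_grown(count_fn, wait_seconds, sleep=time.sleep):
    """Call count_fn twice, wait_seconds apart.

    Returns (before, after, grew), where grew is True only if the
    second count is strictly larger than the first.
    """
    before = count_fn()
    sleep(wait_seconds)
    after = count_fn()
    return before, after, after > before


# Hypothetical wiring with pymongo (assumed installed; not run here):
#   from pymongo import MongoClient
#   coll = MongoClient()["your_project_name"]["TW_cand"]
#   print(has_grown(lambda: coll.count_documents({}), wait_seconds=3600))

# Demo with made-up counts standing in for the database.
fake_counts = iter([439, 512])
print(has_grown(lambda: next(fake_counts), 0, sleep=lambda s: None))
```

If `grew` is False after a wait that is long enough for your collection schedule, continue with the outfile checks that follow.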
At the command line, go to the following directory:
cd /home/bits/twitter-scraper-mongo
NOTE: this location might be slightly different on your server.
Find the outfile you named for your project and look at its last lines:
tail -100 your-out-file.txt
You should see the last 100 lines of the outfile, which should list what was collected. If things are running without error, you will see something like the following:
COLLECTING FOR: GovBillWeld
2019-07-05 19:20:33.542752
Successfully authenticated with Twitter.
Collecting GovBillWeld's timeline
Collecting for GovBillWeld
TOTAL Tweets Collected: 439
Now inserting...
Insertion completed
This tells us that 439 tweets were collected for the handle GovBillWeld at the given date and time, and that they were successfully inserted into Mongo. Occasionally you may find errors here. Most often, these errors are associated with accounts that have been deleted, banned by Twitter, or switched to a protected account. To check for this in this example, go to https://twitter.com/GovBillWeld and see whether the page shows an error. If it does not, then something else is wrong with the handle or the collector.
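For long outfiles, a small script can summarize which handles finished cleanly. The sketch below assumes the log looks exactly like the sample above (a `COLLECTING FOR:` line, a `TOTAL Tweets Collected:` line, then `Insertion completed`). The function names are hypothetical, and because the format of error lines is not shown here, the script only flags handles that never reached `Insertion completed`.

```python
import re


def summarize_log(text):
    """Parse outfile text into one record per 'COLLECTING FOR:' block."""
    runs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("COLLECTING FOR:"):
            handle = line.split(":", 1)[1].strip()
            current = {"handle": handle, "total": None, "inserted": False}
            runs.append(current)
        elif current is not None:
            match = re.match(r"TOTAL Tweets Collected:\s*(\d+)", line)
            if match:
                current["total"] = int(match.group(1))
            elif line == "Insertion completed":
                current["inserted"] = True
    return runs


def incomplete_handles(runs):
    """Handles whose block never reached 'Insertion completed'."""
    return [run["handle"] for run in runs if not run["inserted"]]


# The sample block from this page.
sample_log = """COLLECTING FOR: GovBillWeld
2019-07-05 19:20:33.542752
Successfully authenticated with Twitter.
Collecting GovBillWeld's timeline
Collecting for GovBillWeld
TOTAL Tweets Collected: 439
Now inserting...
Insertion completed
"""
print(summarize_log(sample_log))
```

Any handle returned by `incomplete_handles` is a candidate for the manual profile check described above.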
Also, the last entry in this output file tells you when the log was last written to. If it seems to have been a while, run the following:
ls -lat
This lists all files in the current directory, including hidden ones, with their attributes (-la), sorted by modification time with the newest first (-t). It shows when each file was last written to. If the outfile hasn't been written to within the past few days, there is likely a problem with the scraper. We suggest checking your Twitter app credentials as a starting place.
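The same staleness check can be done from Python by reading the file's modification time. This is a minimal sketch: the helper names are invented here, and the three-day threshold is an assumption standing in for "the past few days".

```python
import os
import time

SECONDS_PER_DAY = 86400


def days_since_modified(path, now=None):
    """Days elapsed since the file at path was last written to."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def is_stale(path, max_age_days=3, now=None):
    """True if the file has not been modified within max_age_days."""
    return days_since_modified(path, now) > max_age_days
```

For example, `is_stale("your-out-file.txt")` returning True would point to the same conclusion as an old timestamp in the `ls -lat` listing.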