foca-google

foca-google uses Google to search for documents hosted on a specified domain and downloads them (e.g. PDFs, DOCs, XLSs, RTFs, RARs etc.; the full list of extensions is on lines 11-36 of foca-google.py).

It is a partial alternative to the original FOCA, whose Google Search sometimes doesn't work.

Example

If you want to find documents on example.com, run:

foca-google example.com

It will:

Create a folder named example.com
Download all found documents (PDFs, DOCs, XLSs etc.) into the example.com folder
Save the URLs of all found documents to example.com/0_log.txt
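
The output layout described above can be sketched in a few lines of Python. This is only an illustration, not the code from foca-google.py: the function names `wget_command` and `write_log` and the `base_dir` parameter are hypothetical, and it assumes downloads are delegated to wget (which the Installation section lists as a requirement).

```python
import os


def wget_command(url, folder):
    """Return the wget argument list that saves `url` into `folder`."""
    # -P sets the directory prefix that wget writes downloaded files into.
    return ["wget", "-P", folder, url]


def write_log(domain, urls, base_dir="."):
    """Create the <domain> folder and record every URL in 0_log.txt."""
    folder = os.path.join(base_dir, domain)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    log_path = os.path.join(folder, "0_log.txt")
    with open(log_path, "w") as log:
        for url in urls:
            log.write(url + "\n")  # one URL per line
    return log_path
```

To actually fetch a file, each list returned by `wget_command` could be handed to `subprocess.run`.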

To download only one file type, pass the extension as a second argument:

foca-google example.com <extension>

where <extension> is, for example, pdf, docx or ppt. It will find and download only the files with the extension you specified.
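
For readers curious how a per-extension search might be phrased, here is a minimal sketch. It assumes the script relies on Google's `site:` and `filetype:` search operators, which this README does not confirm; `EXAMPLE_EXTENSIONS`, `build_query` and `build_search_urls` are hypothetical names, and the extension list is a small stand-in for the real one in foca-google.py.

```python
from urllib.parse import urlencode

# Hypothetical subset; the real list lives in foca-google.py.
EXAMPLE_EXTENSIONS = ["pdf", "doc", "xls", "rtf", "rar"]


def build_query(domain, extension):
    """Combine Google's site: and filetype: operators into one query."""
    return "site:{0} filetype:{1}".format(domain, extension)


def build_search_urls(domain, extension=None):
    """Return one search URL per extension, or a single URL if one is given."""
    extensions = [extension] if extension else EXAMPLE_EXTENSIONS
    return [
        "https://www.google.com/search?" + urlencode({"q": build_query(domain, ext)})
        for ext in extensions
    ]
```

Calling `build_search_urls("example.com", "pdf")` yields a single URL, while omitting the extension yields one URL per entry in the list.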

Installation

You must have wget, pip and Google Chrome installed.

Download foca-google.py
Move it into /usr/local/bin: mv foca-google.py /usr/local/bin/foca-google
Install Selenium via pip: sudo -H pip install selenium
Install ChromeDriver (the Selenium driver for Chrome) and move its binary into /usr/local/bin/ as well
Linux users: see the first item under Troubleshooting (below)
foca-google is now ready to go
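
After installing, a quick check along the following lines can confirm that the pieces are in place. It is a sketch, not part of foca-google: `REQUIRED_COMMANDS` and `missing_requirements` are hypothetical names, and it assumes the ChromeDriver binary is called `chromedriver`. Google Chrome itself is not checked because its executable name varies between systems.

```python
import importlib.util
import shutil

# Commands expected on PATH after following the steps above.
REQUIRED_COMMANDS = ["wget", "pip", "chromedriver", "foca-google"]


def missing_requirements(which=shutil.which, find_spec=importlib.util.find_spec):
    """Return the names of requirements that could not be located."""
    # shutil.which returns None when a command is not found on PATH.
    missing = [cmd for cmd in REQUIRED_COMMANDS if which(cmd) is None]
    # find_spec returns None when a top-level package is not importable.
    if find_spec("selenium") is None:
        missing.append("selenium (Python package)")
    return missing
```

Calling `missing_requirements()` with no arguments inspects the real system and returns an empty list when everything is found.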

Troubleshooting

Linux users: on line 60 of foca-google.py, change the user-data-dir path to your own (where {0} is your user's home directory). You can find it at chrome://version => Profile Path (but remove "/Default" from the end of the path).
If the script crashes with the error "Chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.", kill all Chrome processes and run the script again.
If you see any other errors, let me know by creating an issue on GitHub.
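
The path adjustment from the first item can be sketched as follows. The helper names are hypothetical, the home directory `/home/alice` is a made-up example, and it is an assumption that line 60 passes the value to Chrome as a `--user-data-dir` switch (for instance through Selenium's `ChromeOptions.add_argument`).

```python
def user_data_dir_from_profile_path(profile_path):
    """Drop a trailing "/Default" from the Profile Path shown at chrome://version."""
    path = profile_path.rstrip("/")
    suffix = "/Default"
    if path.endswith(suffix):
        path = path[: -len(suffix)]
    return path


def user_data_dir_argument(profile_path):
    """Build the Chrome command-line switch for the given Profile Path."""
    return "--user-data-dir=" + user_data_dir_from_profile_path(profile_path)


# Example with a made-up home directory:
example = user_data_dir_argument("/home/alice/.config/google-chrome/Default")
```

Here `example` holds the switch with the "/Default" suffix removed; a path that does not end in "/Default" is passed through unchanged.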
