foca-google searches Google for documents hosted on a specified domain and downloads them (e.g. PDFs, DOCs, XLSs, RTFs, RARs etc.; the full list of extensions is on lines 11-36 of the script).
It is a partial alternative to the original FOCA, whose built-in Google Search sometimes doesn't work.
If you want to find documents on example.com, run:
foca-google example.com
It will:
- Create a folder named example.com
- Download all found documents (PDFs, DOCs, XLSs etc.) into the example.com folder
- Save the URLs of all found documents into example.com/0_log.txt
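The three steps above can be sketched roughly as follows. This is only an illustration of the resulting folder layout, not the script's actual code: the names `save_documents` and `wget_fetch`, the `fetch` callback, and the one-URL-per-line log format are all assumptions.

```python
import os
import subprocess


def wget_fetch(url, folder):
    # Assumption: downloads are delegated to wget; -P sets the target directory.
    subprocess.run(["wget", "-q", "-P", folder, url], check=False)


def save_documents(domain, urls, fetch=wget_fetch):
    """Create <domain>/, download each URL into it, and log the URLs."""
    os.makedirs(domain, exist_ok=True)
    log_path = os.path.join(domain, "0_log.txt")
    with open(log_path, "a") as log:
        for url in urls:
            fetch(url, domain)      # download into the domain folder
            log.write(url + "\n")   # hypothetical format: one URL per line
    return log_path
```

Passing a different `fetch` function makes it possible to try the layout without touching the network.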
Alternatively, run:
foca-google example.com <extension>
where <extension> is, for example, pdf, docx or ppt. It will find and download only the files with the extension you specified.
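One plausible way to restrict a search to a single extension is Google's `site:` and `filetype:` operators. The helper below is a hypothetical sketch: whether the script builds its queries exactly like this is an assumption, `build_search_urls` is an invented name, and the extension list is only the handful named above, not the script's full list.

```python
from urllib.parse import urlencode

# Illustrative subset only; the script's own list is longer.
DEFAULT_EXTENSIONS = ["pdf", "doc", "xls", "rtf", "rar"]


def build_search_urls(domain, extension=None):
    """Return one Google search URL per extension for the given domain."""
    if extension:
        # Accept "pdf", "PDF" or ".pdf" alike.
        extensions = [extension.lower().lstrip(".")]
    else:
        extensions = DEFAULT_EXTENSIONS
    urls = []
    for ext in extensions:
        query = "site:{0} filetype:{1}".format(domain, ext)
        urls.append("https://www.google.com/search?" + urlencode({"q": query}))
    return urls
```

With an extension given, the function returns a single URL; without one, it returns one URL per entry in the list.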
You must have wget, pip and Google Chrome installed.
- Download foca-google.py and move it to /usr/local/bin/foca-google:
mv foca-google.py /usr/local/bin/foca-google
- Install Selenium via pip:
sudo -H pip install selenium
- Install Selenium Chrome Driver and move its binary into /usr/local/bin/ as well
- Linux users: see Troubleshooting #1 (below)
- Your foca-google is ready to go
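Before the first run, it may help to confirm that the required tools are actually on your PATH. The check below is a minimal sketch and not part of the script: `missing_prerequisites` is a hypothetical helper, and the executable names (especially for Chrome, whose binary name differs between platforms and is often not on PATH on macOS) are assumptions.

```python
import shutil

# Assumption: common executable names; adjust for your platform.
CHROME_NAMES = ["google-chrome", "google-chrome-stable", "chrome"]
REQUIRED_TOOLS = ["wget", "pip", "chromedriver"]


def missing_prerequisites(which=shutil.which):
    """Return the names of required tools that cannot be found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # Chrome counts as present if any of its usual binary names resolves.
    if not any(which(name) for name in CHROME_NAMES):
        missing.append("Google Chrome")
    return missing
```

An empty list from `missing_prerequisites()` means every tool was found; otherwise the list names what still needs installing.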
Troubleshooting:
- Linux users: on line 60, change the user-data-dir path to yours (where {0} is your user's home directory). You can find it at chrome://version => Profile Path (but remove "/Default" at the end of the path)
- If the script crashes with the error "Chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.", kill all Chrome processes and run the script again.
- Let me know if you see any other errors (create an issue on GitHub)
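For the first troubleshooting item, the path edit can be illustrated with a small helper. This is a sketch under assumptions: both function names are hypothetical, and the example path in the usage note is the usual Linux default rather than something taken from the script.

```python
import os


def user_data_dir_from_profile_path(profile_path):
    """Strip a trailing "/Default" from the Profile Path shown at chrome://version."""
    path = profile_path.rstrip("/")
    if os.path.basename(path) == "Default":
        path = os.path.dirname(path)
    return path


def as_template(user_data_dir, home):
    """Rewrite the path with {0} standing in for the home directory."""
    home = home.rstrip("/")
    if user_data_dir.startswith(home + "/"):
        return "{0}" + user_data_dir[len(home):]
    return user_data_dir  # not under the home directory; leave unchanged
```

For a Profile Path of /home/alice/.config/google-chrome/Default, the first function yields /home/alice/.config/google-chrome, and the second turns that into {0}/.config/google-chrome, which is the form the instruction above describes for line 60.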