Fails when using relative paths - depends on platform? #72

Open
robintw opened this issue Apr 2, 2016 · 0 comments

robintw commented Apr 2, 2016

I have a slightly strange problem with quickscrape.

I want to run something like this: quickscrape --urllist test_dois.txt --scraper ../journal-scrapers/scrapers/plos.json --output plos-test2

That is, I want to use relative paths for the URL list and the scraper file.

When running this on OS X it works fine, but when running on my Linux server I get an error saying that it can't find the urllist file.

Simplifying this a bit and looking just at the urllist file, if I run ./quickscrape.js --urllist test_dois.txt --scraper /mnt/cm-volume/content-mine/journal-scrapers/scrapers/plos.json --output plos-test2 I get:

info: quickscrape 0.4.7 launched with...
info: - URLs from file: undefined
info: - Scraper: /mnt/cm-volume/content-mine/journal-scrapers/scrapers/plos.json
info: - Rate limit: 3 per minute
info: - Log level: info

fs.js:427
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^
Error: ENOENT, no such file or directory 'test_dois.txt'
    at Object.fs.openSync (fs.js:427:18)
    at Object.fs.readFileSync (fs.js:284:15)
    at loadUrls (/mnt/cm-volume/content-mine/quickscrape/bin/quickscrape.js:154:17)
    at Object.<anonymous> (/mnt/cm-volume/content-mine/quickscrape/bin/quickscrape.js:164:41)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)

I have absolutely no idea why this is behaving differently on Linux to OS X.

Interestingly, I seem to be able to fix this error by moving the process.chdir call further down the file, so that it is called only after the URL list has been loaded (see the diff at master...robintw:relative-paths). This seems to work on both Linux and OS X, and I'm happy to submit this as a PR if that would be useful.
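For illustration, here is a minimal sketch of a related approach: resolve the user-supplied paths to absolute ones before any process.chdir call, so that later reads no longer depend on the working directory. This is not quickscrape's actual code. The helper name resolveCliPaths, the temporary directory, and the assumption that the process changes into the output directory are all hypothetical and only there to make the example self-contained. The one thing it relies on is that Node resolves a relative path against the current working directory at the moment the fs call runs.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of quickscrape): make every supplied
// path absolute, relative to the directory the user launched from.
function resolveCliPaths(args, baseDir) {
  const resolved = {};
  for (const [name, value] of Object.entries(args)) {
    resolved[name] = path.resolve(baseDir, value);
  }
  return resolved;
}

// Demo setup in a throwaway directory: a URL list next to an output folder.
const startDir = fs.mkdtempSync(path.join(os.tmpdir(), 'relpaths-'));
const outputDir = path.join(startDir, 'plos-test2');
fs.mkdirSync(outputDir);
fs.writeFileSync(path.join(startDir, 'test_dois.txt'), 'http://example.org/a\n');

const originalCwd = process.cwd();
process.chdir(startDir); // pretend the user ran the tool from here

// Resolve while still in the launch directory...
const paths = resolveCliPaths({ urllist: 'test_dois.txt' }, process.cwd());

// ...then change directory (assumed here to be the output folder).
process.chdir(outputDir);

// The bare relative name no longer points at the file,
// but the pre-resolved absolute path still does.
const relativeStillWorks = fs.existsSync('test_dois.txt');
const urls = fs.readFileSync(paths.urllist, 'utf8').split('\n').filter(Boolean);

process.chdir(originalCwd); // restore the working directory
```

Either ordering (load first, then chdir, as in my diff) or resolving up front as above should avoid the ENOENT; resolving up front has the advantage that it does not matter where in the file the chdir ends up.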

I must say, I'm a bit confused by all of this though - and wondering whether I am being really stupid!
