Download file function for large files/private repos #1

Open
noamross opened this issue Feb 13, 2018 · 6 comments

Comments

@noamross

noamross commented Feb 13, 2018

I recently wrote this function for downloading files from GitHub private repos. The GitHub API restricts downloads from the /contents endpoint to files of 1 MB or less. One can get larger files from the /blob endpoint, but that first requires querying the /commit endpoint to get the file SHA. The function wraps all of this, so you can just drop in the URL of a file in a private repo and download it using gh's auth infrastructure. Would you be interested in it as a PR? It needs a good home.

https://gist.github.com/noamross/73944d85cad545ae89efaa4d90b049db
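The lookup chain described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the R code from the gist: it only builds the REST paths and reads mock responses, and it assumes that the commit's tree SHA is resolved to the file's blob SHA through the Git trees endpoint with `?recursive=1`, a step the comment does not spell out. All function names are illustrative.

```python
from typing import Any

# Hypothetical helpers; the paths follow the public GitHub REST API.


def commit_path(owner: str, repo: str, ref: str) -> str:
    # Step 1: fetch the commit for `ref` (a branch, tag or commit SHA)
    return f"/repos/{owner}/{repo}/commits/{ref}"


def tree_sha_from_commit(commit_response: dict[str, Any]) -> str:
    # The commit JSON nests the root tree SHA under commit.tree.sha
    return commit_response["commit"]["tree"]["sha"]


def tree_path(owner: str, repo: str, tree_sha: str) -> str:
    # Step 2 (assumed): list the whole tree so the file's blob SHA can be found
    return f"/repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1"


def find_blob_sha(tree_response: dict[str, Any], file_path: str) -> str:
    # Match the file's path among the tree entries; type "blob" means a file
    for entry in tree_response["tree"]:
        if entry["path"] == file_path and entry["type"] == "blob":
            return entry["sha"]
    raise KeyError(f"{file_path!r} not found in tree")


def blob_path(owner: str, repo: str, blob_sha: str) -> str:
    # Step 3: fetch the blob itself, sidestepping the 1 MB /contents limit
    return f"/repos/{owner}/{repo}/git/blobs/{blob_sha}"
```

Each path would then be requested with an authenticated client (in R, gh() supplies the token). Very large repos can return a truncated tree listing, which this sketch does not handle.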

@coatless
Collaborator

@noamross Of course =) Feel free to PR away.

I'll likely tweak the gh(...) calls once I've implemented the methods required for this script.

@noamross
Author

Great, I'll PR once I've gotten the vectorization TODOs finished.

@noamross
Author

How do you feel about dependencies? I'd like to parse path input with either stringi or urltools; the combinations I need are something of a pain with the base regex tools. I got rid of the need for base64enc with a PR to gh to handle raw responses correctly.
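Some background on why base64enc was needed in the first place: without a raw media type, the blob endpoint returns JSON whose `content` field is base64-encoded text wrapped across lines. A minimal Python sketch of both routes follows; the helper name is hypothetical, and `application/vnd.github.v3.raw` is the raw media type GitHub documented at the time (newer equivalents may exist).

```python
import base64
from typing import Any

# Sending this Accept header asks GitHub for the file bytes instead of JSON.
RAW_ACCEPT_HEADER = {"Accept": "application/vnd.github.v3.raw"}


def decode_blob_json(blob_response: dict[str, Any]) -> bytes:
    """Decode the default JSON body of GET /repos/{owner}/{repo}/git/blobs/{sha}."""
    encoding = blob_response.get("encoding")
    if encoding != "base64":
        raise ValueError(f"unexpected encoding: {encoding!r}")
    # The content arrives wrapped with newlines; b64decode discards them by default.
    return base64.b64decode(blob_response["content"])
```

With the raw header, the response body is already the file's bytes, so no decoding step (and no base64 dependency) is needed.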

@coatless
Collaborator

Dependency wise:

  • If we could avoid stringi, I would be ecstatic.
    • I'm more than willing to address this myself this weekend.
  • urltools looks like there is enough overlap downstream that it shouldn't be too bad.

The script also depends on purrr, which is more than fine, as I think it is the easiest way to work with gh() output.

@noamross
Author

Actually I don't need the purrr dependency, and it's not in the version I'll PR in.

I'll be busy this weekend but will start the PR before then. My aim with the regex is to let the user drop in any link of the following form (parts marked with ? are optional)

(https://)?something?github.com/user/repo/(usually raw or blob)/ref/path 

and extract the user, repo, ref and path. The plan is to have a gh_list_files(path, pattern, recursive) function that lists the files in the repo/ref directory, optionally recursively, and keeps the files' SHAs and related metadata in the response as an attribute. gh_list_files() will also be used internally by gh_download_files(), which will download files either into raw vector(s) or directly to disk (the latter functionality is in PR r-lib/gh#77).
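The parsing and listing plan above can be sketched in Python (the package itself is R, so this is only an illustration). The names `parse_github_url` and `list_files` are hypothetical, and the sketch assumes that the optional prefix before github.com is `www.`, that raw.githubusercontent.com links have no blob/raw segment, and that the ref contains no `/` (a branch such as `feature/x` would be ambiguous). `list_files` works on entries from a recursive Git trees listing and keeps each file's SHA for a later blob download.

```python
import re
from typing import NamedTuple, Optional


class FileRef(NamedTuple):
    user: str
    repo: str
    ref: str
    path: str


# github.com/user/repo/(blob|raw|tree)/ref/path, with scheme and "www." optional.
# Assumption: the ref itself contains no "/".
_GITHUB_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/"
    r"(?P<user>[^/]+)/(?P<repo>[^/]+)/(?:blob|raw|tree)/(?P<ref>[^/]+)/(?P<path>.+)$"
)
# raw.githubusercontent.com/user/repo/ref/path has no blob/raw segment.
_RAW_URL = re.compile(
    r"^(?:https?://)?raw\.githubusercontent\.com/"
    r"(?P<user>[^/]+)/(?P<repo>[^/]+)/(?P<ref>[^/]+)/(?P<path>.+)$"
)


def parse_github_url(url: str) -> FileRef:
    """Extract user, repo, ref and path from a GitHub file link."""
    url = url.strip().split("?", 1)[0].split("#", 1)[0]  # drop query and fragment
    for pattern in (_GITHUB_URL, _RAW_URL):
        match = pattern.match(url)
        if match:
            return FileRef(match["user"], match["repo"], match["ref"], match["path"])
    raise ValueError(f"not a recognised GitHub file URL: {url!r}")


def list_files(
    tree: list[dict], directory: str = "", pattern: Optional[str] = None,
    recursive: bool = False,
) -> dict[str, str]:
    """Map file paths under `directory` to their blob SHAs."""
    directory = directory.strip("/")
    prefix = f"{directory}/" if directory else ""
    files = {}
    for entry in tree:
        if entry["type"] != "blob" or not entry["path"].startswith(prefix):
            continue  # skip directory entries and files outside `directory`
        rest = entry["path"][len(prefix):]
        if not recursive and "/" in rest:
            continue  # file lives in a nested subdirectory
        if pattern is not None and not re.search(pattern, rest):
            continue
        files[entry["path"]] = entry["sha"]  # keep the SHA for the blob endpoint
    return files
```

Returning a path-to-SHA mapping plays the role of the "SHAs as an attribute" idea: a download step can look up each file's blob SHA without another API call.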

@coatless
Collaborator

Where do I sign? 😃

Let's aim to include the code with all its dependencies, and I'll refactor it later if they become problematic or if we decide to go lightweight.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants