Seems to not be able to access/modify private repos with current github token? #19

Closed
yarikoptic opened this issue Jan 24, 2024 · 25 comments · Fixed by #21
Assignees
Labels
bug:crash report Issue describing an undesirable failure of backups2datalad embargo Handling embargoed Dandisets & assets on the Archive

Comments

@yarikoptic
Member

dandi@drogon:~$ flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'
2024-01-23T20:24:28-0500 [WARNING ] dandi: A newer version (0.59.0) of dandi/dandi-cli is available. You are using 0.55.1
2024-01-23T20:25:01-0500 [WARNING ] backups2datalad: Failed [rc=1]: git -c receive.autogc=0 -c gc.auto=0 config --file .datalad/config --get dandi.dandiset.embargo-status [cwd=/mnt/backup/dandi/dandisets/000248]

Output: <empty>
2024-01-23T20:26:27-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000224 in 0.998511 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000224'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:28-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000224 in 2.032623 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000224'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:31-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000224 in 3.899044 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000224'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:35-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000224 in 8.193906 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000224'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:43-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000224 in 15.389361 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000224'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:48-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000242 in 0.989735 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000242'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:48-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000240 in 0.950652 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000240'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:49-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000242 in 2.032527 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000242'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:49-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000240 in 1.979520 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000240'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:51-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000242 in 3.934316 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000242'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:51-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000240 in 4.149744 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000240'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
2024-01-23T20:26:55-0500 [WARNING ] backups2datalad: Retrying PATCH request to https://api.github.com/repos/dandisets/000242 in 8.383834 seconds as it raised HTTPStatusError: Client error '404 Not Found' for url 'https://api.github.com/repos/dandisets/000242'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
...

although we do have private https://github.com/dandisets/000224

@jwodder
Member

jwodder commented Jan 24, 2024

@yarikoptic Correct. The token used by backups2datalad for editing GitHub repository details (retrieved from the hub.oauthtoken Git config key) belongs to you and only has the following permissions: public_repo, read:packages, repo:status, workflow.

What token is used by DataLad's create_sibling_github operation? That one belongs to dandibot and clearly has write permission to the dandisets org, so maybe hub.oauthtoken should be set to that token instead.

@yarikoptic
Member Author

thank you for the analysis!!! for now just generated a new token with full repo permission for dandibot user

dandi@drogon:~$ curl -sS -f -I -H "Authorization: token $token" https://api.github.com | grep -i x-oauth-scopes
x-oauth-scopes: read:packages, repo, workflow
access-control-expose-headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset

let's see if that resolves it
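
The scope check done with curl above can also be scripted. A minimal Python sketch, assuming the X-OAuth-Scopes header value has already been fetched; `parse_scopes` and `can_edit_private_repos` are hypothetical helper names, not part of backups2datalad. It relies on the classic `repo` scope covering private repositories while `public_repo` does not, which would also explain the 404 responses in the log, since GitHub reports private repositories a token cannot see as not found.

```python
def parse_scopes(header_value: str) -> set[str]:
    """Split an X-OAuth-Scopes header value into a set of scope names."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def can_edit_private_repos(header_value: str) -> bool:
    """Return True if the token carries the full `repo` scope."""
    return "repo" in parse_scopes(header_value)


# Scopes of the original token, as listed earlier in this thread
old = "public_repo, read:packages, repo:status, workflow"
# Scopes of the new dandibot token, from the curl output above
new = "read:packages, repo, workflow"

print(can_edit_private_repos(old))  # False
print(can_edit_private_repos(new))  # True
```

Note that `repo:status` is a separate, narrower scope, so the check compares whole scope names rather than prefixes.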

@jwodder jwodder added the bug:crash report Issue describing an undesirable failure of backups2datalad label Jan 24, 2024
@yarikoptic
Member Author

now we are at

dandi@drogon:~$ flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron'
2024-01-24T11:36:38-0500 [WARNING ] dandi: A newer version (0.59.0) of dandi/dandi-cli is available. You are using 0.55.1
2024-01-24T11:38:40-0500 [ERROR   ] backups2datalad: Dandiset 000344: README.md: download failed:   download failed: Unauthorized
addurl: 1 failed
2024-01-24T11:38:52-0500 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000344/draft>:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/aioutil.py", line 174, in dowork
    outp = await func(inp)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 147, in update_dandiset
    changed = await self.sync_dataset(dandiset, ds, dmanager)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 199, in sync_dataset
    await syncer.sync_assets()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/syncer.py", line 85, in sync_assets
    report.check()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/asyncer.py", line 99, in check
    raise RuntimeError(
RuntimeError: Errors occurred while downloading: 1 asset failed to download
2024-01-24T11:39:55-0500 [ERROR   ] backups2datalad: Dandiset 000403: participants.tsv: download failed:   download failed: Unauthorized
2024-01-24T11:39:55-0500 [ERROR   ] backups2datalad: Dandiset 000403: samples.tsv: download failed:   download failed: Unauthorized

this one is about DANDI_API_KEY now I guess, but it is an odd one -- are we somehow using two different keys, or does the admin-level API key let us list assets but not access them?

@jwodder
Member

jwodder commented Jan 24, 2024

@yarikoptic Text assets aren't stored in git-annex, so they have to be downloaded (via git-annex) to the repository rather than just having a key minted with associated URLs. However, the git-annex process doesn't have any credentials for downloading embargoed text assets (and I don't know how to provide them), hence the error.

@yarikoptic
Member Author

gotcha... feels like we need to set up a datalad provider to supply the authorization header token for downloading such assets -- will need to look into it later (busy atm).

@yarikoptic
Member Author

do we download there from API or S3 URL?

@jwodder
Member

jwodder commented Jan 24, 2024

@yarikoptic From an API URL:

await self.ensure_addurl()
url = blob.asset.base_download_url
blob.log.info(
    "File is text; sending off for download from %s",
    url,
)
await sender.send(ToDownload(blob=blob, url=url))

@yarikoptic
Member Author

grr... I was certain I replied but the response didn't post. We should proceed the following way:

  • as demonstrated in this script:
#!/bin/bash

# fail fast (and trace commands) before changing into the scratch directory
set -eux
cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")"

git init
git annex init

# generate config to ship along

mkdir -p .datalad/providers

cat >| .datalad/providers/dandi.cfg << EOF
[provider:dandi]
url_re = https?://api\.dandiarchive\.org/api/.*
authentication_type = http_token
credential = dandi

[credential:dandi]
type = token
EOF

git add .datalad/providers/dandi.cfg; git commit -m 'Added dandi provider config' 

git annex initremote datalad type=external externaltype=datalad encryption=none autoenable=true uuid=cf13d535-b47c-5df6-8590-0793cb08a90a

git annex addurl  --file dataset_description.json https://api.dandiarchive.org/api/dandisets/000403/versions/draft/assets/0cb481e5-b979-4eed-8ea4-9cd0d78276a3/download/

cat dataset_description.json

git annex whereis
  • whenever unembargoing (going public), I guess ideally we should disable the datalad special remote, but AFAIK git-annex would keep the URLs associated with that remote and would not handle them as 'web' URLs... @joeyh - what is the best way to reassociate the URLs (well -- all of the ones for keys in the tree) with web instead of another custom remote?
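
For a program that has to create the provider file itself rather than through a heredoc, the config from the script above could be rendered as follows. This is only a sketch: `render_provider_config` is a hypothetical helper, and the section and key names are copied verbatim from the script. The last lines merely check that the `url_re` pattern matches the asset URL used in the script; how datalad itself applies the pattern is not shown here.

```python
import configparser
import io
import re

# Pattern copied from the script above
URL_RE = r"https?://api\.dandiarchive\.org/api/.*"


def render_provider_config() -> str:
    """Render the contents of .datalad/providers/dandi.cfg as a string."""
    cfg = configparser.RawConfigParser()
    cfg["provider:dandi"] = {
        "url_re": URL_RE,
        "authentication_type": "http_token",
        "credential": "dandi",
    }
    cfg["credential:dandi"] = {"type": "token"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


text = render_provider_config()
print(text)

# Sanity check: the pattern matches the asset URL used in the script
asset_url = (
    "https://api.dandiarchive.org/api/dandisets/000403/versions/draft/"
    "assets/0cb481e5-b979-4eed-8ea4-9cd0d78276a3/download/"
)
assert re.match(URL_RE, asset_url)
```

Rendering to a string first keeps the check "does the file exist with the expected content" cheap, which fits the idea of verifying the setup on every run.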

@jwodder
Member

jwodder commented Jan 26, 2024

@yarikoptic

  • How is the API key provided to git-annex? I don't see that in your script.
  • How should this setup be done for repositories already created for embargoed Dandisets? Should I just write & run a one-time script to initialize the remote for them, or should backups2datalad ensure that embargoed repositories have this set up each time the program is run?

@jwodder jwodder added the embargo Handling embargoed Dandisets & assets on the Archive label Jan 26, 2024
@yarikoptic
Member Author

  • oh, right -- forgot to note that. On the first run, if unknown, it should be asked for. An alternative is to provide it via the env variable DATALAD_dandi_token
  • ideally I guess it should be idempotent and check/fix every time, so that if some other failure happens we could just recover. I think the checks are quick enough -- just that the file exists, and then that the remote exists.

@jwodder
Member

jwodder commented Jan 26, 2024

@yarikoptic Which command uses the envvar? Is just running DATALAD_dandi_token=... git annex initremote datalad ... sufficient for the token to be saved and used in subsequent addurl commands, or does it have to be set on all addurl commands?

@yarikoptic
Member Author

@yarikoptic Which command uses the envvar?

the one which needs to access our API server to download files -- annex addurl, which in turn uses git-annex-remote-datalad. I do not think that initremote would use it or ask for it. You would need to run datalad download-url or git annex addurl on the asset URL to be asked for it; then it should indeed be saved and could be reused in later sessions. But since we already have a file with secrets, I would rather place it in that file, so that it is centralized and it is clear that it is the same key we use for talking to the API. Did that now -- so no further action is needed for the cron job to specify the key.

@jwodder
Member

jwodder commented Jan 26, 2024

@yarikoptic But how should I supply the API key to git-annex when testing that this setup lets the program download embargoed text files?

@yarikoptic
Member Author

I might be missing smth -- through the DATALAD_dandi_token variable -- just define it within that testing invocation/session.

@jwodder
Member

jwodder commented Jan 29, 2024

@yarikoptic I think I'm just going to pass the token to the addurl process so that the user doesn't have to specify it in two different envvars.
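
One way this could look, as a sketch only and not the actual backups2datalad code: the parent process copies its environment, adds `DATALAD_dandi_token` (the variable named earlier in this thread), and hands that to the `git annex addurl` child. The helper names `addurl_env` and `addurl_command` are hypothetical, and reading the key from `DANDI_API_KEY` is an assumption.

```python
import os


def addurl_env(api_key: str, base_env=None) -> dict:
    """Return a copy of the environment with the datalad token added."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATALAD_dandi_token"] = api_key
    return env


def addurl_command(filename: str, url: str) -> list:
    """Build the same `git annex addurl` call as in the script above."""
    return ["git", "annex", "addurl", "--file", filename, url]


# Hypothetical usage, assuming DANDI_API_KEY is set for the parent process:
#   subprocess.run(addurl_command(name, url),
#                  env=addurl_env(os.environ["DANDI_API_KEY"]), check=True)
env = addurl_env("example-key", base_env={"PATH": "/usr/bin"})
print(sorted(env))  # ['DATALAD_dandi_token', 'PATH']
```

Copying the environment instead of mutating `os.environ` keeps the token scoped to the child process that needs it.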

@jwodder
Member

jwodder commented Jan 29, 2024

@yarikoptic How can I check whether the datalad remote has already been created/configured?

@jwodder
Member

jwodder commented Jan 29, 2024

@yarikoptic How should the datalad remote be disabled when unembargoing?

@yarikoptic
Member Author

@yarikoptic How should the datalad remote be disabled when unembargoing?

git remote remove datalad

@yarikoptic How can I check whether the datalad remote has already been created/configured?

  • if enabled already, then would be present among git remote listed ones and have desired UUID (we register it with hardcoded UUID 1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e). If present but with a different UUID -- must be smth else, error!
  • if configured, but not enabled (or already disabled/removed in git config), could be found via git annex info --json call to be listed among entries with that UUID, e.g.
> git annex info --json | jq '.["semitrusted repositories"]'
...
  {
    "description": "[datalad]",
    "here": false,
    "uuid": "1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e"
  },

(it is [datalad], not datalad, since it is still enabled in my case). If using datalad interfaces, there is a helper:

❯ python -c "from datalad.support.annexrepo import AnnexRepo; r = AnnexRepo('.'); print(r.get_special_remotes())"
{'1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e': {'autoenable': 'true', 'encryption': 'none', 'externaltype': 'datalad', 'name': 'datalad', 'type': 'external', 'timestamp': '1464124519.012356s'}, '895b9a07-6613-4c8a-95ae-280d8119475c': {'autoenable': 'true', 'encryption': 'none', 'externaltype': 'datalad-archives', 'name': 'datalad-archives', 'type': 'external', 'timestamp': '1464124520.081473s'}}
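
Without jq or datalad, the same lookup can be done on the raw `git annex info --json` output. A minimal sketch, assuming the output has the shape shown in the jq excerpt above; `annex_knows_uuid` is a hypothetical helper. It scans every top-level key ending in " repositories" rather than only "semitrusted repositories", which is an assumption made so that a remote with a different trust level would still be found.

```python
import json


def annex_knows_uuid(info_json: str, uuid: str) -> bool:
    """Return True if any repository list in the info output has this UUID."""
    info = json.loads(info_json)
    for key, value in info.items():
        if not (key.endswith(" repositories") and isinstance(value, list)):
            continue
        if any(isinstance(r, dict) and r.get("uuid") == uuid for r in value):
            return True
    return False


# Sample built from the jq excerpt above
sample = json.dumps({
    "semitrusted repositories": [
        {
            "description": "[datalad]",
            "here": False,
            "uuid": "1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e",
        }
    ]
})

print(annex_knows_uuid(sample, "1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e"))  # True
```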


@jwodder
Member

jwodder commented Jan 29, 2024

@yarikoptic

if enabled already, then would be present among git remote listed ones and have desired UUID

git remote doesn't show UUIDs, does it? How can/should I get the UUID? Should I just use git annex info --json?

if configured, but not enabled (or already disabled/removed in git config)

Should I handle this case by enabling or re-adding the remote somehow?

@yarikoptic
Member Author

@yarikoptic

if enabled already, then would be present among git remote listed ones and have desired UUID

git remote doesn't show UUIDs, does it? How can/should I get the UUID? Should I just use git annex info --json?

you can, or just read it from .git/config, since git-annex adds it there as well when it enables or senses that remote (or sets annex-ignore if it can't use it):

❯ grep -A2 'remote "datalad"' .git/config
[remote "datalad"]
	annex-externaltype = datalad
	annex-uuid = 1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e

❯ git config remote.datalad.annex-uuid
1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e

if configured, but not enabled (or already disabled/removed in git config)

Should I handle this case by enabling or re-adding the remote somehow?

❯ git remote remove datalad
❯ git annex enableremote datalad
enableremote datalad ok
(recording state in git...)

@jwodder
Member

jwodder commented Jan 30, 2024

@yarikoptic So, to summarize:

  • Run ??? to determine the state of the remote
    • If it's enabled & configured with the correct ID, do nothing
    • If it exists (Is this equivalent to "configured"?) but has the wrong ID, error
    • If it's configured but not enabled: Run git remote remove datalad; git annex enableremote datalad
    • If it's absent: Run git annex initremote datalad type=external externaltype=datalad encryption=none autoenable=true uuid=cf13d535-b47c-5df6-8590-0793cb08a90a
    • Can it be enabled but not configured?
    • Any other states?

@yarikoptic
Member Author

@jwodder please be more decisive and decide on what is needed to be done

  • Run git remote and if datalad is there, then
    • if git config remote.datalad.annex-uuid does not match cf13d535-b47c-5df6-8590-0793cb08a90a -- error
  • else: # no datalad remote enabled
    • check if 1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e is among the uuids of git annex info --json | jq -r '.["semitrusted repositories"]' and if so -- just git annex enableremote datalad; if not -- git annex initremote ...

I think this should cover all the cases we should care about

@jwodder
Member

jwodder commented Jan 30, 2024

@yarikoptic Your original script uses the UUID cf13d535-b47c-5df6-8590-0793cb08a90a for the datalad remote, but the following comments only use the UUID 1e41b9cf-4479-4ef1-8c66-e6cfbf861f2e. I'm going to assume that the latter UUID should be used for registering the remote.

@yarikoptic
Member Author

sorry -- that was a paste error I guess -- it should be the cf13 one:

pwd
/home/yoh/proj/datalad/datalad-maint
❯ grep DATALAD_SPECIAL_REMOTE datalad/consts.py
DATALAD_SPECIAL_REMOTE = 'datalad'
DATALAD_SPECIAL_REMOTES_UUIDS = {
    DATALAD_SPECIAL_REMOTE: 'cf13d535-b47c-5df6-8590-0793cb08a90a',

although really what matters is the name datalad and that it is an external remote, but I think matching by UUID is easier.
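
Putting the decision procedure from earlier in this thread together with the UUID from datalad/consts.py, the check could be sketched as a pure function. This is an illustration under stated assumptions, not the code that was merged: `Action` and `decide` are hypothetical names, and the caller is assumed to have gathered the three inputs from `git remote`, `git config remote.datalad.annex-uuid`, and `git annex info --json`.

```python
from enum import Enum

# From datalad/consts.py, as quoted above
DATALAD_UUID = "cf13d535-b47c-5df6-8590-0793cb08a90a"


class Action(Enum):
    NOTHING = "remote already enabled"
    ENABLE = "git annex enableremote datalad"
    INIT = "git annex initremote datalad ..."


def decide(remotes, configured_uuid, known_uuids) -> Action:
    """Pick what to do about the datalad special remote.

    remotes: names printed by `git remote`
    configured_uuid: value of remote.datalad.annex-uuid, or None
    known_uuids: UUIDs listed by `git annex info --json`
    """
    if "datalad" in remotes:
        if configured_uuid != DATALAD_UUID:
            # A remote named datalad with another UUID is something else
            raise ValueError(f"unexpected UUID for datalad remote: {configured_uuid}")
        return Action.NOTHING
    if DATALAD_UUID in known_uuids:
        # Known to git-annex but not enabled in this clone
        return Action.ENABLE
    return Action.INIT


print(decide(["github", "datalad"], DATALAD_UUID, {DATALAD_UUID}))  # Action.NOTHING
print(decide(["github"], None, {DATALAD_UUID}))  # Action.ENABLE
print(decide(["github"], None, set()))  # Action.INIT
```

Keeping the decision separate from the git calls makes each of the states discussed above easy to test without a real repository.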

yarikoptic added a commit that referenced this issue Feb 2, 2024
Pass credentials to git-annex for downloading embargoed text assets