Managed to "conflict on lock" while updating zarr dandisets #42

Open
yarikoptic opened this issue Apr 8, 2024 · 2 comments
Labels
bug:crash report Issue describing an undesirable failure of backups2datalad zarrs Handling of Zarr assets

Comments

@yarikoptic
Member

Not yet sure what happened or why, but during one of the recent updates of 000026 we ended up in a failed state:

fatal: Unable to create '/mnt/backup/dandi/dandizarrs/fa9baf63-da85-4623-b84c-551ff510574b/.git/index.lock': File exists.
fatal: Unable to create '/mnt/backup/dandi/dandizarrs/fa9baf63-da85-4623-b84c-551ff510574b/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
whereis: 17662 failed
2024-04-03T02:42:15-0400 [ERROR   ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'(scanning f...d files...)\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'(scanning f...d files...)\n')>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/streams.py", line 457, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
git-annex: waitToSetLock: interrupted (Interrupted system call)
whereis: 11749 failed
whereis: 17323 failed
whereis: 7469 failed
whereis: 12217 failed
2024-04-03T02:42:18-0400 [ERROR   ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'ok\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'ok\n')>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/streams.py", line 457, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
2024-04-03T02:42:18-0400 [ERROR   ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'(recording ... in git...)\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'(recording ... in git...)\n')>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/streams.py", line 457, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
git-annex: waitToSetLock: interrupted (Interrupted system call)
2024-04-03T02:42:41-0400 [ERROR   ] backups2datalad: Job failed on input <Dandiset 000026/draft>:
  + Exception Group Traceback (most recent call last):
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/aioutil.py", line 177, in dowork
  |     outp = await func(inp)
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 151, in update_dandiset
  |     changed = await self.sync_dataset(dandiset, ds, dmanager)
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 203, in sync_dataset
  |     await syncer.sync_assets()
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/syncer.py", line 77, in sync_assets
  |     report = await async_assets(
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/asyncer.py", line 499, in async_assets
  |     async with (
  |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/zarr.py", line 574, in sync_zarr
    |     await ds.push(to="github", jobs=manager.config.jobs, data="nothing")
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/adataset.py", line 297, in push
    |     await anyio.to_thread.run_sync(
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    |     return await future
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 833, in run
    |     result = context.run(func, *args)
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/datalad/distribution/dataset.py", line 507, in apply_func
    |     return f(*args, **kwargs)
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/datalad/interface/base.py", line 773, in eval_func
    |     return return_func(*args, **kwargs)
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/datalad/interface/base.py", line 763, in return_func
    |     results = list(results)
    |   File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/datalad/interface/base.py", line 940, in _execute_command_
    |     raise IncompleteResultsError(
    | datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
    | [{'action': 'publish',
    |   'hints': None,
    |   'message': 'refs/heads/git-annex->github:refs/heads/git-annex [remote '
    |              'rejected] (push declined due to repository rule violations)',
    |   'operations': ['remote-rejected', 'error'],
    |   'path': '/mnt/backup/dandi/dandizarrs/a5908a61-437c-404e-87b6-7023b7c2a169',
    |   'refds': '/mnt/backup/dandi/dandizarrs/a5908a61-437c-404e-87b6-7023b7c2a169',
    |   'refspec': 'refs/heads/git-annex:refs/heads/git-annex',
    |   'status': 'error',
    |   'target': 'github',
    |   'type': 'dataset'}]
    +------------------------------------
2024-04-03T02:42:41-0400 [ERROR   ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'(recording ...d files...)\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'(recording ...d files...)\n')>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/streams.py", line 457, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
2024-04-03T02:42:41-0400 [ERROR   ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 119, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 228, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 94, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2024.04.03.05.30.14Z.log
Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7fcc2bf03ac0>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/unix_events.py", line 546, in close
    self._close(None)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/unix_events.py", line 570, in _close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/base_events.py", line 745, in call_soon
    self._check_closed()
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
I double-checked that this zarr is from 000026:
❯ curl -X 'GET' --silent 'https://api.dandiarchive.org/api/zarr/fa9baf63-da85-4623-b84c-551ff510574b/' -H 'accept: application/json' | jq .
{
  "name": "sub-I61/ses-SPIM/micr/sub-I61_ses-SPIM_sample-BrocaAreaS07_stain-Somatostatin_SPIM.ome.zarr",
  "dandiset": "000026",
  "zarr_id": "fa9baf63-da85-4623-b84c-551ff510574b",
  "status": "Complete",
  "checksum": "50129984f801bfb4da1a366aeda40820-17746--25629330989",
  "file_count": 17746,
  "size": 25629330989
}

I have now reset it (git reset --hard; git clean -dfx):

dandi@drogon:/mnt/backup/dandi/dandizarrs/fa9baf63-da85-4623-b84c-551ff510574b$ git reset --hard ; git clean -dfx
Updating files: 100% (10241/10241), done.
HEAD is now at 1a537c8 Exclude .dandi/ from git-annex
Removing 0/
Removing 1/
Removing 2/
Removing 3/
Removing 4/

I will check the recent zarrs (cron is disabled for now), then reset 000026 and re-enable cron.
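
Since the failure started with a leftover `.git/index.lock`, a quick way to spot similar leftovers across the zarr repositories could look like the sketch below. It is not part of backups2datalad: the function name `find_stale_index_locks` and the one-hour age threshold are assumptions made for illustration, and an old lock is only a hint that a git process crashed, not proof. Check that no git or git-annex process is still running before removing anything.

```python
import time
from pathlib import Path


def find_stale_index_locks(root: Path, min_age_seconds: float = 3600.0) -> list[Path]:
    """Return */.git/index.lock files directly under `root` whose
    modification time is at least `min_age_seconds` in the past.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not taken from backups2datalad.
    """
    now = time.time()
    stale = []
    for lock in sorted(root.glob("*/.git/index.lock")):
        try:
            age = now - lock.stat().st_mtime
        except FileNotFoundError:
            # The lock disappeared between glob() and stat(), so some
            # git process was alive and has just released it.
            continue
        if age >= min_age_seconds:
            stale.append(lock)
    return stale


if __name__ == "__main__":
    # Path taken from the log above; adjust as needed.
    for lock in find_stale_index_locks(Path("/mnt/backup/dandi/dandizarrs")):
        print(lock)
```

The helper only reports candidates; deleting a lock is left as a manual step, in line with git's own advice in the error message.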

@yarikoptic
Member Author

Result of the check/reset of zarrs:

dandi@drogon:/mnt/backup/dandi/dandizarrs$ cat tools/check-and-reset-recent-zarrs 
#!/bin/bash

set -eu

# find -maxdepth 1 -mtime -60 -type d -name '*-*-*' | parallel --jobs 5 --eta 'tools/quick_check_zarr.sh {}' | tee /tmp/check-zarrs.out
awk '/dirty/{print $1}' /tmp/check-zarrs.out | parallel --jobs 5 --eta 'git -C {} clean -dfx; git -C {} reset --hard; git -C {} gc --prune=now'
dandi@drogon:/mnt/backup/dandi/dandizarrs$ ls -l /tmp/ch^C
dandi@drogon:/mnt/backup/dandi/dandizarrs$ find -maxdepth 1 -mtime -60 -type d -name '*-*-*' | parallel --jobs 5 --eta 'tools/quick_check_zarr.sh {}' | tee /tmp/check-zarrs.out
parallel: Warning: Reading 0 arguments took longer than 10 seconds.
parallel: Warning: Consider removing --eta.

Computers / CPU cores / Max jobs to run
1:local / 4 / 5

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 26s Left: 40 AVG: 0.67s  local:5/54/100%/2.9s ./26f1a66a-d124-46ad-9380-6ba86f5f662c lacks 0
ETA: 25s Left: 39 AVG: 0.65s  local:5/55/100%/2.8s ./7a6ed1a4-9d00-45f0-ba27-cba83c198289 lacks 0
ETA: 29s Left: 37 AVG: 0.81s  local:5/57/100%/2.9s ./fd01fe56-9c26-4da3-9c78-e57e85d34033 lacks 0
ETA: 28s Left: 36 AVG: 0.79s  local:5/58/100%/2.8s ./5d1908d6-7522-419e-bfe8-295002d9bd08 lacks 0
ETA: 27s Left: 34 AVG: 0.80s  local:5/60/100%/2.8s ./2ff520ff-8b7e-4125-8792-e51d89387002 lacks 0
ETA: 26s Left: 33 AVG: 0.80s  local:5/61/100%/2.7s ./ef6b21c6-4617-453f-b1e2-d046df7b8c9a lacks 0
ETA: 9s Left: 14 AVG: 0.66s  local:5/80/100%/2.1s ./fa9baf63-da85-4623-b84c-551ff510574b lacks 0
ETA: 8s Left: 13 AVG: 0.67s  local:5/81/100%/2.1s ./7e1b3b36-a94a-427f-9793-b14e344a04f2 lacks 0
ETA: 7s Left: 12 AVG: 0.66s  local:5/82/100%/2.1s ./232c8023-ec10-44fa-882e-124aabf2d6da lacks 0
ETA: 2s Left: 4 AVG: 0.63s  local:4/90/100%/1.9s ./79e5c0a3-ff74-4f1f-ab17-a3b5ef635df1 is dirty
ETA: 1s Left: 3 AVG: 0.63s  local:3/91/100%/1.9s ./5b82f2df-85ce-4252-aad6-6e9cf85297e9 is dirty
ETA: 1s Left: 2 AVG: 0.62s  local:2/92/100%/1.9s ./1eb7ba2e-dc29-40ca-aa24-d9d22089e0bf is dirty
ETA: 0s Left: 1 AVG: 0.61s  local:1/93/100%/1.9s ./9ada9c31-a34e-4024-bd2c-c2aa975311af is dirty
ETA: 0s Left: 0 AVG: 0.61s  local:0/94/100%/1.9s 
dandi@drogon:/mnt/backup/dandi/dandizarrs$ bash tools/check-and-reset-recent-zarrs

Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 4 AVG: 0.00s  local:4/0/100%/0.0s HEAD is now at 0531417 Exclude .dandi/ from git-annex
ETA: 3s Left: 3 AVG: 1.00s  local:3/1/100%/4.0s HEAD is now at c2b5393 Exclude .dandi/ from git-annex
ETA: 1s Left: 2 AVG: 0.50s  local:2/2/100%/2.0s HEAD is now at 2abc44a Exclude .dandi/ from git-annex
ETA: 4s Left: 1 AVG: 4.67s  local:1/3/100%/5.7s Removing .dandi/zarr-checksum
HEAD is now at 4dbcba0 Exclude .dandi/ from git-annex
Updating files: 100% (17500/17500), done.
ETA: 0s Left: 0 AVG: 3.50s  local:0/4/100%/4.2s 
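
The `awk '/dirty/{print $1}'` step in the script above can be sketched in Python as follows. The only thing known here about the output of `tools/quick_check_zarr.sh` is what the transcript shows (`./<zarr-id> is dirty` or `./<zarr-id> lacks 0`), so the line format is an assumption, and `dirty_zarrs` is a hypothetical name. Unlike the awk one-liner, the pattern also copes with lines that carry the `parallel --eta` prefix seen on the terminal.

```python
import re

# Zarr directories are named by UUID, e.g. ./79e5c0a3-ff74-4f1f-ab17-a3b5ef635df1
# Assumed line format (from the transcript): "<path> is dirty"
_DIRTY_LINE = re.compile(
    r"(\./[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) is dirty\s*$"
)


def dirty_zarrs(lines):
    """Return the zarr paths reported as dirty, in input order."""
    dirty = []
    for line in lines:
        match = _DIRTY_LINE.search(line)
        if match:
            dirty.append(match.group(1))
    return dirty


if __name__ == "__main__":
    # File name taken from the transcript above.
    with open("/tmp/check-zarrs.out") as f:
        for path in dirty_zarrs(f):
            print(path)
```

Matching on the full UUID shape rather than on the word "dirty" alone avoids picking up unrelated lines that happen to contain that word.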

@yarikoptic
Member Author

The rerun seems to fail on pushing some zarrs:

publish(ok): . (dataset) [refs/heads/draft->github:refs/heads/draft [new branch]]
publish(error): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [remote rejected] (push declined due to repository rule violations)]
action summary:
  publish (error: 1, ok: 1)
whereis: 16458 failed
2024-04-08T10:35:55-0400 [ERROR   ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'(scanning f...d files...)\n')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'(scanning f...d files...)\n')>
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/subprocess.py", line 72, in pipe_data_received
    reader.feed_data(data)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/asyncio/streams.py", line 457, in feed_data
    assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
publish(ok): . (dataset) [refs/heads/draft->github:refs/heads/draft [new branch]]
publish(error): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex [remote rejected] (push declined due to repository rule violations)]
action summary:
...
    | datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
    | [{'action': 'publish',
    |   'hints': None,
    |   'message': 'refs/heads/git-annex->github:refs/heads/git-annex [remote '
    |              'rejected] (push declined due to repository rule violations)',
    |   'operations': ['remote-rejected', 'error'],
    |   'path': '/mnt/backup/dandi/dandizarrs/7dcb4ef6-9502-41a4-a71f-c80422f7483c',
    |   'refds': '/mnt/backup/dandi/dandizarrs/7dcb4ef6-9502-41a4-a71f-c80422f7483c',
    |   'refspec': 'refs/heads/git-annex:refs/heads/git-annex',
    |   'status': 'error',
    |   'target': 'github',
    |   'type': 'dataset'}]
...
    | datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
    | [{'action': 'publish',
    |   'hints': None,
    |   'message': 'refs/heads/git-annex->github:refs/heads/git-annex [remote '
    |              'rejected] (push declined due to repository rule violations)',
    |   'operations': ['remote-rejected', 'error'],
    |   'path': '/mnt/backup/dandi/dandizarrs/97cfbd5a-8ec2-4829-a65f-89e7a3eb5bdd',
    |   'refds': '/mnt/backup/dandi/dandizarrs/97cfbd5a-8ec2-4829-a65f-89e7a3eb5bdd',
    |   'refspec': 'refs/heads/git-annex:refs/heads/git-annex',
    |   'status': 'error',
    |   'target': 'github',
    |   'type': 'dataset'}]
    +------------------------------------
2024-04-08T10:36:16-0400 [ERROR   ] backups2datalad: An error occurred:
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 119, in wrapped
    await f(datasetter, *args, **kwargs)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/__main__.py", line 228, in update_from_backup
    await datasetter.update_from_backup(dandisets, exclude=exclude)
  File "/home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages/backups2datalad/datasetter.py", line 94, in update_from_backup
    raise RuntimeError(
RuntimeError: Backups for 1 Dandiset failed
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2024.04.08.13.12.33Z.log

and 000026 remained dirty after this:

dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   dandiset.yaml

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   .dandi/assets.json
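
To see at a glance which zarrs were hit by the "remote rejected" push failure, the result records printed in the tracebacks above could be filtered along these lines. This is a minimal sketch operating on plain dictionaries shaped like the ones in the log (`status`, `operations`, `path`, `refspec`); it does not call DataLad, and `rejected_pushes` is a hypothetical name.

```python
def rejected_pushes(results):
    """Return (path, refspec) for every result record that failed
    because the remote rejected the push.

    `results` is an iterable of dicts shaped like the records shown
    in the IncompleteResultsError messages above.
    """
    rejected = []
    for res in results:
        if res.get("status") != "error":
            continue
        if "remote-rejected" in res.get("operations", []):
            rejected.append((res["path"], res.get("refspec")))
    return rejected


if __name__ == "__main__":
    # Record copied from the log above.
    records = [
        {
            "action": "publish",
            "operations": ["remote-rejected", "error"],
            "path": "/mnt/backup/dandi/dandizarrs/7dcb4ef6-9502-41a4-a71f-c80422f7483c",
            "refspec": "refs/heads/git-annex:refs/heads/git-annex",
            "status": "error",
            "target": "github",
            "type": "dataset",
        }
    ]
    for path, refspec in rejected_pushes(records):
        print(path, refspec)
```

In the logs so far, every rejected refspec is `refs/heads/git-annex:refs/heads/git-annex`, while the `draft` branch pushes succeed.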

@jwodder jwodder added zarrs Handling of Zarr assets bug:crash report Issue describing an undesirable failure of backups2datalad labels May 21, 2024