Unembargoing a simple Dandiset takes too long on staging #1943

jwodder · 2024-05-22T13:51:40Z

Trying to unembargo a Dandiset on staging — even one that just contains a couple of tiny files — consistently takes over five minutes, which is too long for my purposes (a backups2datalad test that needs to create an embargoed Dandiset and then unembargo it). In fact, I have yet to actually see an unembargoing procedure finish on staging; my longest-running unembargo is currently at 25 minutes and counting.

You can see this in action with the following script:

```python
from datetime import datetime
import os
from pathlib import Path
import random
import sys
from tempfile import TemporaryDirectory
from time import sleep, time
from dandi.consts import EmbargoStatus, dandiset_metadata_file
from dandi.dandiapi import DandiAPIClient
from dandi.files import dandi_file

MAX_WAIT_SECONDS = 300  # 5 minutes

with DandiAPIClient.for_dandi_instance(
    "dandi-staging", token=os.environ["DANDI_API_KEY"]
) as client:
    # Create a new Dandiset that starts out embargoed
    d = client.create_dandiset(
        "Test embargoed Dandiset",
        {
            "schemaKey": "Dandiset",
            "name": "Test embargoed Dandiset",
            "description": "A test embargoed Dandiset",
            "contributor": [
                {
                    "schemaKey": "Person",
                    "name": "Wodder, John",
                    "roleName": ["dcite:Author", "dcite:ContactPerson"],
                }
            ],
            "license": ["spdx:CC0-1.0"],
        },
        embargo=True,
    )
    dandiset_id = d.identifier
    print("DANDISET ID:", dandiset_id)

    # Upload a single tiny file whose contents are unique on every run
    with TemporaryDirectory() as tmpdir:
        dspath = Path(tmpdir)
        (dspath / dandiset_metadata_file).write_text(f"identifier: '{dandiset_id}'\n")
        (dspath / "file.txt").write_text(
            "This is a brand new file that has never been uploaded before.\n"
            f"Date: {datetime.now()}\n"
            f"Random: {random.randrange(4294967296)}\n"
        )
        df = dandi_file(dspath / "file.txt", dandiset_path=dspath)
        print("Uploading file ...")
        df.upload(d, {"path": "file.txt", "description": "A file"})

    # Request unembargoing, then poll the embargo status every 2 seconds
    print("Unembargoing ...")
    client.post(f"{d.api_path}unembargo/")
    start = time()
    while time() - start < MAX_WAIT_SECONDS:
        d.refresh()
        if d.embargo_status is EmbargoStatus.OPEN:
            print("Done.")
            break
        sleep(2)
    else:
        # The loop's `else` runs only if we never hit `break`, i.e., on timeout
        sys.exit("Unembargoing is taking too long; not sticking around any more")
```
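The polling loop at the end of the script could be pulled out into a reusable helper for downstream tests such as the backups2datalad one. The following is a minimal sketch, not part of dandi-cli: `wait_for` is a hypothetical name, and the exponential backoff and the use of `time.monotonic()` (which is unaffected by wall-clock adjustments) are choices made here, not taken from the original script.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    poll: Callable[[], T],
    done: Callable[[T], bool],
    timeout: float,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call ``poll()`` until ``done(result)`` is true or ``timeout`` seconds pass.

    Hypothetical helper (not a dandi-cli API). The delay between polls doubles
    up to ``max_delay`` so a slow server is not hammered. On timeout, raises
    TimeoutError mentioning the last polled value.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        result = poll()
        if done(result):
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                f"condition not met within {timeout}s; last value: {result!r}"
            )
        # Never sleep past the deadline
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

In the script above, `poll` would be a small function that calls `d.refresh()` and returns `d.embargo_status`, and `done` would be `lambda s: s is EmbargoStatus.OPEN`; catching `TimeoutError` then replaces the `while`/`else` exit. Injecting fake `sleep` and `clock` callables lets the helper be unit-tested without any real waiting.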


jjnesbitt · 2024-05-28T20:05:37Z

In the wake of the embargo re-design, un-embargo is a manual process, as we're still trying to make sure everything works correctly before automating it. Since funding is paused at the moment, the un-embargo of these dandisets has been delayed. Once we're back in the full swing of things, one of our top priorities is to automate this process.
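Because the unembargo is currently finished by hand, a polling test cannot tell "slow" apart from "waiting on a person" just by timing out. A minimal sketch of one way to make that visible: record each polled status with a timestamp and collapse the polls into runs. It assumes the three status values of dandi-cli's `EmbargoStatus` (`EMBARGOED`, `UNEMBARGOING`, `OPEN`), redeclared locally so the example runs without dandi installed, and it assumes (not confirmed in this thread) that a Dandiset awaiting the manual step reports `UNEMBARGOING`. `status_timeline` and `stuck_in` are hypothetical helper names.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class EmbargoStatus(Enum):
    # Local stand-in mirroring the values of dandi.consts.EmbargoStatus
    # (assumption: redeclared here only so this sketch is self-contained)
    EMBARGOED = "EMBARGOED"
    UNEMBARGOING = "UNEMBARGOING"
    OPEN = "OPEN"


Run = Tuple[EmbargoStatus, float, float]  # (status, first_seen, last_seen)


def status_timeline(observations: Iterable[Tuple[float, EmbargoStatus]]) -> List[Run]:
    """Collapse consecutive (timestamp, status) polls into runs of equal status."""
    runs: List[Run] = []
    for ts, status in observations:
        if runs and runs[-1][0] is status:
            # Same status as the previous poll: extend the current run
            runs[-1] = (status, runs[-1][1], ts)
        else:
            runs.append((status, ts, ts))
    return runs


def stuck_in(runs: List[Run], status: EmbargoStatus, threshold: float) -> bool:
    """True if the latest run is ``status`` and has lasted >= ``threshold`` seconds."""
    if not runs:
        return False
    last_status, first, last = runs[-1]
    return last_status is status and last - first >= threshold
```

On timeout, a test could then report something like "unembargo still pending (possibly awaiting the manual step)" when `stuck_in(runs, EmbargoStatus.UNEMBARGOING, ...)` holds, rather than a generic failure.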

yarikoptic · 2024-05-29T01:20:12Z

Let's talk more about this: is the absence of automation due to a technical limitation of the design, or just a safety measure for now? If it's a safety measure, I would argue that we should have a way (e.g. a flag) to trigger a fully automated unembargo process, since it would allow us to better test it all. E.g., this particular issue came up during testing, and the absence of automation prevents us from automating integration testing downstream.

jjnesbitt · 2024-05-29T02:40:49Z

> is the absence of automation due to a technical limitation of the design, or just a safety measure for now?

It's a safety measure. Since we've changed the un-embargo process, we want to make sure we fully understand the necessary steps required, before attempting to automate it. If we were to rush to automate it, that could lead to data corruption, and we'd like to avoid that.

> If it's a safety measure, I would argue that we should have a way (e.g. a flag) to trigger a fully automated unembargo process, since it would allow us to better test it all. E.g., this particular issue came up during testing, and the absence of automation prevents us from automating integration testing downstream.

As I stated in the above comment, we plan on implementing this once we're able.

yarikoptic · 2024-06-07T20:59:17Z

Let's discuss this during standup, since I might be missing what "doing it manually" entails here. Meanwhile, it blocks us from implementing (since we can't test it) support for unembargoing (and some Dandisets have already been unembargoed, AFAIK) in

Use reregisterurl to move datalad remotes to web backups2datalad#36

NB: updated per comment below

jwodder · 2024-06-10T13:32:00Z

@yarikoptic Correction: dandi/backups2datalad#36 is the PR that's blocked.

jjnesbitt · 2024-07-22T19:49:11Z

Closed via #1965

jwodder added the embargo Issues around embargo functionality label May 22, 2024

jwodder mentioned this issue May 22, 2024

Use reregisterurl to move datalad remotes to web dandi/backups2datalad#36

Merged

jjnesbitt closed this as completed May 28, 2024

yarikoptic reopened this May 29, 2024

jjnesbitt closed this as completed Jul 22, 2024
