Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unembargoing a simple Dandiset takes too long on staging #1943

Closed
jwodder opened this issue May 22, 2024 · 6 comments
Closed

Unembargoing a simple Dandiset takes too long on staging #1943

jwodder opened this issue May 22, 2024 · 6 comments
Labels
embargo Issues around embargo functionality

Comments

@jwodder
Copy link
Member

jwodder commented May 22, 2024

Trying to unembargo a Dandiset on staging — even one that just contains a couple of tiny files — consistently takes over five minutes, which is too long for my purposes (a backups2datalad test that needs to create an embargoed Dandiset and then unembargo it). In fact, I have yet to actually see an unembargoing procedure finish on staging; my longest-running unembargo is currently at 25 minutes and counting.

You can see this in action with the following script:

from datetime import datetime
import os
from pathlib import Path
import random
import sys
from tempfile import TemporaryDirectory
from time import sleep, time
from dandi.consts import EmbargoStatus, dandiset_metadata_file
from dandi.dandiapi import DandiAPIClient
from dandi.files import dandi_file

MAX_WAIT_SECONDS = 300  # 5 minutes

with DandiAPIClient.for_dandi_instance(
    "dandi-staging", token=os.environ["DANDI_API_KEY"]
) as client:
    d = client.create_dandiset(
        "Test embargoed Dandiset",
        {
            "schemaKey": "Dandiset",
            "name": "Test embargoed Dandiset",
            "description": "A test embargoed Dandiset",
            "contributor": [
                {
                    "schemaKey": "Person",
                    "name": "Wodder, John",
                    "roleName": ["dcite:Author", "dcite:ContactPerson"],
                }
            ],
            "license": ["spdx:CC0-1.0"],
        },
        embargo=True,
    )
    dandiset_id = d.identifier
    print("DANDISET ID:", dandiset_id)

    with TemporaryDirectory() as tmpdir:
        dspath = Path(tmpdir)
        (dspath / dandiset_metadata_file).write_text(f"identifier: '{dandiset_id}'\n")
        (dspath / "file.txt").write_text(
            "This is a brand new file that has never been uploaded before.\n"
            f"Date: {datetime.now()}\n"
            f"Random: {random.randrange(4294967296)}\n"
        )
        df = dandi_file(dspath / "file.txt", dandiset_path=dspath)
        print("Uploading file ...")
        df.upload(d, {"path": "file.txt", "description": "A file"})

    print("Unembargoing ...")
    client.post(f"{d.api_path}unembargo/")
    start = time()
    while time() - start < MAX_WAIT_SECONDS:
        d.refresh()
        if d.embargo_status is EmbargoStatus.OPEN:
            print("Done.")
            break
        sleep(2)
    else:
        sys.exit("Unembargoing is taking too long; not sticking around any more")
@jjnesbitt
Copy link
Member

In the wake of the embargo re-design, un-embargo is a manual process, as we're still trying to make sure everything works correctly before automating it. Since funding is paused at the moment, the un-embargo of these dandisets has been delayed. Once we're back in the full swing of things, one of our top priorities is to automate this process.

@yarikoptic
Copy link
Member

Let's talk more on this: is absence of automation due to a technical limitation of the design or just a safety measure for now? If a safety measure, I would argue that we better have a way (eg flag) to trigger fully automated unembargo process since it would allow us to better test it all. Eg this particular issue came up during testing and absence of automation prevents us to automate integration testing downstream.

@yarikoptic yarikoptic reopened this May 29, 2024
@jjnesbitt
Copy link
Member

is absence of automation due to a technical limitation of the design or just a safety measure for now?

It's a safety measure. Since we've changed the un-embargo process, we want to make sure we fully understand the necessary steps required, before attempting to automate it. If we were to rush to automate it, that could lead to data corruption, and we'd like to avoid that.

If a safety measure, I would argue that we better have a way (eg flag) to trigger fully automated unembargo process since it would allow us to better test it all. Eg this particular issue came up during testing and absence of automation prevents us to automate integration testing downstream.

As I stated in the above comment, we plan on implementing this once we're able.

@yarikoptic
Copy link
Member

yarikoptic commented Jun 7, 2024

Let's discuss during standup since I might be missing what "doing it manually" entails here. Meanwhile it blocks us implementing (since can't test) support of unembargoing (and some dandisets already were unembargoed AFAIK) in

NB updated per comment below

@jwodder
Copy link
Member Author

jwodder commented Jun 10, 2024

@yarikoptic Correction: dandi/backups2datalad#36 is the PR that's blocked.

@jjnesbitt
Copy link
Member

Closed via #1965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embargo Issues around embargo functionality
Projects
None yet
Development

No branches or pull requests

3 participants