
Redesign Airlock Export Process to Eliminate SAS URLs #4308

Open
TonyWildish-BH opened this issue Feb 4, 2025 · 13 comments

@TonyWildish-BH
Contributor

The airlock process is effective, but cumbersome. In particular, the need to use SAS URLs inside the workspace VMs means we can't completely block pasting into the VMs, which is something we'd like to do by default, only allowing it on a per-case basis - but that's another ticket.

The need for a SAS URL inside the workspace could be eliminated if the process were redesigned. Once an import is approved, there's no reason to make access to it only ephemeral; it makes sense to have the file accessible for the lifetime of the project. The file can be pushed directly to the shared storage, so it's immediately accessible from all machines, eliminating the need for Storage Explorer on import.

For exports, a staging storage space can be made available, and the user given access to it from within the workspace. The act of pushing a file there can be used to trigger creation of a draft export request, with the file being automatically moved to an inaccessible (to the user) storage which preserves the NONCE semantics.
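The upload-triggered flow described above could be sketched roughly as follows. This is illustrative Python only, not the TRE codebase: the function, field names, and the `in-progress/` path are all invented for the sketch, and in the real system the move would be a server-side copy between storage accounts (e.g. driven by a blob-created event).

```python
import uuid

def on_staging_upload(workspace_id: str, blob_name: str, uploaded_by: str) -> dict:
    """Simulate the proposed flow: a file pushed to the export staging
    area automatically creates a draft export request, and the file is
    moved to storage the user cannot reach, preserving the NONCE
    semantics. All names here are hypothetical.
    """
    request = {
        "id": str(uuid.uuid4()),
        "workspace_id": workspace_id,
        "type": "export",
        "status": "draft",
        "created_by": uploaded_by,
        "file": blob_name,
    }
    # Stand-in for a server-side move to user-inaccessible storage.
    request["file"] = f"in-progress/{blob_name}"
    return request
```

The key point is that the user's only action is the upload itself; the draft request and the move out of user-accessible storage happen automatically, so no SAS URL ever needs to be pasted into the workspace.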

This is related to #2402, about the need for access to the UI from within the workspace. However, this simplification of the airlock process is worth it on its own, regardless of that issue. It will greatly improve the user experience.

@marrobi
Member

marrobi commented Feb 4, 2025

I'll ask others to comment, but I believe the reason for dedicated storage per request was that we had a scenario where the PI wanted to do the import without sharing it with others in the workspace, hence it must not touch the shared storage.

We could have the option to automatically transfer the file, perhaps with a destination specified on the request.

As for automating the creation, there is a certain amount of metadata needed alongside a request; how would that get provided?

@TonyWildish-BH
Contributor Author

What metadata is required that isn't already available in the workspace? Presumably a function triggered by the file upload can access that metadata; something similar must be happening now, unless I've misunderstood the process?

@jonnyry
Collaborator

jonnyry commented Feb 4, 2025

Throwing in my two pennies' worth:

  • It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. I realise there's a lot of work to make that happen.

  • Could we move to Entra auth on the airlock storage accounts and do away with the SAS key altogether? Is the time-bound SAS token necessary?

  • Could we rationalise/reduce the number of storage accounts used by the airlock? I understand different network settings are needed on some accounts - for example, the export account needs to be publicly available. But could some accounts (e.g. blocked/rejected/in-progress) be shared, using different containers to compartmentalise the data?

The number of accounts (particularly the workspace ones) seems to be a bit of a bind on scalability, and it also ups the cost when Defender is part of the mix (although you can create exclusions).

Airlock storage accounts - core

| Name | Description |
| --- | --- |
| st + airlockp + <TRE_ID> | Airlock Processor |
| st + alexapp + <TRE_ID> | Airlock Export Approved |
| st + alimblocked + <TRE_ID> | Airlock Import Blocked |
| st + alimex + <TRE_ID> | Airlock Import External |
| st + alimip + <TRE_ID> | Airlock Import In Progress |
| st + alimrej + <TRE_ID> | Airlock Import Rejected |

Airlock storage accounts - per workspace

| Name | Description |
| --- | --- |
| st + alexblocked + ws + <WS_ID> | Airlock Export Blocked |
| st + alexint + ws + <WS_ID> | Airlock Export Internal |
| st + alexip + ws + <WS_ID> | Airlock Export In Progress |
| st + alexrej + ws + <WS_ID> | Airlock Export Rejected |
| st + alimapp + ws + <WS_ID> | Airlock Import Approved |
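The consolidation idea above (fewer accounts, one container per request state) could look roughly like this. The naming scheme below is purely hypothetical, not the TRE's actual convention; it just shows one shared account per scope, with the request state collapsed into a container name.

```python
from typing import Optional, Tuple

# Hypothetical request states; in the real TRE these map to separate
# storage accounts rather than containers.
STATES = {"external", "in-progress", "approved", "rejected", "blocked"}

def airlock_location(tre_id: str, ws_id: Optional[str],
                     direction: str, state: str) -> Tuple[str, str]:
    """Return a (storage_account, container) pair for an airlock stage.

    One shared account per scope (core, or per workspace), with the
    direction and state compartmentalised into containers instead of
    dedicated accounts. Names are illustrative only.
    """
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    if direction not in ("import", "export"):
        raise ValueError(f"unknown direction: {direction}")
    account = f"stairlockws{ws_id}" if ws_id else f"stairlock{tre_id}"
    return account, f"{direction}-{state}"
```

Under a scheme like this, the five per-workspace accounts in the table above would collapse into one, at the cost of having to express the differing network rules (e.g. public access for export-external) at the container or firewall level rather than per account.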

@fortunkam

> It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. Realise there's a lot of work to make that happen.

I have a branch I am working on which will enable this exact scenario (client side storage access using RBAC), will tag this issue once that PR is ready.

@marrobi
Member

marrobi commented Feb 4, 2025

@fortunkam can you create an issue with details so we can assign it to you? thanks.

@TonyWildish-BH
Contributor Author

> It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. Realise there's a lot of work to make that happen.
>
> I have a branch I am working on which will enable this exact scenario (client side storage access using RBAC), will tag this issue once that PR is ready.

That sounds useful in some circumstances, but not all. We often have data sitting on VMs in Azure that needs to be imported, so pulling it locally and then uploading through the browser isn't a good solution in those cases.

Are there any limits on file size for upload via the UI? Now that the 2 GB malware-checking limit has been raised (I've tested up to 50 GB, just for fun), it would be good to know whether the browser or UI imposes its own limits.

@fortunkam

I tested the client side file upload with files up to 2GB. Will give it a go with a larger file once I've pulled the changes in.

@fortunkam

See #4309

@TonyWildish-BH
Contributor Author

Thanks. Please don't remove the ability to upload via the CLI; we need that for cases where we don't have a browser, such as having the data on a Linux VM in the cloud. It's not always feasible or desirable to pull the data locally and then push it through the browser.

@marrobi
Member

marrobi commented Feb 4, 2025

Thanks @fortunkam. Can any "upload via UI" discussion happen here: #4309.

What are the other points to discuss here?

  • Restructuring airlock storage accounts to use fewer accounts
  • Auto creation of airlock requests from a file drop
  • SAS token vs RBAC for file upload?

Is that it? I think these are likely separate issues with separate implementation plans.

@marrobi
Member

marrobi commented Feb 7, 2025

@TonyWildish-BH #4335 might be of interest?

@TonyWildish-BH
Contributor Author

> @TonyWildish-BH #4335 might be of interest?

Thanks @marrobi. That's an option for imports, yes, but I'd also like to make exports easier.

@marrobi
Member

marrobi commented Feb 10, 2025

@TonyWildish-BH @jonnyry I'm going to rename this to "Redesign Airlock Export Process to Eliminate SAS URLs".

If you feel this misrepresents the issue, let me know. If there are other "airlock issues", please create new issues.

@marrobi marrobi changed the title Proposed change to airlock process Redesign Airlock Export Process to Eliminate SAS URLs Feb 10, 2025