Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: Non-default processing config is not copied over to the transfer directory for zipped (based) transfers e.g. zipped bag #771

Closed
helenst opened this issue Jun 27, 2019 · 13 comments
Labels
Wellcome Wellcome Trust
Milestone

Comments

@helenst
Copy link

helenst commented Jun 27, 2019

Expected behaviour
When a transfer is initiated using a non-default processing config, (either using `window.processing_config = 'my_config' or initiating via the API) the setup of that configuration should be used to select options at decision points in the workflow.

Current behaviour
When a non-default processing config is selected, the workflow still appears to use the default one. I think this is because, whereas MCP Server copies the correct config into the working directory to start with (package._copy_processing_config) there is an early step in the workflow which overwrites it with defaultProcessingMCP.xml.

The filename that is copied over in the workflow is hardcoded in workflow.json currently. This could be fixed using a substitution variable (i.e. %processingConfig%processingMCP.xml) which could be supplied with the correct config value.

Steps to reproduce
Run a transfer using the builtin automated config (or create a custom one). The decisions in the workflow will still correspond to the default workflow. (e.g. with automated, questions will still be asked)

Your environment (version of Archivematica, OS version, etc)
qa/1.x branch

Additional (RS: 2010-11-06)

You can use the current sample-data layout in Docker to test this by:

  • Creating a brand new custom processing configuration.
  • Call it wellcome.
  • Leave all decision points in the configuration unresolved.
  • Using httpie run the following command:
http --pretty=format \
    POST 127.0.0.1:62080/api/v2beta/package \
    "Authorization: ApiKey test:test" \
    processing_config=wellcome \
    path=$(echo -n '/home/archivematica/archivematica-sampledata/SampleTransfers/ZippedBag.zip' | base64 -w 0) \
    name=v2_beta_endpoint_example \
    type="zipped bag"

For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@helenst
Copy link
Author

helenst commented Jun 27, 2019

Having chatted with @ross-spencer and looked into this some more, it seems the problem is not where I thought it was. Since the copy command mentioned above uses the -n flag (no clobber) it is not in fact overwriting anything.

It seems this only happens with zipfile based transfers (bagged and otherwise). For non zipped transfers, the temporary directory into which the the correct processing config was copied, is copied and becomes the transfer directory.

However for zipped transfers, the zipfile is extracted and the extract directory is used as the transfer directory - so there's no processing config in there. So when we get to the "copy default processing" step in the workflow, there is no processing config in the transfer and the default gets used.

I'm thinking the zip extraction step needs to ensure the processing config gets copied across.

@ross-spencer ross-spencer changed the title Non-default processing config is overwritten during workflow Problem: Non-default processing config is not copied over to the transfer directory for zipped (based) transfers e.g. zipped bag Jun 28, 2019
@ross-spencer ross-spencer added the Wellcome Wellcome Trust label Jun 28, 2019
@helenst
Copy link
Author

helenst commented Jun 28, 2019

I'm a little stuck on how to do this. I found _move_to_internal_shared_dir which moves the zipfile from the tmp directory to the processing dir, so we could move the processing config alongside it and later copy at the extraction stage but it would have to be uniquely named and tracked since it's in the shared currentlyProcessing directory at this point and there may be multiple transfers in progress at once. I don't know how easy it would be to hang onto that filename.

We could try pulling it out of the tmp directory at the extraction stage but that seems maybe a little messy (does MCP Client otherwise deal with the tmp directory?) and I'm not sure about the lifetime of the tmp directory. Another possibility would be to have MCP Server push the config file into the zip, but that doesn't seem ideal either.

Taking a step back to have a little think about this - thoughts welcome :)

@helenst
Copy link
Author

helenst commented Jun 28, 2019

I think it might work if I pre-create the directory into which the file will be unzipped during _move_to_internal_shared_dir and copy the xml config in at that stage. Then when 7z extracts it later, it'll do so into the directory that's already there, alongside the xml config. Just trying it out now. It'll mean duplicate code in generating that directory name between mcp client and mcp server but perhaps it can go into common.

@helenst
Copy link
Author

helenst commented Jul 2, 2019

I think this involves:

  • changes to move_to_internal_shared_dir in package.py to copy the xml file alongside the zipfile
  • change to restructure_for_compliance to ensure the xml file is preserved during restructuring
  • anything else?

(edit: Just realised the second of those two is not needed :)

@ross-spencer
Copy link
Contributor

This is good analysis @helenst, I see move_to_internal_shared_dir is called by _start_package_transfer and _start_package_transfer_with_auto_approval. it seems neater to focus a change on a single function, and its not clear what extra responsibility those two other functions should take on for different transfer types if any. Though they do deal in part with copying the processing configuration. I don't think I've any additional analysis I can help with at the moment, but very happy to CR when there's something in place. If you think it a good idea, we might also consider extra logging to help folks debug whether custom configs are being used/have been used, but that might be difficult as we do normalize the name in all cases, so maybe not...

@sromkey
Copy link
Contributor

sromkey commented Nov 25, 2019

@replaceafill could you get a size on this please?

@replaceafill
Copy link
Member

@sromkey This one has a PR from @helenst. I just commented after testing that branch.

@sromkey sromkey self-assigned this Dec 4, 2019
@replaceafill replaceafill removed their assignment Dec 4, 2019
@sromkey sromkey added this to the 1.11.0 milestone Jan 20, 2020
@sromkey sromkey assigned ross-spencer and unassigned sromkey Jan 20, 2020
@sromkey sromkey added Status: in progress Issue that is currently being worked on. and removed Status: refining The issue needs additional details to ensure that requirements are clear. labels Jan 20, 2020
@ross-spencer
Copy link
Contributor

ross-spencer commented Jan 21, 2020

Okay, so we lost track of this during the 1.11 milestone work (we'd forgotten to add it but had discussed adding most external PRs at the outset of the release). Looking at the history here, I worked with Helen to get this reviewed and into Archivematica over Christmas as a bug-fix for the Zipped Bag transfer type but it will eventually help support the zipfile type. Sarah R along with the Backlog Team added the milestone yesterday so it's now recorded in our burn-down.

I'll place this in review if that seems sensible? as the primary PR is merged. The sample-data PR is optional and submitted by me, but I'd like to chat with @sallain about adding it with an AMAUAT PR to improve our layout for testing.

For testing, I might suggest this be tested against:

@ross-spencer ross-spencer added Status: review The issue's code has been merged and is ready for testing/review. and removed Status: in progress Issue that is currently being worked on. labels Jan 21, 2020
@sallain
Copy link
Member

sallain commented Feb 14, 2020

I can confirm that the two scenarios in @ross-spencer's last comment now work:

  • Processed a zipped bag containing a custom processing config as a Zipped Bag transfer - workflow respects the custom processing config in the bag
  • Processed a zipped directory containing a custom processing config as a Zipped Directory transfer - workflow respects the custom processing config in the directory

It would be good to also test in a way more in-line with how @helenst was doing it - e.g. as an API call.

Tested on qa/1.x on Bionic.

@ross-spencer
Copy link
Contributor

ross-spencer commented Feb 14, 2020

Here's an example for you @sallain I have these at the ready today for the other pieces:

http --pretty=format \
 POST 127.0.0.1:62080/api/v2beta/package \
 "Authorization: ApiKey test:test" type="zipfile" \
 processing_config=default path=$(echo -n '/home/archivematica/archivematica-sampledata/SampleTransfers/ZippedDirectoryTransfers/DemoTransferCSV.zip' | base64 -w 0) \
 name=v2_beta_endpoint_example

This does rely on the new sample data but you can see how the path is setup relative to archivematica-sampledata so you can see how to configure it for your own samples.

You can do this remotely of course, so from your own Linux box you might need: apt-get install httpie. And then just swap-out the references to 127.0.0.1:62080/... for your am111bionic url.

Note there's a PR to discuss Tuesday for the type=zipfile line in that call above that's here: artefactual/archivematica#1569

Also, just remove processing_config=automated \ if you want to use the sample with custom processing.

@sallain
Copy link
Member

sallain commented Feb 27, 2020

I've initiated some tests using the API and can confirm that this works - I can define a custom config and apply it to a zipped bag transfer.

I was unable to try a zipped directory transfer as there seems to be a bug to prevent users from defining type="zipfile". I presume this will be addressed by the issue @ross-spencer linked, #1569.

@sallain
Copy link
Member

sallain commented Feb 27, 2020

@replaceafill maybe you can have a look at the problem with zipped directory transfers noted above?

@sallain
Copy link
Member

sallain commented Feb 27, 2020

Arrrghhhhhh I had a typo in my sampledata path - I can, in fact, start a zipped directory transfer by doing the following:

http --pretty=format \
               POST 127.0.0.1:62080/api/v2beta/package \
               "Authorization: ApiKey test:test" \
               type="zipfile" \
               processing_config=test \
               path=(echo -n '/home/archivematica/archivematica-sampledata/SampleTransfers/ZippedDirectoryTransfers/DemoTransferCSV.zip' | base64 -w 0) \
               name=v2_beta_endpoint_example

It follows my custom processing config.

So... that's good! This is fixed!

@sallain sallain closed this as completed Feb 27, 2020
@sallain sallain removed the Status: review The issue's code has been merged and is ready for testing/review. label Feb 27, 2020
sallain added a commit to archivematica/testing-checklists that referenced this issue Mar 1, 2020
I've added tests for the following:

* Transfer name with diacritics archivematica/Issues#1051
* Transfer has filenames with diacritics and a metadata.csv archivematica/Issues#1073
* PREMIS event import archivematica/Issues#710
* identifiers.json import archivematica/Issues#963
* Directory-level AIP metadata is indexed archivematica/Issues#888
* Directories not greyed out if there are still files that can be
arranged archivematica/Issues#822
* Reingest encrypted AIP archivematica/Issues#820
* Zipped bag containing custom processing config archivematica/Issues#771
* Download AIP METS file archivematica/Issues#644
* Descriptive metadata added through GUI is indexed for searching archivematica/Issues#547
* Processing storage usage screen update archivematica/Issues#312
* User with accented characters archivematica/Issues#261
* Change filenames archivematica/Issues#230
* Start SIP using tags archivematica/Issues#1052
* Zipped directory transfers archivematica/Issues#682
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Wellcome Wellcome Trust
Projects
None yet
Development

No branches or pull requests

5 participants