Fixes for unicode in transfer names #500

Closed
wants to merge 4 commits into from


Hwesta
Contributor

@Hwesta Hwesta commented Sep 21, 2016

  • Upgrade slumber so it no longer uses the deprecated simplejson library (Edit: this won't be needed with #550, "Remove 'slumber' dependency")
  • Encode filesystem paths as bytestrings
  • In JS, assume strings are unicode and can be utf8 encoded. Note that this probably removes support for non-unicode transfer names and paths, but needs further testing
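To illustrate the last point, here is a minimal sketch of what "assume strings are unicode and UTF-8 encode them" implies when names are base64-encoded. It is written in Python rather than the PR's JavaScript, and `encode_name` / `decode_name` are hypothetical helpers, not functions from this PR or from base64.js.

```python
import base64


def encode_name(name):
    """Base64-encode a text name via its UTF-8 bytes (hypothetical helper)."""
    return base64.b64encode(name.encode("utf-8")).decode("ascii")


def decode_name(encoded):
    """Reverse encode_name; raises UnicodeDecodeError if the bytes are not UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")


# A name containing non-ASCII characters round-trips unchanged.
assert decode_name(encode_name("transfér-名前")) == "transfér-名前"

# Bytes that are not valid UTF-8 (here Latin-1 "é") cannot be decoded this way.
latin1_encoded = base64.b64encode(b"caf\xe9").decode("ascii")
try:
    decode_name(latin1_encoded)
except UnicodeDecodeError:
    print("not valid UTF-8")
```

The second case is the kind of input the note about non-unicode transfer names is worried about: once the code assumes UTF-8, names stored in another encoding no longer decode.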

refs #9234

Edit:

@Hwesta Hwesta added the "Request: discussion" label Sep 21, 2016
@@ -15,7 +15,7 @@ git+https://github.com/artefactual-labs/agentarchives.git#egg=agentarchives
git+https://github.com/artefactual-labs/mets-reader-writer#egg=metsrw
mysqlclient==1.3.7
# Required by storage-service component
-slumber==0.6.0
+slumber
Member

Can you pin the dependency to the specific version you were testing?

@sevein
Member

sevein commented Sep 22, 2016

Holly, what's the difference between this base64.js and the base64-helpers submodule?

@Hwesta Hwesta added this to the 1.7.0 milestone Sep 29, 2016
When matching between the file path on disk and the currentlocation stored in
the database, ensure that both are bytestrings.  By default, paths coming
from the database are unicode.  This fixes errors in finding filenames with
unicode between when they're assigned a UUID and when filename sanitization
happens.
@Hwesta
Contributor Author

Hwesta commented Nov 8, 2016

Added some related unicode work. This fixes handling of filenames that contain unicode characters (something I thought worked already?). Matching the path on disk against the path in the DB was failing because of a unicode/str mismatch. The fix encodes the unicode value as UTF-8 before matching.

This will probably not work with non-unicode filenames.
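
A minimal sketch of the matching idea, assuming the on-disk path arrives as bytes and the database value as text. It is written for Python 3 (bytes/str rather than the str/unicode pair this PR deals with), and `to_bytes` and `paths_match` are hypothetical names, not the code in this PR.

```python
def to_bytes(path, encoding="utf-8"):
    """Return path as bytes; text is encoded, bytes pass through unchanged."""
    if isinstance(path, bytes):
        return path
    return path.encode(encoding)


def paths_match(path_on_disk, path_in_db):
    """Compare a filesystem path with a database path as bytestrings."""
    return to_bytes(path_on_disk) == to_bytes(path_in_db)


# Path from the filesystem (bytes) vs. the same path from the database (text).
disk_path = "objects/café.txt".encode("utf-8")
db_path = "objects/café.txt"
print(paths_match(disk_path, db_path))  # True

# A Latin-1 filename on disk does not match the UTF-8 encoding of the DB value.
latin1_disk_path = "objects/café.txt".encode("latin-1")
print(paths_match(latin1_disk_path, db_path))  # False
```

The second comparison shows the limitation: a filename whose bytes on disk are not UTF-8 will not equal the UTF-8 encoding of the value stored in the database.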

@Hwesta
Contributor Author

Hwesta commented Mar 30, 2017

We haven't run into this issue recently, even though this fix hasn't been merged. This may not be required.

@jhsimpson jhsimpson removed this from the 1.7.0 milestone May 24, 2017
@jhsimpson
Member

I am closing this PR; we are not going to merge it now. An issue has been opened on the acceptance-tests repo to define user stories for unicode transfer names, so we know what needs to be supported: artefactual-labs/archivematica-acceptance-tests#11

@jhsimpson jhsimpson closed this May 24, 2017
Labels
Request: discussion The path towards resolving the issue is unclear and opinion is sought from other community members.
3 participants