Non-UTF-8 file name creation command fails with IOError #14

Open
jrwdunham opened this issue Dec 12, 2017 · 7 comments

Comments

@jrwdunham
Contributor

The new createtransfers.py script fails when calling ./createtransfers.py create-variously-encoded-files with IOError: [Errno 84] Invalid or incomplete multibyte or wide character.

This failure happens on the following platforms:

  • Mac OS X 10.13.1 High Sierra with every version of Python I tried
  • Debian 8.9 (jessie) with Python 2.7.13 (i.e., in the python:2.7 Docker container created by the SS Dockerfile)

This failure does not happen with:

  • Ubuntu 16.04 xenial and Python 2.7.12
@ross-spencer
Contributor

@jrwdunham in your testing, did you come across this issue again? We can close this if not.

@jrwdunham
Contributor Author

@ross-spencer Slightly different IOError when I run this on my Mac now:

$ ./createtransfers.py create-variously-encoded-files
Traceback (most recent call last):
  File "./createtransfers.py", line 115, in <module>
    COMMANDS[args.command]()
  File "./createtransfers.py", line 56, in create_variously_encoded_files
    with open(file_path_bytes, 'w') as fout:
IOError: [Errno 92] Illegal byte sequence: '/path/to/Artefactual/am/src/archivematica-sampledata/TestTransfers/files_with_various_encodings/windows_1252/s\xf8ster'
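
To check whether a given filesystem accepts such a name at all, independently of createtransfers.py, something like the following may help. It is only a sketch in Python 3 (the script in this thread runs on Python 2), and try_create is a hypothetical helper, not a function from the repository. It tries to create an empty file whose name is a raw byte string and reports the errno on failure.

```python
import os
import tempfile


def try_create(directory, name_bytes):
    """Try to create an empty file whose name is the given raw bytes.

    Returns (True, None) on success or (False, errno_code) on failure.
    try_create is a hypothetical helper used for illustration only.
    """
    # Build the whole path as bytes so Python never re-encodes the name.
    path = os.path.join(os.fsencode(directory), name_bytes)
    try:
        with open(path, "wb"):
            pass
    except OSError as exc:
        return False, exc.errno
    return True, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # b"s\xf8ster" is "søster" encoded as Windows-1252; it is not valid UTF-8.
        print(try_create(tmp, b"s\xf8ster"))
```

If the filesystem rejects the byte sequence, the second element of the result is the errno, which would correspond to the Errno 92 shown in the traceback above.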

@ross-spencer
Contributor

After talking to @jrwdunham: the issue lies in OS X itself, so we'll update the README to let users know that the create-variously-encoded-files subcommand of createtransfers is not supported on Mac OS X.

@ross-spencer ross-spencer self-assigned this Mar 12, 2018
@jrwdunham
Contributor Author

Debian 8.9 (jessie) with Python 2.7.13 (i.e., in the python:2.7 Docker container created by the SS Dockerfile)

I think when I said ^ I was confused. The issue is that our am.git Docker Compose configuration creates a volume on both the host and the container. The Mac OS X host is probably responsible for disallowing the strangely encoded files in this case too.

@mamedin
Contributor

mamedin commented Aug 8, 2018

I was testing the Archivematica CentOS deployment with Vagrant, using both Linux and macOS as the host.

It works fine on Linux but not on macOS. I compared the environment settings and found that exporting LC_ALL with a UTF-8 locale fixes the issue.

On macOS, after logging in to the Vagrant VM with vagrant ssh, this environment variable is not defined and the make simple command fails:

[vagrant@localhost archivematica-sampledata]$ echo $LC_ALL

[vagrant@localhost archivematica-sampledata]$ make simple
./createtransfers/createtransfers.py create-variously-encoded-files
INFO      2018-08-08 19:14:55createtransfers.py:188  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/windows_1252/s�ster
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/shift_jis/�ۂ��Ղ郁�C��
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/big5/�s�{
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/hearts-❤💖💙💚💛💜💝.txt
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟.txt
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/caf�
INFO      2018-08-08 19:14:55createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/a�o
./createtransfers/createtransfers.py create-deep-transfers
INFO      2018-08-08 19:14:55createtransfers.py:406  Creating default file to copy into deep transfer locations, /home/vagrant/archivematica-sampledata/TestTransfers/deep_transfer/README.md
INFO      2018-08-08 19:14:55createtransfers.py:451  Received 5 depth, 3 dirs, and 4 files. Outputting 363 folders, and 1452 files.
./createtransfers/createtransfers.py create-variously-encoded-dir-names
INFO      2018-08-08 19:14:55createtransfers.py:219  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings
Traceback (most recent call last):
  File "./createtransfers/createtransfers.py", line 594, in <module>
    main()
  File "./createtransfers/createtransfers.py", line 590, in main
    return commands[args.subcommand].cmd_name()
  File "./createtransfers/createtransfers.py", line 231, in create_variously_encoded_dir_names
    rm_dirs_and_create(encoding_dir_path)
  File "./createtransfers/createtransfers.py", line 110, in rm_dirs_and_create
    rm_dirs(dir_path)
  File "./createtransfers/createtransfers.py", line 120, in rm_dirs
    if os.path.exists(dir_path):
  File "/usr/lib64/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 95: ordinal not in range(128)
make: *** [simple] Error 1

Setting LC_ALL=C doesn't fix the issue:

[vagrant@localhost archivematica-sampledata]$ export LC_ALL='C'
[vagrant@localhost archivematica-sampledata]$ make simple
./createtransfers/createtransfers.py create-variously-encoded-files
INFO      2018-08-08 19:15:06createtransfers.py:188  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/windows_1252/s�ster
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/shift_jis/�ۂ��Ղ郁�C��
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/big5/�s�{
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/hearts-❤💖💙💚💛💜💝.txt
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟.txt
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/caf�
INFO      2018-08-08 19:15:06createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/a�o
./createtransfers/createtransfers.py create-deep-transfers
INFO      2018-08-08 19:15:06createtransfers.py:406  Creating default file to copy into deep transfer locations, /home/vagrant/archivematica-sampledata/TestTransfers/deep_transfer/README.md
INFO      2018-08-08 19:15:06createtransfers.py:451  Received 5 depth, 3 dirs, and 4 files. Outputting 363 folders, and 1452 files.
./createtransfers/createtransfers.py create-variously-encoded-dir-names
INFO      2018-08-08 19:15:07createtransfers.py:219  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings
Traceback (most recent call last):
  File "./createtransfers/createtransfers.py", line 594, in <module>
    main()
  File "./createtransfers/createtransfers.py", line 590, in main
    return commands[args.subcommand].cmd_name()
  File "./createtransfers/createtransfers.py", line 231, in create_variously_encoded_dir_names
    rm_dirs_and_create(encoding_dir_path)
  File "./createtransfers/createtransfers.py", line 110, in rm_dirs_and_create
    rm_dirs(dir_path)
  File "./createtransfers/createtransfers.py", line 120, in rm_dirs
    if os.path.exists(dir_path):
  File "/usr/lib64/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 95: ordinal not in range(128)
make: *** [simple] Error 1
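
The UnicodeEncodeError in both tracebacks suggests that a unicode path is being converted to bytes with the ASCII codec before os.stat is called, which is what Python 2 typically does when the locale is unset or set to C. The following Python 3 sketch only illustrates the codec behaviour; encodable is a hypothetical helper and the code does not reproduce what createtransfers.py does internally.

```python
# "søster", the name that appears in the traceback as u'\xf8' at position 95.
name = "s\u00f8ster"


def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(encodable(name, "ascii"))   # ASCII has no code point for "ø"
    print(encodable(name, "utf-8"))   # UTF-8 can represent it
```

Under a UTF-8 locale the same path can be encoded, which would be consistent with the last run in this comment succeeding.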

But setting LC_ALL='en_US.utf8' fixes the issue:

[vagrant@localhost archivematica-sampledata]$ export LC_ALL='en_US.utf8'
[vagrant@localhost archivematica-sampledata]$ make simple
./createtransfers/createtransfers.py create-variously-encoded-files
INFO      2018-08-08 19:15:12createtransfers.py:188  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/windows_1252/s�ster
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/shift_jis/�ۂ��Ղ郁�C��
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/big5/�s�{
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/hearts-❤💖💙💚💛💜💝.txt
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟.txt
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/caf�
INFO      2018-08-08 19:15:12createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/files_with_various_encodings/cp437/a�o
./createtransfers/createtransfers.py create-deep-transfers
INFO      2018-08-08 19:15:12createtransfers.py:406  Creating default file to copy into deep transfer locations, /home/vagrant/archivematica-sampledata/TestTransfers/deep_transfer/README.md
INFO      2018-08-08 19:15:12createtransfers.py:451  Received 5 depth, 3 dirs, and 4 files. Outputting 363 folders, and 1452 files.
./createtransfers/createtransfers.py create-variously-encoded-dir-names
INFO      2018-08-08 19:15:13createtransfers.py:219  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/windows_1252/søster/cp1252_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/shift_jis/ぽっぷるメイル/shift-jis_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/big5/廣州/big5_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/emoji/hearts-❤💖💙💚💛💜💝/utf-8_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟/utf-8_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/cp437/café/cp437_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/dirs_with_various_encodings/cp437/año/cp437_encoded_dirs.txt
./createtransfers/createtransfers.py create-deep-zip-packages
INFO      2018-08-08 19:15:13createtransfers.py:406  Creating default file to copy into deep transfer locations, /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/deep-transfer/deep_zip_transfer/README.md
INFO      2018-08-08 19:15:13createtransfers.py:451  Received 5 depth, 3 dirs, and 4 files. Outputting 363 folders, and 1452 files.
./createtransfers/createtransfers.py \
	create-zip-packages-with-var-encoded-fnames
INFO      2018-08-08 19:15:13createtransfers.py:188  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/windows_1252/s�ster
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/shift_jis/�ۂ��Ղ郁�C��
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/big5/�s�{
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/emoji/hearts-❤💖💙💚💛💜💝.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/cp437/caf�
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-files/cp437/a�o
./createtransfers/createtransfers.py \
	create-zip-packages-with-var-encoded-dirs
INFO      2018-08-08 19:15:13createtransfers.py:219  Transfer target path: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/windows_1252/søster/cp1252_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/shift_jis/ぽっぷるメイル/shift-jis_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/big5/廣州/big5_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/emoji/hearts-❤💖💙💚💛💜💝/utf-8_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/emoji/chess-♕♖♗♘♙♚♛♜♝♞♟/utf-8_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/cp437/café/cp437_encoded_dirs.txt
INFO      2018-08-08 19:15:13createtransfers.py:99   Created: /home/vagrant/archivematica-sampledata/TestTransfers/sample-zip-packages/variously-encoded-dirs/cp437/año/cp437_encoded_dirs.txt

@ross-spencer
Contributor

I think this has impacted @jrwdunham on OS X too. What do you think the best approach is, @mamedin? I have two initial thoughts:

  • Check locale in Python and abort if not set?
  • Set the locale in Python and then set it back once the script has completed?

But maybe there are better ways to handle this kind of issue?
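
The first option (check and abort) could look roughly like this. It is a sketch under assumptions: is_utf8_encoding and check_filesystem_encoding are hypothetical names, nothing like them exists in createtransfers.py as far as this thread shows, and on recent Python 3 versions the interpreter may report UTF-8 even under the C locale, so the check is mainly meaningful on Python 2 or older Python 3.

```python
import codecs
import sys


def is_utf8_encoding(encoding_name):
    """Return True if the given codec name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown codec names are treated as "not UTF-8".
        return False


def check_filesystem_encoding():
    """Exit with a readable message when the filesystem encoding is not UTF-8."""
    # Python 2 may return None here; fall back to "ascii" in that case.
    encoding = sys.getfilesystemencoding() or "ascii"
    if not is_utf8_encoding(encoding):
        sys.exit(
            "Filesystem encoding is %s; try: export LC_ALL=en_US.utf8" % encoding
        )


if __name__ == "__main__":
    check_filesystem_encoding()
    print("Filesystem encoding looks like UTF-8")
```

Aborting early keeps the failure message close to the cause instead of surfacing as a UnicodeEncodeError deep inside os.stat.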

@mamedin
Contributor

mamedin commented Aug 8, 2018

I think we can export this variable in the Makefile, for example:

[vagrant@localhost archivematica-sampledata]$ git diff Makefile
diff --git a/Makefile b/Makefile
index 400024c..db3521d 100644
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,7 @@
 .DEFAULT_GOAL := simple

+export LC_ALL=en_US.utf8
+
 simple:
        ./createtransfers/createtransfers.py create-variously-encoded-files
        ./createtransfers/createtransfers.py create-deep-transfers
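
For callers that do not go through make, the same effect can probably be had by setting LC_ALL only for the child process. This is a sketch, not part of the proposed patch: run_with_utf8_locale is a hypothetical wrapper, and it assumes the en_US.utf8 locale from the diff above is installed on the machine.

```python
import os
import subprocess
import sys


def run_with_utf8_locale(argv, locale_name="en_US.utf8"):
    """Run a command with LC_ALL set for the child process only."""
    # Copy the current environment and override LC_ALL, leaving the
    # parent process's own environment untouched.
    env = dict(os.environ, LC_ALL=locale_name)
    return subprocess.run(
        argv,
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    # Show that the child sees the overridden variable.
    result = run_with_utf8_locale(
        [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    )
    print(result.stdout.strip())
```

In practice argv would be the createtransfers.py command line, for example ["./createtransfers/createtransfers.py", "create-variously-encoded-dir-names"].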
