Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle non-ASCII charater filenames in EnrollWithRequestIdBean #175

Open
wants to merge 10,000 commits into
base: main
Choose a base branch
from

Conversation

SirTediousOfFoo
Copy link

We had issues with issuing certs containing Croatian symbols - žđšćč as mentioned in #174 so I've looked into the parts that handle it and found slightly differing logic in the downloadToken function between the EnrollWithRequestIdBean and EnrollMakeNewRequestBean.java where one uses a getFileName() function to handle the name generation and the other one does it inline.
The one using the function also encodes the filename to a Base64 string which bypasses the issues with non-ASCII characters.

Tanmoy Kundu and others added 30 commits July 6, 2022 11:49
Resolve ECA-10481 "Fb  configdump protocol configs"

Closes ECA-10481

See merge request ejbca/ejbca!402
ECA-10793: Fix transaction handling in AdminPreferenceSessionBean

Closes ECA-10793

See merge request ejbca/ejbca!405
…-alias' into 'main'

ECA-10443 Added DNS identifier challenge type selection to ACME

Closes ECA-10443

See merge request ejbca/ejbca!406
ECA-10623 Allow quotation marks in the CDP

Closes ECA-10623

See merge request ejbca/ejbca!380
ECA-10811: Modified the CertificateCrlReader so that status gets updated

Closes ECA-10811

See merge request ejbca/ejbca!385
ECA-10853: Handle failure to load PKCS#11 library, to avoid partial lock-out from GUI

Closes ECA-10853

See merge request ejbca/ejbca!408
…'main'

ECA-10801: Modified IKB UI page to not display crypto tokens without

Closes ECA-10801

See merge request ejbca/ejbca!394
Resolve ECA-10730 "Fb  acme mode md issues"

Closes ECA-10730

See merge request ejbca/ejbca!409
Conflicts:
	build.xml
	modules/build-properties.xml
	modules/build.xml
	modules/ejbca-rest-configdump/src/org/ejbca/ui/web/rest/api/resource/ConfigdumpRestResource.java
…oval-pages' into 'main'

Resolve ECA-10786 "Fb  extra fields in ra web end entity and approval pages"

Closes ECA-10786

See merge request ejbca/ejbca!412
…'fb-ECA-10843-wsracli-stress-param-no-tests'

ECA-10843: ctb EjbcaWsRaCli stress: Make available to specify the number of tests to be run

See merge request ejbca/ejbca!403
For easier element selection in DOM.
dcarella and others added 10 commits November 17, 2022 01:03
…ypo)

Noticket: English language files fixed (MS Auto-Enrollment: error of button)

Noticket: English language files fixed (OAuth Key Management: typo, and spaces, for l10n people)

Noticket: English language files fixed (ACME protocol: spaces, typo, format titles)

Noticket: Dummy zz language file fixed (remove Id, UNIX End of Line)

Noticket: English language files fixed (ACME protocol: minor typo)

Noticket: English language files fixed (duplicated message, minor fixes)

Noticket: English language files updated (CA Structure & CRLs: better title for 1st column)
…anslated)

Noticket: French language files updated (OAuth Key Management: fully translated)

Noticket: French language files updated (OAuth Key Management: fixes)

Noticket: French language files updated (ACME protocol: spaces)

Noticket: French language files updated (ACME protocol: fully translated)

Noticket: French language files updated (Azure CRL Publisher)

Noticket: French language files updated (Intune Certificate Revocation)

Noticket: French language files updated (C-ITS certificates)

Noticket: French language files updated (CA Structure & CRLs: 'Download' messages)
….10-2

L10n: English fixes for the main branch (based on 7.10.0.2)
…10-2

L10n: French updates for the main branch (based on 7.10.0.2) Fully translated
if(fileName == null){
fileName = "certificatetoken";
}
final String fileName = getFileName();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed filename handling logic outside of the downloadToken function

*
* @return the file name to use in the content disposition header, filename safe characters
*/
private String getFileName() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added finemane handling function like the one in EnrollMakeNewRequestBean

@primetomas
Copy link
Collaborator

I'm not too fond of base64 as filename, it's very user non-friendly. Isn't there an apache string function somewhere that makes strings "filename" friendly?

@primetomas
Copy link
Collaborator

Ah I see, you just copied the behavior that was already part of another piece in EJBCA....

@primetomas
Copy link
Collaborator

Base64 encoding as the last resort is definitely a good way. One potential improvement, that could preserve some utf-8 characters would be to base64 encode it if it doesn't pass:
java.nio.file.Paths.get(filename);
If I understand it correctly it would check the filename for validity without doing any IO or anything costly like that.

@SirTediousOfFoo
Copy link
Author

Uh yeah, It's unwieldy and I didn't really like it either but seeing as it was already there for a different enrollment option I went with it as the obvious solution.
Best solution I found with apache string stuff was just checking if it's ASCIIPrintable and then dropping the special characters but that also results in weirdness since then you get a file where Šandor Štefanović would get a cert named andor tefanovi.pem

I'll check out your suggestion and get back to you here, I'm looking at what else could be done

@dobicinaitis
Copy link
Contributor

A nice solution could be replacing UTF-8 characters with corresponding ASCII ones - https://stackoverflow.com/a/4122207/19848036

@SirTediousOfFoo
Copy link
Author

About the time you posted this I found https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#stripAccents-java.lang.String- which eluded me at first, I'll test it out and update the PR although I wonder if the solution you posted would cover a wider range of edge cases

@SirTediousOfFoo
Copy link
Author

Alright, I've switched up the filename logic so now it uses the Apache commons lang3 StringUtils which implements a stripAccents function that does exactly what I wanted to achieve here.

I tested out this change with the names Lars Űmlaöt d'Ăȯçěny Fųßßb¤łł and Stevo Štefanovčić which yielded a functional filename of Lars Umlaot dAoceny Fußßb¤ll.pem and Stevo Stefanovcic.pem respectively which make the Apache server happy enough.

If nothing else this'll remedy some issues for users with less-than-standard alphabets

@@ -621,7 +620,8 @@ private String getFileName() {
if (StringUtils.isAsciiPrintable(commonName)) {
return StringTools.stripFilename(commonName);
}
return Base64.encodeBase64String(commonName.getBytes());
return org.apache.commons.lang3.StringUtils.stripAccents(commonName);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StringUtils from org.apache.commons.lang doesn't have a stripAccents method so in order to keep everything else in the code as it was I just went with using the lang3 StringUtils like this

@SirTediousOfFoo SirTediousOfFoo changed the title Add base64 filename encoding to EnrollWithRequestIdBean Handle non-ASCII charater filenames in EnrollWithRequestIdBean Nov 25, 2022
@Realiserad
Copy link
Contributor

Realiserad commented Dec 11, 2022

Unfortunately, StringUtils.stripAccents does not guarantee that the output is ASCII printable, as demonstrated by the following unit test:

@Test
public void testTextNormalisation() {
    assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Test CA")));
    assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("malmö.se")));
    assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Га́рри Ки́мович Каспа́ров")));
    assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("Mǎ Yún")));
    assertTrue(StringUtils.isAsciiPrintable(StringUtils.stripAccents("马云")));
}

This will cause problems for anyone using Chinese or Russian characters in their CN. I don't know how common that is, but there are a couple of people in China using EJBCA.

The standards don't really say what to do, so non-ASCII filenames are a bit of a mess, but my understanding is that most (all?) modern web browsers supports UTF-8 character in the Content-Disposition header. The trick is to URL-encode the filename. This should be done in DownloadHelper.sendFile 🔗 link. For example:

final String encodedFilename = URLEncoder.encode(filename, StandardCharsets.UTF_8);
ec.setResponseHeader("Content-Disposition", "attachment; filename*=UTF-8''" + encodedFilename);

As a sidenote, I asked ChatGPT and it seems to agree, and provides some more details:

To create a Content-Disposition header with a filename containing a UTF-8 string in Java, you can use the URLEncoder.encode() method to encode the filename according to the rules specified in the RFC 6266. The resulting string can then be used as the value of the filename parameter in the Content-Disposition header. Here is an example:

import java.net.URLEncoder;                                         

String filename = "my file with åäö characters.pdf";                    
String encodedFilename = URLEncoder.encode(filename, "UTF-8");                                                                                String contentDisposition = "attachment; filename*=UTF-8''" +           
String contentDisposition = "attachment; filename*=UTF-8''" + encodedFilename;                                                  
System.out.println(contentDisposition);                                 

This will output the following Content-Disposition header:

┌──────────────────────────────────────────────────────────────────────────┐ 
│ attachment;                                                              │ 
│ filename*=UTF-8''my%20file%20with%20%C3%A5%C3%A4%C3%B6%20characters.pdf  │ 
└──────────────────────────────────────────────────────────────────────────┘ 

This should work in most major browsers. Note that the actual behavior may vary depending on the browser and its settings.

To keep backwards compatibility with older browsers, one could look at the UserAgent string and fall back to the ugly, but reliable base64-encoding if needed. There are still some people using EJBCA with Internet Explorer, probably because that's what their smartcard solution supports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants