Trailing slash in links not preserved #94

Closed
chrysn opened this issue Oct 3, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@chrysn
Contributor

chrysn commented Oct 3, 2021

Even when the server puts a trailing slash on a directory link (as e.g. nginx and Python 3's http.server do), httpdirfs still requests the slash-less version.

Paraphrasing a wireshark capture:

GET /tests/
200 OK, text/html, <a href="foo/" title="foo">foo/</a>
GET /tests/foo
301 Moved Permanently, Location: http://.../test/foo/
GET /tests/foo/
200 OK, ...
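To illustrate where the slash should be kept, a listing like the one in the capture can be parsed so that each href is recorded exactly as the server sent it. This is a minimal Python sketch using the standard library's `html.parser`; the `LinkCollector` class is hypothetical and not httpdirfs code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as written (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    # Store the attribute unchanged, trailing slash included.
                    self.hrefs.append(value)


listing = '<a href="foo/" title="foo">foo/</a>'
collector = LinkCollector()
collector.feed(listing)
print(collector.hrefs)  # ['foo/']
```

A client that requests exactly this href (resolved against the directory URL) would go straight to `/tests/foo/` and skip the 301.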

This has three downsides:

  • It's a needless round trip, which adds up quickly in usage patterns where the first slow step is the client fetching a directory tree.
  • It depends on server behavior that is a convention rather than a specification: while most servers redirect like this, none is required to, because URIs with and without a trailing slash are technically distinct.
  • If a buggy server mishandles redirects (e.g. nginx) and sends the client off to a bad location, there can even be a lock-up (see #95, "404'ing directory breaks httpdirfs").
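The first and third points can be shown with a toy model. This Python sketch assumes a hypothetical in-memory table `RESPONSES` standing in for the server (modelled on the capture above) and a hypothetical `fetch` helper that counts requests and bounds redirect following with a hop limit and loop detection. libcurl offers a comparable cap through its `CURLOPT_MAXREDIRS` option, but this sketch makes no claim about how httpdirfs itself is configured.

```python
# Redirect status codes that carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical in-memory "server": the slash-less path redirects,
# the slash-terminated path answers directly.
RESPONSES = {
    "/tests/foo": (301, "/tests/foo/"),
    "/tests/foo/": (200, None),
}


def fetch(path, responses, max_redirects=5):
    """Follow redirects; return (final status, number of requests made).

    Raises RuntimeError on a redirect loop or when more than
    max_redirects redirects would be followed.
    """
    visited = set()
    requests = 0
    while True:
        if path in visited:
            raise RuntimeError(f"redirect loop at {path}")
        visited.add(path)
        requests += 1
        # Unknown paths behave like a 404 with no Location.
        status, location = responses.get(path, (404, None))
        if status not in REDIRECT_CODES:
            return status, requests
        if requests > max_redirects:
            raise RuntimeError("too many redirects")
        path = location


print(fetch("/tests/foo", RESPONSES))   # (200, 2): one wasted round trip
print(fetch("/tests/foo/", RESPONSES))  # (200, 1)
```

Requesting the slash-terminated form costs one request instead of two, and a misbehaving redirect chain ends in an error instead of hanging.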

If there's any way to store the original URI (including its path component) for each file and directory, that should probably be done. If there is no way to store it, I don't have any concrete suggestions: always adding the trailing slash causes the same trouble on servers that choose not to use one, although such servers are probably rare for practical reasons. If it can be arranged but comes at some cost, I can probably come up with examples of other trouble that crops up when the originally encoded URI is not preserved ;-) (read: it's a larger discussion with several related threads, which I don't want to bring in here if this can be resolved easily anyway).
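One way to "store the original URI" is for each listing entry to carry the absolute URL resolved from its href, next to the decoded name shown in the filesystem. This is a minimal Python sketch; `Entry` and `make_entry` are hypothetical names, and resolution uses the standard `urllib.parse.urljoin`.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urljoin


@dataclass(frozen=True)
class Entry:
    """A listing entry that remembers the exact URL to request (hypothetical)."""

    name: str  # decoded name shown in the mounted filesystem
    url: str   # absolute URL resolved from the original href, never rebuilt


def make_entry(base_url, href):
    # Resolve the href against the directory URL; the server's spelling
    # (trailing slash, percent-encoding) is carried over unchanged.
    url = urljoin(base_url, href)
    # The display name strips the trailing slash and percent-decodes.
    name = unquote(href.rstrip("/").rsplit("/", 1)[-1])
    return Entry(name=name, url=url)


entry = make_entry("http://example.com/tests/", "foo/")
print(entry)  # Entry(name='foo', url='http://example.com/tests/foo/')
```

Later requests use `entry.url` directly, so the client never has to reconstruct a URL from the decoded name.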

[edit: pointing to known nginx and newly reported httpdirfs issues]

@chrysn
Contributor Author

chrysn commented Oct 3, 2021

Coming back here after filing #95, I realized that the trailing slash is the only way for httpdirfs to tell a subdirectory from a file (even with an actual directory foo, an index.html that says <a href="foo">...</a> makes foo appear as an HTML file with slightly misleading links).

If that is really the case (and I haven't missed some other mechanism in here), just sending the trailing slash might indeed be a viable shortcut; the "other examples" might still crop up, but then they'd be unrelated.

(I.e., I'd still recommend remembering the full URI, but keeping in the requests the slash that was already there in the links might solve some cases before we tackle the larger topic of ambiguously encoded ":", "@" or sub-delim characters, which on some servers might still send us into 301 Moved Permanently situations and on others just end in a 404.)
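To make the encoding ambiguity concrete: RFC 3986 permits ":", "@" and the sub-delims unencoded in a path segment, so a client that decodes a name and re-encodes it with its own rules can end up requesting a URL spelled differently from the one the server linked. A small Python illustration using the standard `urllib.parse` functions; the example name is made up.

```python
from urllib.parse import quote, unquote

# Hypothetical href as a server might emit it: ":" and "@" and the
# sub-delim "+" are all legal unencoded in a path segment.
original_href = "a:b@c+d"

# Decoding and re-encoding with Python's defaults (only "/" kept safe)
# does not reproduce the server's spelling.
reencoded = quote(unquote(original_href))
print(reencoded)  # a%3Ab%40c%2Bd

# The two spellings differ, so reusing the href verbatim is the safe choice.
print(reencoded == original_href)  # False
```

Whether a given server treats the two spellings as the same resource is up to that server, which is why the re-encoded form may redirect on one and 404 on another.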

@fangfufu added the enhancement (New feature or request) label, and added then removed the bug label, Oct 3, 2021
@fangfufu
Owner

fangfufu commented Nov 7, 2022

> I realized that the trailing slash is the only way for httpdirfs to distinguish what's supposed to be a subdirectory and what's supposed to be a file

Yes, this is one of the mechanisms.
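The trailing-slash mechanism discussed here amounts to a single check per link. A minimal Python sketch, with `is_directory` as a hypothetical name rather than httpdirfs's actual function:

```python
def is_directory(href):
    """Guess the entry type from the link alone (hypothetical heuristic).

    A trailing slash marks a directory; anything else is treated as a file,
    which is why <a href="foo"> pointing at a real directory shows up as a file.
    """
    # Drop any fragment and query before checking the path's last character.
    path = href.split("#", 1)[0].split("?", 1)[0]
    return path.endswith("/")


print(is_directory("foo/"))  # True
print(is_directory("foo"))   # False
```

Because the check only looks at the link text, it can only be as accurate as the listing the server generates.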

@fangfufu
Owner

fangfufu commented Oct 1, 2023

This is probably resolved by 41cb4b8; let me know if the problem still occurs.

@fangfufu fangfufu closed this as completed Oct 1, 2023
@chrysn
Contributor Author

chrysn commented Oct 1, 2023 via email
