Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file+noindex parsing is broken for Windows paths #10703

Open
jasagredo opened this issue Jan 3, 2025 · 17 comments · May be fixed by #10728
Open

file+noindex parsing is broken for Windows paths #10703

jasagredo opened this issue Jan 3, 2025 · 17 comments · May be fixed by #10728

Comments

@jasagredo
Copy link
Collaborator

Describe the bug

The file+noindex scheme is underspecified for Windows paths. It seems to follow the file scheme (RFC 8089), but Windows paths are not properly parsed by network-uri. A path like file:///C:/some/path should produce the path C:/some/path, however it produces
/C:/some/path.

$ cabal repl network-uri
...
ghci> import Network.URI
ghci> uri = fromJust $ parseURI "file+noindex:///C:/some/path"
ghci> putStr $ unlines ["Scheme: " <> uriScheme uri, "Auth: " <> show (uriAuthority uri), "Path: " <> uriPath uri, "Frag: " <> uriFragment uri]
Scheme: local+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "", uriPort = ""})
Path: /C:/some/path
Frag:
ghci> readFile $ uriPath uri
*** Exception: /C:/some/path: openFile: invalid argument (Invalid argument)

And if the last slash is removed, then the C: part is parsed as part of the auth:

$ cabal repl -b network-uri
...
ghci> import Network.URI
ghci> uri = fromJust $ parseURI "file+noindex://C:/some/path"
ghci> putStr $ unlines ["Scheme: " <> uriScheme uri, "Auth: " <> show (uriAuthority uri), "Path: " <> uriPath uri, "Frag: " <> uriFragment uri]
Scheme: local+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "C", uriPort = ":"})
Path: /some/path
Frag:
ghci> readFile $ uriPath uri
"The contents of the file in C:/some/path\n"
ghci>

$ cd D:/
$ cabal repl -b network-uri
...
ghci> readFile "/some/path"
*** Exception: /some/path: openFile: does not exist (No such file or directory)

Just by luck, paths starting with / are interpreted as paths in the current drive so if we just happen to want that it might work.

If no volume or drive letter is specified and the directory name begins with the directory separator character, the path is relative from the root of the current drive.

from https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#example-ways-to-refer-to-the-same-file

The right parsing would be what file-uri does:

ghci> import System.URI.File
ghci> uri = fromRight undefined $ parseFileURI ExtendedWindows "file:///C:/some/path"
ghci> uri
FileURI {fileAuth = Nothing, filePath = "C:/some/path"}
ghci> readFile $ Data.ByteString.Char8.unpack $ filePath uri
"The contents of the file in C:/some/path\n"

Which results in a path accessible from everywhere in the system. However this is not the end of the story because:

  • file-uri only allows file: schemes, it will not allow file+noindex: uris.
  • The file uri RFC does not mention fragments, so file-uri does not parse those.

We can either define a parser for file+noindex from scratch, or do some tricks to parse the uri as a network-uri uri, then fabricate an intermediate uri with the file: scheme and use file-uri or something else. I just didn't want this finding to go missing again.

@jasagredo
Copy link
Collaborator Author

It probably doesn't make much sense fixing this before #8889 goes in.

@jasagredo
Copy link
Collaborator Author

jasagredo commented Jan 3, 2025

Also this PR fixes the issue but in a slightly hacky way MercuryTechnologies#3. It was dropped in the end from what was merged into master

@phadej
Copy link
Collaborator

phadej commented Jan 3, 2025

I don't think it make sense to support DOS filepaths. UNC paths could already work (I don't have windows machine to try).

@phadej
Copy link
Collaborator

phadej commented Jan 3, 2025

Also duplicate of #7065: looks like UNC paths do work!

@jasagredo
Copy link
Collaborator Author

Thanks @phadej. I can try with UNC paths next week. Not sure if they need to be normalised or one can open a path //?/C:/some/path

@jasagredo
Copy link
Collaborator Author

jasagredo commented Jan 7, 2025

@phadej It does not work with UNC paths:

ghci> printParsedFields "file+noindex:////?/C:/some/path"
Scheme: file+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "", uriPort = ""})
Path: //
Query: ?/C:/some/path
Frag:

No matter which combination of / I use, it will consider the path part of the query. Also notice we would have to postprocess the filepath:

ghci> readFile "//?/C:/some/path"
*** Exception: //?/C:/some/path: openFile: does not exist (No such file or directory)
ghci> readFile $ normalise "//?/C:/some/path"
"The contents of the file in C:/some/path\n"

@phadej
Copy link
Collaborator

phadej commented Jan 7, 2025

This special folder is accessed via the DOS device path syntax, which is one of:

\\.\C:\Test\Foo.txt \\?\C:\Test\Foo.txt
*Network.URI> fmap uriPath $ parseURI "file:////./C:/Test/Foo.txt"
Just "//./C:/Test/Foo.txt"

@jasagredo
Copy link
Collaborator Author

jasagredo commented Jan 7, 2025

I see. \\.\ could indeed work. I find it quite convoluted and inconvenient for Windows users of cabal (as opposed to parsing a file uri with ExtendedWindows extension), but I agree that it would work.

Btw, where is your quote from?

@phadej
Copy link
Collaborator

phadej commented Jan 8, 2025

@phadej
Copy link
Collaborator

phadej commented Jan 8, 2025

To comment on original post

The right parsing would be what file-uri does:

It's not. file-uri says

ExtendedWindows Parses windows paths according to E.1, E.2 and E.3. Unlike the spec, posix paths are rejected.

And more importantly, we need to parse the schema first to even know we are parsing file-path like URIs. The repository location is a URL, it can be file: schema, but more often it's http or https.


I guess e.g. Chrome does some extra processing so file:///C:/foo.txt does work, but it's not just "always strip leading slash on Windows", as UNC paths must work too. If UNC paths work without extra processing, that's IMO good enough.

@jasagredo
Copy link
Collaborator Author

jasagredo commented Jan 8, 2025

Ok, I am convinced. BTW, //./.. paths also need to be normalised before being used.

@ulysses4ever
Copy link
Collaborator

Did #9540 run into this kind of issue? Could the discussion here be used to unblock #9540?

@jasagredo
Copy link
Collaborator Author

Maybe. I have not heard from Andrea in quite some time. Don't know what is the plan with those PRs.

@ulysses4ever
Copy link
Collaborator

ulysses4ever commented Jan 8, 2025

Yes, my understanding is that Andrea is currently not around, and if we want some of his drafts merged, we have to take care of them on our own. I've been hoping that we can bring #9540 over the line ourselves because (my impression) the feature is fully implemented; it just doesn't work on Windows because of the paths confusion.

@jasagredo
Copy link
Collaborator Author

@phadej to add to the problem: haskell/filepath#247 😢

@phadej
Copy link
Collaborator

phadej commented Jan 8, 2025

Prelude System.FilePath.Windows> normalise "//./foo/bar"
"\\\\.\\foo\\bar"

I don't see a problem.

EDIT:

Prelude System.FilePath.Windows> normalise "//./c:/foo/bar"
"\\\\.\\c:/foo\\bar"

Looks like a bug though.

@jasagredo
Copy link
Collaborator Author

I first saw your message on the mail (without the edited part) and I was questioning my own sanity once again 😁

@jasagredo jasagredo linked a pull request Jan 9, 2025 that will close this issue
6 tasks
@jasagredo jasagredo linked a pull request Jan 9, 2025 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants