Prefer MIME type when determining extensions for MediaBag items (#10557)
Currently, remote images added to the MediaBag are stored at paths with
extensions determined based on the external URI. For instance, an image
from https://example.com/image.png is stored as <hash>.png. If the
URI does not contain an extension (e.g., https://example.com/image),
then the content-type of the downloaded image is used to determine the
extension.

This change switches the precedence such that content-type is preferred
over extensions contained in the URI. This is necessary because some
images are located at URIs with misleading extensions -- shields.io,
for instance, serves SVGs from URIs with .yml extensions. With this
change, the image/svg+xml content-type is now preferred over the .yml
URI extension. This fixes a bug in the PDF writer in which such an image
would be mishandled due to not being identified as an SVG.
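
The precedence switch described above can be illustrated with a small standalone sketch. This is not pandoc's code: `extensionFromMime` is a hypothetical stand-in (with a three-entry table) for pandoc's `extensionFromMimeType`, and `extOld` / `extNew` are illustrative names for the logic before and after this change.

```haskell
import System.FilePath (takeExtension)

-- Hypothetical, deliberately tiny MIME table; pandoc's real
-- extensionFromMimeType covers far more types.
extensionFromMime :: String -> Maybe String
extensionFromMime mime = lookup mime
  [ ("image/svg+xml", "svg")
  , ("image/png", "png")
  , ("image/jpeg", "jpg")
  ]

-- Old precedence: URI extension first, MIME type only as a fallback.
extOld :: FilePath -> String -> String
extOld path mime =
  case takeExtension path of
    '.':e | '%' `notElem` e -> '.':e
    _ -> maybe "" ('.':) (extensionFromMime mime)

-- New precedence: MIME type first, URI extension only as a fallback.
extNew :: FilePath -> String -> String
extNew path mime =
  case extensionFromMime mime of
    Just e -> '.':e
    Nothing -> case takeExtension path of
      '.':e | '%' `notElem` e -> '.':e
      _ -> ""
```

Under these assumptions, `extOld "/badge.yml" "image/svg+xml"` evaluates to `".yml"`, whereas `extNew "/badge.yml" "image/svg+xml"` evaluates to `".svg"`. When the MIME type is not in the table, `extNew` still falls back to the extension found in the path.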
max-heller authored Jan 22, 2025
1 parent ba04a99 commit b95645b
Showing 2 changed files with 6 additions and 4 deletions.
8 changes: 5 additions & 3 deletions src/Text/Pandoc/MediaBag.hs
@@ -107,9 +107,11 @@ insertMedia fp mbMime contents (MediaBag mediamap)
                         _ -> getMimeTypeDef fp''
         mt = fromMaybe fallback mbMime
         path = maybe fp'' (unEscapeString . uriPath) uri
-        ext = case takeExtension path of
-                '.':e | '%' `notElem` e -> '.':e
-                _ -> maybe "" (\x -> '.':T.unpack x) $ extensionFromMimeType mt
+        ext = case extensionFromMimeType mt of
+                Just e -> '.':T.unpack e
+                Nothing -> case takeExtension path of
+                             '.':e | '%' `notElem` e -> '.':e
+                             _ -> ""

 -- | Lookup a media item in a 'MediaBag', returning mime type and contents.
 lookupMedia :: FilePath
2 changes: 1 addition & 1 deletion test/Tests/MediaBag.hs
@@ -29,7 +29,7 @@ tests = [
       assertBool "file in directory is not extracted with original name" exists1
       exists2 <- doesFileExist ("foo" </> "f9d88c3dbe18f6a7f5670e994a947d51216cdf0e.jpg")
       assertBool "file above directory is not extracted with hashed name" exists2
-      exists3 <- doesFileExist ("foo" </> "2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.lua")
+      exists3 <- doesFileExist ("foo" </> "2a0eaa89f43fada3e6c577beea4f2f8f53ab6a1d.png")
       exists4 <- doesFileExist "a.lua"
       assertBool "data uri with malicious payload gets written outside of destination dir"
         (exists3 && not exists4)