Fix indented docstrings with Unicode characters #801

devmotion · 2024-01-13T00:38:32Z

Fixes the Unicode problem described in a comment on #65. The result is still somewhat off from what is intended, but the same problem seems to occur on the master branch without Unicode characters.

devmotion · 2024-01-13T00:39:42Z

src/styles/default/pretty.jl

+    start_boundary = findfirst(!=('"'), text)
+    end_boundary = findlast(!=('"'), text)


Just a minor optimization while working on this part of the code (I'll revert it if desired): only the indices of the first and last non-quote (") characters are needed.
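A small sketch of what the two calls return on a made-up docstring literal containing non-ASCII characters (the string is an illustrative assumption, not taken from the test suite). Both results are byte indices that land on character boundaries, so slicing between them is safe.

```julia
# A made-up docstring source, including the surrounding triple quotes.
text = "\"\"\"  αβ docs \"\"\""

# `!=('"')` is a curried predicate (a Base.Fix2), true for every character other than '"'.
start_boundary = findfirst(!=('"'), text)  # index of the first non-quote character
end_boundary = findlast(!=('"'), text)     # index of the last non-quote character

# Both are valid string indices, so this range slice cannot split a character.
inner = text[start_boundary:end_boundary]
println(repr(inner))
```

Compared to collecting the index of every non-quote character, this needs two scans and allocates no vector.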

devmotion · 2024-01-13T00:43:21Z

src/styles/default/pretty.jl

    # first, we need to remove any user indent
    # only some lines will "count" towards increasing the user indent
    # start at a very big guess
    user_indent = typemax(Int)
-    user_indented = text[boundaries[1]:boundaries[end]]
+    user_indented = text[start_boundary:end_boundary]


findfirst and findlast return valid indices of text (provided that text contains at least one non-quote character, which would be good to check...)
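One way such a check could look, as a sketch only: docstring_body is a hypothetical helper, not part of JuliaFormatter. If every character is a quote (or the string is empty), findfirst returns nothing, and the helper returns an empty SubString instead of trying to index with nothing.

```julia
# Hypothetical helper (not JuliaFormatter's API): return the text between the
# surrounding quotes, or an empty SubString if there is no non-quote character.
function docstring_body(text::AbstractString)
    start_boundary = findfirst(!=('"'), text)
    # `nothing` means every character is '"' (or `text` is empty);
    # `findlast` would then return `nothing` as well.
    start_boundary === nothing && return SubString(text, 1, 0)
    end_boundary = findlast(!=('"'), text)
    return SubString(text, start_boundary, end_boundary)
end

println(repr(docstring_body("\"\"\"\"\"\"")))    # an empty docstring
println(repr(docstring_body("\"\"\"αβ\"\"\"")))
```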

devmotion · 2024-01-13T00:47:01Z

src/styles/default/pretty.jl

    deindented = IOBuffer()
    user_lines = split(user_indented, '\n')
    for (index, line) in enumerate(user_lines)
        # the first line doesn't count
        if index != 1
-            first_character = findfirst(character -> !isspace(character), line)


This returns a valid index of the string line (or nothing), but in general first_character - 1 is neither guaranteed to be a valid index nor equal to the number of spaces before the first non-space character: if the indentation contains a Unicode space character that takes more than one byte in UTF-8 (e.g. U+3000 IDEOGRAPHIC SPACE), first_character - 1 counts bytes rather than characters.
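A minimal illustration with a made-up line indented by two U+3000 characters (an assumption chosen to trigger the problem). It contrasts the byte-based first_character - 1 with a character count obtained via prevind and the three-argument length.

```julia
# Two ideographic spaces (U+3000, 3 bytes each in UTF-8) followed by code.
line = "\u3000\u3000x = 1"

first_character = findfirst(character -> !isspace(character), line)

# `first_character - 1` counts bytes (code units), not characters...
naive_indent = first_character - 1
# ...and here it is not even a valid index: byte 6 is inside the second U+3000.
println(isvalid(line, naive_indent))

# Counting characters instead: `prevind` steps back to the last indentation
# character, and `length(s, i, j)` counts the characters between indices i and j.
indent = length(line, firstindex(line), prevind(line, first_character))
println((naive_indent, indent))
```

A real implementation would still need to handle first_character === nothing for blank lines.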

devmotion · 2024-01-13T00:48:20Z

src/styles/default/pretty.jl

@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri
                    write(deindented, line)
                else
                    write(deindented, '\n')
-                    write(deindented, SubString(line, user_indent + 1, length(line)))


Similarly, in general neither user_indent + 1 nor length(line) is guaranteed to be a valid index here (length(line) is the number of characters, not necessarily the last index!).
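A short sketch of both failure modes, using made-up strings: with length(line) as the end index, the substring silently loses its last character, and a start index inside a multi-byte character throws a StringIndexError.

```julia
line = "  αβγ"   # two ASCII spaces followed by three 2-byte characters
user_indent = 2

println(length(line))      # number of characters
println(lastindex(line))   # index where the last character ('γ') starts

# Using length(line) as the end index silently drops the last character.
truncated = SubString(line, user_indent + 1, length(line))
println(repr(truncated))

# A start index that falls inside a multi-byte character throws.
caught = try
    SubString("αβ", 2, 4)
    nothing
catch err
    err
end
println(caught isa StringIndexError)
```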

devmotion · 2024-01-13T00:50:06Z

src/styles/default/pretty.jl

@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri
                    write(deindented, line)
                else
                    write(deindented, '\n')
-                    write(deindented, SubString(line, user_indent + 1, length(line)))
+                    write(deindented, chop(line; head = user_indent, tail = 0))


It seemed a bit simpler to use chop here (it returns a SubString), but if preferred, this could also be written using SubString:

Suggested change

-                    write(deindented, chop(line; head = user_indent, tail = 0))
+                    write(deindented, SubString(line, nextind(line, firstindex(line), user_indent), lastindex(line)))
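A quick check, on an illustrative line indented with two 3-byte U+3000 characters, that both spellings drop user_indent characters (not bytes) and produce the same SubString. Here user_indent is assumed to already be a character count.

```julia
line = "\u3000\u3000αβ = 1"   # indented with two 3-byte ideographic spaces
user_indent = 2                # number of indentation *characters* to drop

# `chop` counts `head` and `tail` in characters and returns a SubString.
via_chop = chop(line; head = user_indent, tail = 0)

# Equivalent SubString: `nextind(s, i, n)` advances n characters from index i,
# and `lastindex` is the index of the last character (not the byte count).
via_substring = SubString(line, nextind(line, firstindex(line), user_indent), lastindex(line))

println(repr(via_chop))
println(via_chop == via_substring)
```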

domluna · 2024-01-19T22:48:02Z

This looks good to me. It's too bad CommonMark doesn't do this by default (or via an option, as far as I know).

devmotion · 2024-02-19T14:42:24Z

With the latest changes on the master branch, tests seem to pass 🙂 The diff is best viewed with the "hide whitespace" option enabled: https://github.com/domluna/JuliaFormatter.jl/pull/801/files?w=1

domluna · 2024-02-19T16:34:09Z

@devmotion are we good to merge this then?

devmotion · 2024-02-19T16:40:23Z

Yes, good from my side.

Fix indented docstrings with Unicode characters

9752a1d

devmotion marked this pull request as draft January 13, 2024 00:51

Handle empty docstrings

5fa7788

Merge branch 'master' into dw/substring

731011e

devmotion marked this pull request as ready for review February 19, 2024 14:42

domluna merged commit 8353a58 into domluna:master Feb 19, 2024
56 checks passed

devmotion deleted the dw/substring branch February 19, 2024 18:45
