-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix indented docstrings with Unicode characters #801
Conversation
src/styles/default/pretty.jl
Outdated
start_boundary = findfirst(!=('"'), text) | ||
end_boundary = findlast(!=('"'), text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor optimization while working on this part of the code (I'll revert if desired): Only the indices of the first and last non-"
character are needed.
src/styles/default/pretty.jl
Outdated
# first, we need to remove any user indent | ||
# only some lines will "count" towards increasing the user indent | ||
# start at a very big guess | ||
user_indent = typemax(Int) | ||
user_indented = text[boundaries[1]:boundaries[end]] | ||
user_indented = text[start_boundary:end_boundary] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findfirst
and findlast
return valid indices of text
(given that there is a non-"
character, which would be good to check...)
deindented = IOBuffer() | ||
user_lines = split(user_indented, '\n') | ||
for (index, line) in enumerate(user_lines) | ||
# the first line doesn't count | ||
if index != 1 | ||
first_character = findfirst(character -> !isspace(character), line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a valid index of the string line
(or nothing
), but generally first_character - 1
might neither be a valid index and nor be equal to the number of spaces before the first non-space character (in case there is a Unicode space character with > 1 codepoint?).
@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri | |||
write(deindented, line) | |||
else | |||
write(deindented, '\n') | |||
write(deindented, SubString(line, user_indent + 1, length(line))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, here generally neither user_indent + 1
nor length(line)
(which is the number of characters, not necessarily the last index!) might be a valid index.
src/styles/default/pretty.jl
Outdated
@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri | |||
write(deindented, line) | |||
else | |||
write(deindented, '\n') | |||
write(deindented, SubString(line, user_indent + 1, length(line))) | |||
write(deindented, chop(line; head = user_indent, tail = 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed a bit easier to use chop
here (which returns a SubString
) but if preferred this could also be written using SubString
:
write(deindented, chop(line; head = user_indent, tail = 0)) | |
write(deindented, SubString(line, nextind(line, firstindex(line), user_indent), lastindex(line)) |
this looks good to me. it's too bad commonmark doesn't do this by default though (or by an option afaik) |
With the latest changes of the master branch, tests seem to pass 🙂 The diff is viewed best with "hide whitespace" option enabled: https://github.com/domluna/JuliaFormatter.jl/pull/801/files?w=1 |
@devmotion are we good to merge this then? |
Yes, good from my side. |
Fixes the Unicode problem described in #65 (comment) (the result is still a bit off/unintended but the same problem seems to happen on the master branch without Unicode characters).