Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indented docstrings with Unicode characters #801

Merged
merged 3 commits into from
Feb 19, 2024

Conversation

devmotion
Copy link
Contributor

Fixes the Unicode problem described in #65 (comment) (the result is still a bit off/unintended but the same problem seems to happen on the master branch without Unicode characters).

Comment on lines 286 to 287
start_boundary = findfirst(!=('"'), text)
end_boundary = findlast(!=('"'), text)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor optimization while working on this part of the code (I'll revert if desired): Only the indices of the first and last non-" character are needed.

# first, we need to remove any user indent
# only some lines will "count" towards increasing the user indent
# start at a very big guess
user_indent = typemax(Int)
user_indented = text[boundaries[1]:boundaries[end]]
user_indented = text[start_boundary:end_boundary]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findfirst and findlast return valid indices of text (given that there is a non-" character, which would be good to check...)

deindented = IOBuffer()
user_lines = split(user_indented, '\n')
for (index, line) in enumerate(user_lines)
# the first line doesn't count
if index != 1
first_character = findfirst(character -> !isspace(character), line)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a valid index of the string line (or nothing), but generally first_character - 1 might neither be a valid index and nor be equal to the number of spaces before the first non-space character (in case there is a Unicode space character with > 1 codepoint?).

@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri
write(deindented, line)
else
write(deindented, '\n')
write(deindented, SubString(line, user_indent + 1, length(line)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, here generally neither user_indent + 1 nor length(line) (which is the number of characters, not necessarily the last index!) might be a valid index.

@@ -320,7 +319,7 @@ function format_docstring(style::AbstractStyle, state::State, text::AbstractStri
write(deindented, line)
else
write(deindented, '\n')
write(deindented, SubString(line, user_indent + 1, length(line)))
write(deindented, chop(line; head = user_indent, tail = 0))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed a bit easier to use chop here (which returns a SubString) but if preferred this could also be written using SubString:

Suggested change
write(deindented, chop(line; head = user_indent, tail = 0))
write(deindented, SubString(line, nextind(line, firstindex(line), user_indent), lastindex(line))

@devmotion devmotion marked this pull request as draft January 13, 2024 00:51
@domluna
Copy link
Owner

domluna commented Jan 19, 2024

this looks good to me. it's too bad commonmark doesn't do this by default though (or by an option afaik)

@devmotion
Copy link
Contributor Author

With the latest changes of the master branch, tests seem to pass 🙂 The diff is viewed best with "hide whitespace" option enabled: https://github.com/domluna/JuliaFormatter.jl/pull/801/files?w=1

@devmotion devmotion marked this pull request as ready for review February 19, 2024 14:42
@domluna
Copy link
Owner

domluna commented Feb 19, 2024

@devmotion are we good to merge this then?

@devmotion
Copy link
Contributor Author

Yes, good from my side.

@domluna domluna merged commit 8353a58 into domluna:master Feb 19, 2024
56 checks passed
@devmotion devmotion deleted the dw/substring branch February 19, 2024 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants