Add optional absolute char indexing for highlighting #2584

Open
wants to merge 7 commits into
base: master


@AgentHagu AgentHagu commented Jan 17, 2025

What is the purpose of this pull request?

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • DevOps
  • Improve developer experience
  • Others, please explain:

Overview of changes:
This PR closes #2573. In summary, it allows users to now optionally highlight leading whitespace, giving them more options when using MarkBind.

Users can now add an optional + before their char bound specifier to indicate that the indices should be treated as absolute, i.e. counted from the start of the line, inclusive of leading whitespace/indentation. For example, given the line "  abcd" (indented by 2 spaces), 1[:4] highlights "abcd", as before. With +1[:4], MarkBind instead highlights "  ab" (i.e. the 2 leading spaces plus "ab"), since the end bound of 4 is treated as an absolute index rather than relative to the indentation level.

Anything you'd like to highlight/discuss:
nil

Testing instructions:
A test similar to the example brought up in #2573 can be used to test the new functionality. For example:

```{highlight-lines="1[:4]"}
  abcd
```

"abcd" should be highlighted, i.e. exclusive of the 2 leading spaces.

```{highlight-lines="+1[:4]"}
  abcd
```

Now, "  ab" (the 2 leading spaces followed by "ab") should be highlighted, since the "+" symbol makes the bounds absolute rather than relative to the indentation level.
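The difference between the two test cases can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not MarkBind's actual code: `highlightedSlice` and its parameters are hypothetical names, and an empty start bound (as in `[:4]`) is modelled here as 0.

```typescript
// Hypothetical helper for illustration only; not part of MarkBind.
// Returns the substring that a [start:end] char bound would highlight.
function highlightedSlice(line: string, start: number, end: number, isAbsolute: boolean): string {
  // Index of the first non-whitespace character (or the line length if there is none).
  const indentLength = line.search(/\S|$/);
  // Relative bounds are shifted past the indentation; absolute bounds are used as-is.
  const offset = isAbsolute ? 0 : indentLength;
  const clamp = (n: number): number => Math.min(Math.max(n, 0), line.length);
  return line.slice(clamp(start + offset), clamp(end + offset));
}

console.log(JSON.stringify(highlightedSlice('  abcd', 0, 4, false))); // "abcd"
console.log(JSON.stringify(highlightedSlice('  abcd', 0, 4, true)));  // "  ab"
```

The first call corresponds to 1[:4] and the second to +1[:4] on the line "  abcd".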

Proposed commit message: (wrap lines at 72 characters)
Add optional absolute char indexing for highlighting

Currently, there is no way to include highlighting of leading
whitespace when using the highlight-lines attribute. Providing such
an option may be beneficial to users, giving them more freedom
when using MarkBind.


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

Reviewer checklist:

Indicate the SEMVER impact of the PR:

  • Major (when you make incompatible API changes)
  • Minor (when you add functionality in a backward compatible manner)
  • Patch (when you make backward compatible bug fixes)

At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.

Breaking change release note preparation (if applicable):

  • To be included in the release note for any feature that is made obsolete/breaking

Give a brief explanation note about:

  • what was the old feature that was made obsolete
  • any replacement feature (if any), and
  • how the author should modify his website to migrate from the old feature to the replacement feature (if possible).

groups.pop() was called without considering the
number of capturing groups within. This caused
some issues with the end bound of word matching
being popped and removed from the array.
Contributor

@gerteck gerteck left a comment

Looking at the use case of highlighting, it seems that this feature was developed this way because of the use case of highlighting code blocks, which are appropriately indented (given that the documentation is under the code formatting section).

Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?

   * @returns {[number, number]} The actual bound computed
   */
-  static computeCharBounds(bound: [number, number], line: string): [number, number] {
+  static computeCharBounds(bound: [number, number], line: string,
+    highlightSpaces: boolean): [number, number] {
Contributor

isHighlightSpaces instead of highlightSpaces for boolean name choice would be better.

Author

Good catch! Thanks

@AgentHagu
Author

> Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?

Good idea, I'll try moving the symbol to the front, such as an underscore like "_1[2:4]" to make it more intuitive

@gerteck
Contributor

gerteck commented Jan 20, 2025

> > Additionally, to capture whitespace from front of line, it could be better to add symbol in front rather than the back?
>
> Good idea, I'll try moving the symbol to the front, such as an underscore like "_1[2:4]" to make it more intuitive

Sorry, would something like 1[_:4] work, where _ replaces the digit if we want to include the whitespace? Not sure if it will complicate things, but it seems most intuitive to me, since it relates to the start of the highlight / line, without adding anything extra in the syntax, given it would otherwise be left empty.

Does anyone else have any suggestions?

@damithc
Contributor

damithc commented Jan 20, 2025

Thanks @AgentHagu for this PR, and others for comments.
In terms of feature design, what if we treat spaces as regular characters, without using any special syntax? Would that be cleaner?
Don't worry about backward compatibility. It is unlikely to affect any existing MarkBind sites, as this is very much an edge case of usage.

@AgentHagu
Author

> Thanks @AgentHagu for this PR, and others for comments. In terms of feature design, what if we treat spaces as regular characters, without using any special syntax? Would that be cleaner? Don't worry about backward compatibility. It is unlikely to affect any existing MarkBind sites, as this is very much an edge case of usage.

Thanks for the feedback! Just to clarify, do you mean having char-bound highlighting ignore the indentation level by default? So the spaces within the indents are now automatically considered in the char bounds?

@damithc
Contributor

damithc commented Jan 20, 2025

> Thanks for the feedback! Just to clarify, do you mean having char-bound highlighting ignore the indentation level by default? So the spaces within the indents are now automatically considered in the char bounds?

Oh, so that's why spaces were not counted before! I didn't realise. So, the current counting is relative to the first non-space char, by default?
Perhaps the solution is to introduce a notation to indicate absolute char counting instead?

@AgentHagu
Author

> Oh, so that's why spaces were not counted before! I didn't realise. So, the current counting is relative to the first non-space char, by default?

Yep, by default the indentation level is checked and accounted for and the index counting is relative to the first non-whitespace char.

> Perhaps the solution is to introduce a notation to indicate absolute char counting instead?

To clarify, do you mean introducing something like 1[0.3] instead of 1[0:3] for whitespace-inclusive char-bound highlighting? If so, I think this could be a good solution that doesn't affect existing MarkBind sites and also provides new options to users. It also means we wouldn't need the special syntax anymore.

@damithc
Contributor

damithc commented Jan 21, 2025

> To clarify, do you mean introducing something like 1[0.3] instead of 1[0:3] for whitespace-inclusive char-bound highlighting? If so, I think this could be a good solution that doesn't affect existing MarkBind sites and also provides new options to users. It also means we wouldn't need the special syntax anymore.

@AgentHagu I was thinking of something like +1[0:3] where + indicates numbers are absolute i.e., similar to your current syntax but the meaning is different.

@AgentHagu
Author

AgentHagu commented Jan 21, 2025

> @AgentHagu I was thinking of something like +1[0:3] where + indicates numbers are absolute i.e., similar to your current syntax but the meaning is different.

I see. In that case, I believe the implementation should already support absolute char counting, so only the syntax and PR description need to be updated.
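To make the agreed syntax concrete, here is a minimal sketch of how a rule such as +1[0:3] could be parsed. The regular expression, the `ParsedRule` shape, and the function name are all assumptions made for illustration; MarkBind's real parser handles more rule forms and may be structured quite differently.

```typescript
// Hypothetical result shape; names are illustrative only.
interface ParsedRule {
  isAbsolute: boolean;   // true when the rule is prefixed with "+"
  lineNumber: number;
  start: number | null;  // null means the bound was left empty (unbounded)
  end: number | null;
}

// Assumed pattern: optional "+", a line number, then "[start:end]" with optional bounds.
const CHAR_BOUND_RULE = /^(\+?)(\d+)\[(\d*):(\d*)\]$/;

function parseCharBoundRule(rule: string): ParsedRule | null {
  const match = CHAR_BOUND_RULE.exec(rule.trim());
  if (match === null) return null;
  const [, plus, lineNumber, start, end] = match;
  return {
    isAbsolute: plus === '+',
    lineNumber: Number(lineNumber),
    start: start === '' ? null : Number(start),
    end: end === '' ? null : Number(end),
  };
}

console.log(parseCharBoundRule('+1[0:3]'));
console.log(parseCharBoundRule('1[:4]'));
```

Keeping the + as its own capture group means the remaining groups keep the same positions whether or not the prefix is present.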

@damithc
Contributor

damithc commented Jan 21, 2025

Just so we don't forget later, this is a user-visible change, which means UG needs to be updated as well. But you can wait till the feature behaviour is finalised before adding the UG update to the PR.

Contributor

@lhw-1 lhw-1 left a comment

Thanks for the PR, @AgentHagu!

Revisiting a discussion point that we brought up earlier - in the cases of other kinds of whitespaces being present (e.g. tab or newline), how do we handle it? E.g. I can see a situation where the user copy-pastes in an existing chunk of code with whitespace, and non-space whitespace characters are added in directly. This may just be a documentation problem though given that this is not highly likely & usually these are converted to whitespaces in many IDEs etc, what do you think? Or is there a tweak we can do that will efficiently cover these cases?

const lineNumber = HighlightRuleComponent
.isValidLineNumber(groups.shift() ?? '', 1, lines.length, lineNumberOffset);
if (!lineNumber) return null;

const isUnbounded = groups.every(x => x === '');

Contributor

Suggested change

  start += indents.length;
  // Clamp values
  if (start < indents.length) {
    start = indents.length;
  } else if (start > line.length) {
    start = line.length;
  }
} else if (start > line.length) {
  start = line.length;
}
Contributor

The conditional logic is slightly convoluted for me here. If start !== UNBOUNDED is true, and isHighlightSpaces is true, and start > line.length is false, what happens? (Will this even happen in theory?)

Also, given that Lines 129-130 and 132-133 are duplicated, I wonder if there is a way to simplify this chunk of code. We don't need to shy away from ternary conditional operators if they make the code easier to understand.

Author

> If start !== UNBOUNDED is true, and isHighlightSpaces is true, and start > line.length is false, what happens? (Will this even happen in theory?)

In theory, it's possible for that to occur. For example, for some text, start = 2 would be bounded and <= line.length and if isHighlightSpaces is true, then that means we should leave the start value as is, since we wish to use the absolute value of those bounds.

> Also, given that Lines 129-130 and 132-133 are duplicated, I wonder if there is a way to simplify this chunk of code. We don't need to shy away from ternary conditional operators if they make the code easier to understand.

Thanks for the feedback! I'll see if there's a way to simplify it more to improve readability.
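One possible way to remove the duplicated clamping is to compute a single offset and a single lower limit up front. The sketch below is a standalone function written under assumptions, not the PR's final code: the value of `UNBOUNDED` (-1 here), the treatment of the end bound, and the clamping of negative absolute values to 0 are all guesses made for illustration.

```typescript
// Assumed sentinel for "no bound given"; the real constant's value is not shown in this thread.
const UNBOUNDED = -1;

// Standalone sketch of the bound computation, with hypothetical naming.
function computeCharBoundsSketch(
  bound: [number, number],
  line: string,
  isHighlightSpaces: boolean,
): [number, number] {
  // Length of the leading whitespace of the line.
  const indentLength = line.search(/\S|$/);
  // Absolute mode uses the bounds as-is; relative mode shifts them past the indentation.
  const offset = isHighlightSpaces ? 0 : indentLength;
  // The lowest allowed index equals the offset: 0 when absolute, the indent when relative.
  const clamp = (n: number): number => Math.min(Math.max(n, offset), line.length);
  const [start, end] = bound;
  return [
    start === UNBOUNDED ? offset : clamp(start + offset),
    end === UNBOUNDED ? line.length : clamp(end + offset),
  ];
}

console.log(computeCharBoundsSketch([UNBOUNDED, 4], '  abcd', false)); // [ 2, 6 ]
console.log(computeCharBoundsSketch([UNBOUNDED, 4], '  abcd', true));  // [ 0, 4 ]
```

In absolute mode a bounded start that is within the line, such as 2, passes through unchanged, which matches the case described in the reply above.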

});

test('handles line-length end correctly', () => {
const bounds = HighlightRuleComponent.computeCharBounds([0, 4], ' abcd', true);
Contributor

Why abcd here in particular? (Let me know if it's a non-issue)

Author

@AgentHagu AgentHagu Jan 27, 2025

There's no underlying reason for abcd, I can update the test-case to follow the other test-cases' string input of ' some text'.

@AgentHagu
Author

AgentHagu commented Jan 27, 2025

> Thanks for the PR, @AgentHagu!
>
> Revisiting a discussion point that we brought up earlier - in the cases of other kinds of whitespaces being present (e.g. tab or newline), how do we handle it? E.g. I can see a situation where the user copy-pastes in an existing chunk of code with whitespace, and non-space whitespace characters are added in directly. This may just be a documentation problem though given that this is not highly likely & usually these are converted to whitespaces in many IDEs etc, what do you think? Or is there a tweak we can do that will efficiently cover these cases?

Thanks for the feedback!

After investigating, it seems that each tab is treated as 1 "character" by the current code, leading to weird highlighting such as the following:
[image]
For example, the highlight rule for line 4 is +4[0:2], but rather than highlighting the first 2 characters of the indentation, it highlights the first 2 tabs as a whole.

My current idea is to see whether it's possible to convert such tabs to the corresponding number of spaces, which should remedy the issue automatically. What do you think of this solution?
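As a rough illustration of that idea, tabs could be expanded to spaces before the char bounds are applied. This is a sketch only: the tab width of 4 is an assumption (the thread does not say what width MarkBind or the browser uses), and `expandTabs` is a hypothetical helper name.

```typescript
// Assumed tab width; not specified anywhere in this discussion.
const TAB_WIDTH = 4;

// Hypothetical helper: replaces each tab with spaces up to the next tab stop.
function expandTabs(line: string, tabWidth: number = TAB_WIDTH): string {
  let result = '';
  for (let i = 0; i < line.length; i += 1) {
    const ch = line.charAt(i);
    if (ch === '\t') {
      // Always add at least one space, then pad until the next multiple of tabWidth.
      do {
        result += ' ';
      } while (result.length % tabWidth !== 0);
    } else {
      result += ch;
    }
  }
  return result;
}

console.log(JSON.stringify(expandTabs('\t\tfoo'))); // "        foo"
console.log(JSON.stringify(expandTabs('a\tb')));    // "a   b"
```

With the line expanded this way, a rule like +4[0:2] would cover 2 spaces rather than 2 whole tabs, at the cost of changing the characters in the rendered code block.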

@AgentHagu AgentHagu changed the title from "Add optional leading whitespace highlighting" to "Add optional absolute char indexing for highlighting" Jan 27, 2025
Development

Successfully merging this pull request may close these issues.

Highlighting code blocks: leading spaces should be counted
4 participants