Tagger for Code Prose Composition
Add a tagger that adds attributes describing the code/prose/other composition of each file, based on line-level classifications.
Produces tags like the following:
Recommended filter for mixed prose/code content based on these tags is:
The code entropy adjusts for the classifier's bias towards labelling short strings as code when they include "code-y" characters such as (, ), [, ], and :, a bias caused by a lack of good negative examples. Generating an appropriate set of examples to balance this is a TODO. Regardless, for now, filtering for high-confidence code predictions works well.
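To illustrate the idea of deriving file-level composition from line classifications and then keeping only high-confidence code predictions, here is a minimal sketch. It is not the tagger's actual implementation: the function names, the `(label, confidence)` input shape, and the 0.9 threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical label set mirroring the code/prose/other split described above.
LABELS = ("code", "prose", "other")

# Each line prediction is an assumed (label, confidence) pair.
LinePred = Tuple[str, float]


def composition(line_preds: List[LinePred]) -> Dict[str, float]:
    """Return the fraction of lines assigned to each label."""
    total = len(line_preds)
    counts = Counter(label for label, _ in line_preds)
    return {
        label: (counts[label] / total if total else 0.0)
        for label in LABELS
    }


def high_confidence_code_fraction(
    line_preds: List[LinePred], threshold: float = 0.9
) -> float:
    """Fraction of lines predicted as code with confidence >= threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    if not line_preds:
        return 0.0
    hits = sum(
        1 for label, conf in line_preds
        if label == "code" and conf >= threshold
    )
    return hits / len(line_preds)


if __name__ == "__main__":
    preds = [("prose", 0.98), ("code", 0.95), ("code", 0.55), ("other", 0.80)]
    print(composition(preds))
    print(high_confidence_code_fraction(preds))
```

Counting only confident code lines means that short strings which merely contain bracket-like characters, and so tend to receive low-confidence code labels, contribute less to the file-level signal.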
Usage Detail
The model path references a private Hugging Face model under allenai, so it requires an access token with read permissions. I'm open to discussing making the model public for simplicity; my only reluctance is that it is still very much a prototype and has a long way to go. See the filter discussion above.