Commit

better tokenisation implementation
flammie committed Jan 23, 2025
1 parent 54371c5 commit f83244d
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions devtools/cheap-tokeniser.bash
@@ -0,0 +1,9 @@
#!/bin/bash
# Cheap tokeniser for when the full tokeniser cannot be run on every CI commit,
# or for any other case where you need all the tokens in a minute instead of
# an hour. Reads the files given as arguments (or stdin), prints one token per line.
cat "$@" |\
    sed -e 's/[.,:!?;)"*]\+ / &/g' \
        -e 's/[.,:!?;)"*]\+$/ &/' \
        -e 's/ [(*"[]/& /g' |\
    tr -s ' ' '\n'
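
As a rough illustration of what the three substitutions do, the sketch below wraps the same pipeline in a shell function and runs it on one sample sentence. The function name `cheap_tokenise` and the sample input are hypothetical and not part of the commit. The sketch assumes GNU sed, since `\+` in a basic regular expression is a GNU extension, and it assumes the second expression was meant to list `;` like the first one does.

```bash
#!/bin/bash
# Hypothetical wrapper around the pipeline from devtools/cheap-tokeniser.bash.
# Assumes GNU sed (\+ is a GNU extension in basic regular expressions).
cheap_tokenise() {
    cat "$@" |
        # 1. split trailing punctuation off a word when a space follows
        # 2. split trailing punctuation off the last word on a line
        # 3. split opening punctuation off the word that follows it
        sed -e 's/[.,:!?;)"*]\+ / &/g' \
            -e 's/[.,:!?;)"*]\+$/ &/' \
            -e 's/ [(*"[]/& /g' |
        # one token per line, collapsing runs of spaces
        tr -s ' ' '\n'
}

printf '%s\n' 'Hello, world! This is (very) cheap.' | cheap_tokenise
# prints, one token per line:
# Hello , world ! This is ( very ) cheap .
```

The third expression only fires after a space, so opening punctuation at the very start of a line stays attached to its word. That seems an acceptable trade-off for a tokeniser whose purpose is speed rather than accuracy.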
