Add strspan-1.03.tgz to packages
jleffler committed Mar 19, 2020
1 parent 0d7ef28 commit 5a205eb
Showing 2 changed files with 93 additions and 34 deletions.
127 changes: 93 additions & 34 deletions packages/README.md
@@ -60,40 +60,64 @@ The release has a SHA2-256 checksum and size as shown:

SHA-256 d116d9e85c77826c1fd3ff4d18c56c311a6295c8247f83686cd7e8805963220f 28953 newgid-1.10.tgz

### `strspan`

The code for library functions `str_span()` and `str_cspan()`, which are
related to, but different from `strspn()` and `strcspn()`.
These functions are designed to be used repeatedly with the same set of
search characters.
They precompute a table of which characters are to be matched.
One of the functions `set_span()` or `set_ranges()` is used to
initialize the precomputed data.
They readily out-perform `strspn()` and `strcspn()` on moderate size
searches.
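
The precomputed-table idea can be illustrated with a minimal sketch.
This is not the code in the `strspan` package: the type `ByteSet` and the
functions `byteset_init()`, `byteset_span()` and `byteset_cspan()` are
hypothetical names invented for this example, and the real `set_span()`,
`set_ranges()`, `str_span()` and `str_cspan()` interfaces may differ.
It assumes a one-flag-per-byte table that is built once and then reused
for every search.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: one membership flag per possible unsigned char value. */
typedef struct ByteSet
{
    unsigned char member[UCHAR_MAX + 1];
} ByteSet;

/* Build the table once from a null-terminated list of characters. */
static void byteset_init(ByteSet *set, const char *chars)
{
    memset(set->member, 0, sizeof(set->member));
    for (const unsigned char *p = (const unsigned char *)chars; *p != '\0'; p++)
        set->member[*p] = 1;
}

/* Length of the initial segment of str made up only of bytes in the set. */
static size_t byteset_span(const ByteSet *set, const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    while (*p != '\0' && set->member[*p])
        p++;
    return (size_t)(p - (const unsigned char *)str);
}

/* Length of the initial segment of str made up only of bytes not in the set. */
static size_t byteset_cspan(const ByteSet *set, const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    while (*p != '\0' && !set->member[*p])
        p++;
    return (size_t)(p - (const unsigned char *)str);
}
```

Each search then costs one table lookup per byte examined, whereas
`strspn()` and `strcspn()` have to interpret the character list again on
every call.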

The distribution includes a timing program `test2.strspan` which can be
run with a number of files.
It measures the time to read the files, processing them with
`strlen()` and `strchr()` to warm up the cache and I/O buffers.
It then runs `str_span()` and `str_cspan()` on the same files, and then
`strspn()` and `strcspn()`.
It can be effective to name the same file multiple times on the command
line.
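
As a rough illustration of this kind of measurement, the sketch below
times one pass over an in-memory buffer that alternates `strspn()` and
`strcspn()`.
It is not taken from `test2.strspan`: the functions `count_fields()` and
`time_count_fields()` are hypothetical, and using `clock()` (CPU time) is
an assumption made for the example rather than a statement about how the
real program measures time.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Count runs of non-delimiter bytes by alternating strspn() and strcspn(). */
static size_t count_fields(const char *buffer, const char *delims)
{
    size_t count = 0;
    const char *p = buffer + strspn(buffer, delims);    /* skip leading delimiters */
    while (*p != '\0')
    {
        count++;
        p += strcspn(p, delims);    /* skip over the field itself */
        p += strspn(p, delims);     /* skip the delimiters that follow it */
    }
    return count;
}

/* Hypothetical helper: CPU seconds used by one pass over the buffer. */
static double time_count_fields(const char *buffer, const char *delims, size_t *count)
{
    clock_t t0 = clock();
    *count = count_fields(buffer, delims);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Timing the same loop again with a table-driven span function in place of
`strspn()` and `strcspn()` would give a comparable pair of numbers.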

Example use:

$ test2.strspan bible-be.txt bible-be.txt bible-be.txt
# NB: The tests for str_span and strspn are comparable
# The tests for strlen and strchr are not comparable
strlen 0.187297 (4467663) bible-be.txt
strlen 0.186324 (4467663) bible-be.txt
strlen 0.187616 (4467663) bible-be.txt
strchr 0.182676 (4467663) bible-be.txt
strchr 0.185405 (4467663) bible-be.txt
strchr 0.184813 (4467663) bible-be.txt
str_span 0.195715 (4467663) bible-be.txt
str_span 0.199516 (4467663) bible-be.txt
str_span 0.194588 (4467663) bible-be.txt
strspn 0.347890 (4467663) bible-be.txt
strspn 0.346028 (4467663) bible-be.txt
strspn 0.347305 (4467663) bible-be.txt
$ test2.strspan great.panjandrum great.panjandrum great.panjandrum
# NB: The tests for str_span and strspn are comparable
# The tests for strlen and strchr are not comparable
strlen 0.000046 (487) great.panjandrum
strlen 0.000031 (487) great.panjandrum
strlen 0.000030 (487) great.panjandrum
strchr 0.000036 (487) great.panjandrum
strchr 0.000030 (487) great.panjandrum
strchr 0.000030 (487) great.panjandrum
str_span 0.000035 (487) great.panjandrum
str_span 0.000032 (487) great.panjandrum
str_span 0.000031 (487) great.panjandrum
strspn 0.000061 (487) great.panjandrum
strspn 0.000052 (487) great.panjandrum
strspn 0.000053 (487) great.panjandrum
$

These results show that `str_span()` and `str_cspan()` are marginally
slower than using `strlen()` or `strchr()`, but considerably quicker than
using `strspn()` and `strcspn()`.

### `timecmd`

@@ -115,7 +139,7 @@ The code for `timecmd` which measures elapsed time of commands specified as part

Example uses:

$ timecmd -m sleep 65
2020-03-01 08:42:58.079 [PID 16916] sleep 65
2020-03-01 08:44:03.086 [PID 16916; status 0x0000] - 1m 5.007s
$ timecmd -b -m sleep 65
@@ -135,5 +159,40 @@ Example uses:

The sample commands all produced no output. It works fine with commands that do.

### `utf8-unicode`

The code for `utf8-unicode` which processes named files or standard
input as UTF-8 and prints the sequence of bytes that make up a single
character and the Unicode character it corresponds to.

For example:

$ echo "grep –i where | wc -l" | utf8-unicode
0x67 = U+0067
0x72 = U+0072
0x65 = U+0065
0x70 = U+0070
0x20 = U+0020
0xE2 0x80 0x93 = U+2013
0x69 = U+0069
0x20 = U+0020
0x77 = U+0077
0x68 = U+0068
0x65 = U+0065
0x72 = U+0072
0x65 = U+0065
0x20 = U+0020
0x7C = U+007C
0x20 = U+0020
0x77 = U+0077
0x63 = U+0063
0x20 = U+0020
0x2D = U+002D
0x6C = U+006C
$

The Unicode EN DASH U+2013 was why that `grep` command was failing with
an error about being unable to find the file `where`.
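
The mapping from bytes to code point in that output can be reproduced
with a short decoder.
This is a minimal sketch, not the `utf8-unicode` source: the function
`utf8_decode()` is a hypothetical name, and its error handling (returning
0 for malformed or truncated input) is an assumption made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from buf[0..len).
 * Returns the number of bytes consumed and stores the code point,
 * or returns 0 if the sequence is malformed or truncated.
 */
static size_t utf8_decode(const unsigned char *buf, size_t len, uint32_t *codepoint)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need;
    uint32_t cp;

    if (len == 0)
        return 0;
    if (buf[0] < 0x80)                  { need = 1; cp = buf[0]; }
    else if ((buf[0] & 0xE0) == 0xC0)   { need = 2; cp = buf[0] & 0x1F; }
    else if ((buf[0] & 0xF0) == 0xE0)   { need = 3; cp = buf[0] & 0x0F; }
    else if ((buf[0] & 0xF8) == 0xF0)   { need = 4; cp = buf[0] & 0x07; }
    else
        return 0;                       /* stray continuation or invalid lead byte */
    if (len < need)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < need; i++)
    {
        if ((buf[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (buf[i] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and values beyond U+10FFFF. */
    if (cp < min_cp[need] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *codepoint = cp;
    return need;
}
```

For the bytes `0xE2 0x80 0x93`, the lead byte contributes `0x2`, and the
two continuation bytes contribute `0x00` and `0x13`, which gives
`(0x2 << 12) | (0x00 << 6) | 0x13 = 0x2013`.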

Jonathan Leffler ([email protected])
Sunday 1st March 2020
Wednesday 18th March 2020
Binary file added packages/strspan-1.03.tgz
Binary file not shown.
