The release has a SHA2-256 checksum and size as shown:

    SHA-256 d116d9e85c77826c1fd3ff4d18c56c311a6295c8247f83686cd7e8805963220f 28953 newgid-1.10.tgz

### `strspan`

The code for library functions `str_span()` and `str_cspan()`, which are
related to, but different from, `strspn()` and `strcspn()`.
These functions are designed to be used repeatedly to search for the same
set of characters.
They precompute a table of which characters are to be matched.
One of the functions `set_span()` or `set_ranges()` is used to
initialize the precomputed data.
They readily outperform `strspn()` and `strcspn()` on moderate-sized
searches.
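
A minimal sketch of the precomputed-table idea follows. It assumes a 256-entry lookup table indexed by `unsigned char`. The names `span_set`, `span_set_chars()`, `span_set_range()`, `span_len()` and `cspan_len()` are hypothetical. This README does not show the real signatures of `set_span()`, `set_ranges()`, `str_span()` and `str_cspan()`, and they may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical precomputed set: one flag per possible byte value. */
typedef struct
{
    bool member[256];
} span_set;

/* Initialize the table from the bytes in 'chars' (like strspn()'s 'accept'). */
void span_set_chars(span_set *set, const char *chars)
{
    assert(set != NULL && chars != NULL);
    memset(set->member, 0, sizeof(set->member));
    for (const unsigned char *p = (const unsigned char *)chars; *p != '\0'; p++)
        set->member[*p] = true;
}

/* Add every byte in the inclusive range lo..hi to an existing table. */
void span_set_range(span_set *set, unsigned char lo, unsigned char hi)
{
    assert(set != NULL && lo <= hi);
    for (unsigned c = lo; c <= hi; c++)
        set->member[c] = true;
}

/* Length of the initial segment of 's' made up entirely of set members. */
size_t span_len(const span_set *set, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p != '\0' && set->member[*p])
        p++;
    return (size_t)(p - (const unsigned char *)s);
}

/* Length of the initial segment of 's' containing no set members. */
size_t cspan_len(const span_set *set, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p != '\0' && !set->member[*p])
        p++;
    return (size_t)(p - (const unsigned char *)s);
}
```

Here is a usage example. After `span_set_chars(&digits, "0123456789")`, the call `span_len(&digits, "2020-03-01")` returns 4, the same result as `strspn("2020-03-01", "0123456789")`. Building the table costs some setup, which pays off only when the same set is reused. `strspn()` receives its character set afresh on every call, so it cannot carry setup work from one call to the next.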

The distribution includes a timing program `test2.strspan` which can be
run with a number of files.
It measures the time to read the files, processing them with
`strlen()` and `strchr()` to warm up the cache and I/O buffers.
It then runs `str_span()` and `str_cspan()` on the same files, and then
`strspn()` and `strcspn()`.
It can be effective to name the same file multiple times on the command
line.
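
One way to take such a measurement is sketched below, assuming POSIX `clock_gettime()` with `CLOCK_MONOTONIC`. The helpers `count_words()`, `elapsed()` and `time_count_words()` are hypothetical illustrations, not code from `test2.strspan`. The word-counting workload is also an assumption about the kind of work a span benchmark might do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical workload: count the words in 'buf', where a word is a run of
   bytes not in 'delims', using the standard strspn()/strcspn() pair. */
size_t count_words(const char *buf, const char *delims)
{
    size_t words = 0;
    const char *p = buf + strspn(buf, delims);  /* skip leading delimiters */
    while (*p != '\0')
    {
        p += strcspn(p, delims);                 /* skip over one word */
        words++;
        p += strspn(p, delims);                  /* skip following delimiters */
    }
    return words;
}

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
double elapsed(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) +
           (double)(t1->tv_nsec - t0->tv_nsec) / 1e9;
}

/* Time one call of count_words(); the word count is stored through 'words'. */
double time_count_words(const char *buf, const char *delims, size_t *words)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *words = count_words(buf, delims);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed(&t0, &t1);
}
```

A single call on a short string finishes well within timer noise, so a real benchmark repeats the work over large inputs. Naming the same file several times, as suggested above, is one way to do that.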

Example use:

    $ test2.strspan bible-be.txt bible-be.txt bible-be.txt
    # NB: The tests for str_span and strspn are comparable
    #     The tests for strlen and strchr are not comparable
    strlen    0.187297 (4467663) bible-be.txt
    strlen    0.186324 (4467663) bible-be.txt
    strlen    0.187616 (4467663) bible-be.txt
    strchr    0.182676 (4467663) bible-be.txt
    strchr    0.185405 (4467663) bible-be.txt
    strchr    0.184813 (4467663) bible-be.txt
    str_span  0.195715 (4467663) bible-be.txt
    str_span  0.199516 (4467663) bible-be.txt
    str_span  0.194588 (4467663) bible-be.txt
    strspn    0.347890 (4467663) bible-be.txt
    strspn    0.346028 (4467663) bible-be.txt
    strspn    0.347305 (4467663) bible-be.txt
    $ test2.strspan great.panjandrum great.panjandrum great.panjandrum
    # NB: The tests for str_span and strspn are comparable
    #     The tests for strlen and strchr are not comparable
    strlen    0.000046 (487) great.panjandrum
    strlen    0.000031 (487) great.panjandrum
    strlen    0.000030 (487) great.panjandrum
    strchr    0.000036 (487) great.panjandrum
    strchr    0.000030 (487) great.panjandrum
    strchr    0.000030 (487) great.panjandrum
    str_span  0.000035 (487) great.panjandrum
    str_span  0.000032 (487) great.panjandrum
    str_span  0.000031 (487) great.panjandrum
    strspn    0.000061 (487) great.panjandrum
    strspn    0.000052 (487) great.panjandrum
    strspn    0.000053 (487) great.panjandrum
    $
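
These results show that `str_span()` and `str_cspan()` are marginally
slower than `strlen()` or `strchr()`, but considerably quicker than
`strspn()` and `strcspn()`.
On the larger file, `str_span` averaged about 0.197 seconds per pass
against about 0.347 seconds for `strspn`, making it roughly 1.8 times
faster; the small file shows a similar ratio.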

### `timecmd`

The code for `timecmd` which measures elapsed time of commands specified as part …

Example uses:

    $ timecmd -m sleep 65
    2020-03-01 08:42:58.079 [PID 16916] sleep 65
    2020-03-01 08:44:03.086 [PID 16916; status 0x0000] - 1m 5.007s
    $ timecmd -b -m sleep 65
    …

The sample commands all produced no output. It works fine with commands that do.
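
The elapsed-time suffix in that output is `1m 5.007s` for a 65.007-second run. It can be produced with integer arithmetic on milliseconds. The `format_elapsed()` function below is a hypothetical sketch, not the `timecmd` source. It reproduces only the minutes-and-seconds form shown; the sub-minute form and the lack of an hours field are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: render 'ms' milliseconds as "Xm S.mmms", or as
   "S.mmms" when under a minute (an assumed form). Hours are not split out. */
void format_elapsed(long long ms, char *buf, size_t buflen)
{
    assert(ms >= 0 && buf != NULL && buflen > 0);
    long long minutes = ms / 60000;
    long long seconds = (ms / 1000) % 60;
    long long millis  = ms % 1000;
    if (minutes > 0)
        snprintf(buf, buflen, "%lldm %lld.%03llds", minutes, seconds, millis);
    else
        snprintf(buf, buflen, "%lld.%03llds", seconds, millis);
}
```

For example, `format_elapsed(65007, buf, sizeof(buf))` stores `1m 5.007s` in `buf`. Working in integer milliseconds avoids floating-point rounding in the fractional part.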

### `utf8-unicode`

The code for `utf8-unicode`, which processes named files or standard
input as UTF-8 and, for each character, prints the sequence of bytes that
make it up and the Unicode code point it corresponds to.
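
A minimal sketch of the decoding step follows. It is not the `utf8-unicode` source, and the names `utf8_decode()` and `print_utf8()` are hypothetical. The lead byte's high bits give the sequence length. Each continuation byte contributes its low six bits. For the EN DASH, `0xE2 0x80 0x93` yields `((0xE2 & 0x0F) << 12) | ((0x80 & 0x3F) << 6) | (0x93 & 0x3F)`, which is `0x2000 | 0x000 | 0x13 = 0x2013`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 character from the 'len' bytes at 's'.  On success, store
   the code point in '*cp' and return the number of bytes used (1 to 4); return
   0 for an invalid or truncated sequence.  This sketch does not reject
   overlong encodings or surrogate code points. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { n = 1; c = s[0]; }         /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }  /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }  /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }  /* 11110xxx */
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (n > len)
        return 0;                   /* sequence truncated */
    for (size_t i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* expected a continuation byte 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Print each character of 'str' in the style of the example output,
   e.g. "0xE2 0x80 0x93 = U+2013".  The "invalid" line format is hypothetical. */
void print_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str);

    while (len > 0)
    {
        uint32_t cp;
        size_t n = utf8_decode(s, len, &cp);
        if (n == 0)
        {
            printf("0x%.2X = invalid\n", (unsigned)s[0]);
            n = 1;                  /* skip the bad byte and resynchronize */
        }
        else
        {
            for (size_t i = 0; i < n; i++)
                printf("0x%.2X ", (unsigned)s[i]);
            printf("= U+%.4" PRIX32 "\n", cp);
        }
        s += n;
        len -= n;
    }
}
```

Given the bytes of `grep –i`, `print_utf8()` prints the line `0xE2 0x80 0x93 = U+2013` for the dash, in the same format as the example below.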

For example:

    $ echo "grep –i where | wc -l" | utf8-unicode
    0x67 = U+0067
    0x72 = U+0072
    0x65 = U+0065
    0x70 = U+0070
    0x20 = U+0020
    0xE2 0x80 0x93 = U+2013
    0x69 = U+0069
    0x20 = U+0020
    0x77 = U+0077
    0x68 = U+0068
    0x65 = U+0065
    0x72 = U+0072
    0x65 = U+0065
    0x20 = U+0020
    0x7C = U+007C
    0x20 = U+0020
    0x77 = U+0077
    0x63 = U+0063
    0x20 = U+0020
    0x2D = U+002D
    0x6C = U+006C
    $
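
The Unicode EN DASH U+2013, typed where an ASCII hyphen-minus was intended,
was why that `grep` command was failing with an error about being unable to
find the file `where`. Because `–i` does not start with `-`, `grep` did not
treat it as an option. Instead, it used `–i` as the search pattern and
`where` as the name of a file to search.

Jonathan Leffler ([email protected])
Sunday 1st March 2020
Wednesday 18th March 2020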