Add strspan-1.03.tgz to packages

jleffler · Mar 19, 2020 · 5a205eb
1 parent 0d7ef28

Showing 2 changed files with 93 additions and 34 deletions.
diff --git a/packages/README.md b/packages/README.md
@@ -60,40 +60,64 @@ The release has a SHA2-256 checksum and size as shown:
 
     SHA-256 d116d9e85c77826c1fd3ff4d18c56c311a6295c8247f83686cd7e8805963220f    28953 newgid-1.10.tgz
 
-### `utf8-unicode`
-
-The code for `utf8-unicode` which processes named files or standard
-input as UTF-8 and prints the sequence of bytes that make up a single
-character and the Unicode character it corresponds to.
-
-For example:
-
-    $ echo "grep –i where | wc -l" | utf8-unicode
-    0x67 = U+0067
-    0x72 = U+0072
-    0x65 = U+0065
-    0x70 = U+0070
-    0x20 = U+0020
-    0xE2 0x80 0x93 = U+2013
-    0x69 = U+0069
-    0x20 = U+0020
-    0x77 = U+0077
-    0x68 = U+0068
-    0x65 = U+0065
-    0x72 = U+0072
-    0x65 = U+0065
-    0x20 = U+0020
-    0x7C = U+007C
-    0x20 = U+0020
-    0x77 = U+0077
-    0x63 = U+0063
-    0x20 = U+0020
-    0x2D = U+002D
-    0x6C = U+006C
+### `strspan`
+
+The code for library functions `str_span()` and `str_cspan()`, which are
+related to, but different from, `strspn()` and `strcspn()`.
+These functions are designed to be used repeatedly when searching with the
+same set of characters.
+They precompute a table of which characters are to be matched.
+One of the functions `set_span()` or `set_ranges()` is used to
+initialize the precomputed data.
+They readily outperform `strspn()` and `strcspn()` on moderate-size
+searches.
+
+The distribution includes a timing program `test2.strspan`, which can be
+run with a number of files.
+It measures the time to read the files, processing them with
+`strlen()` and `strchr()` to warm up the cache and I/O buffers.
+It then runs `str_span()` and `str_cspan()` on the same files, and then
+`strspn()` and `strcspn()`.
+It can be effective to name the same file multiple times on the command
+line.
+
+Example use:
+
+    $ test2.strspan bible-be.txt bible-be.txt bible-be.txt
+    # NB: The tests for str_span and strspn are comparable
+    #     The tests for strlen and strchr are not comparable
+    strlen   0.187297 (4467663) bible-be.txt
+    strlen   0.186324 (4467663) bible-be.txt
+    strlen   0.187616 (4467663) bible-be.txt
+    strchr   0.182676 (4467663) bible-be.txt
+    strchr   0.185405 (4467663) bible-be.txt
+    strchr   0.184813 (4467663) bible-be.txt
+    str_span 0.195715 (4467663) bible-be.txt
+    str_span 0.199516 (4467663) bible-be.txt
+    str_span 0.194588 (4467663) bible-be.txt
+    strspn   0.347890 (4467663) bible-be.txt
+    strspn   0.346028 (4467663) bible-be.txt
+    strspn   0.347305 (4467663) bible-be.txt
+    $ test2.strspan great.panjandrum great.panjandrum great.panjandrum
+    # NB: The tests for str_span and strspn are comparable
+    #     The tests for strlen and strchr are not comparable
+    strlen   0.000046 (487) great.panjandrum
+    strlen   0.000031 (487) great.panjandrum
+    strlen   0.000030 (487) great.panjandrum
+    strchr   0.000036 (487) great.panjandrum
+    strchr   0.000030 (487) great.panjandrum
+    strchr   0.000030 (487) great.panjandrum
+    str_span 0.000035 (487) great.panjandrum
+    str_span 0.000032 (487) great.panjandrum
+    str_span 0.000031 (487) great.panjandrum
+    strspn   0.000061 (487) great.panjandrum
+    strspn   0.000052 (487) great.panjandrum
+    strspn   0.000053 (487) great.panjandrum
     $
 
-The Unicode EN DASH U+2013 was why that `grep` command was failing with
-an error about being unable to find the file `where`.
+These results show that `str_span()` and `str_cspan()` are marginally
+slower than `strlen()` or `strchr()`, but considerably quicker than
+`strspn()` and `strcspn()` (about 0.20 s versus 0.35 s on the larger file).
 
 ### `timecmd`
 
@@ -115,7 +139,7 @@ The code for `timecmd` which measures elapsed time of commands specified as part
 
 Example uses:
 
-    $  timecmd -m sleep 65
+    $ timecmd -m sleep 65
     2020-03-01 08:42:58.079 [PID 16916] sleep 65
     2020-03-01 08:44:03.086 [PID 16916; status 0x0000]  -  1m 5.007s
     $ timecmd -b -m sleep 65
@@ -135,5 +159,40 @@ Example uses:
 
 The sample commands all produced no output.  It works fine with commands that do.
 
+### `utf8-unicode`
+
+The code for `utf8-unicode` which processes named files or standard
+input as UTF-8 and prints the sequence of bytes that make up a single
+character and the Unicode character it corresponds to.
+
+For example:
+
+    $ echo "grep –i where | wc -l" | utf8-unicode
+    0x67 = U+0067
+    0x72 = U+0072
+    0x65 = U+0065
+    0x70 = U+0070
+    0x20 = U+0020
+    0xE2 0x80 0x93 = U+2013
+    0x69 = U+0069
+    0x20 = U+0020
+    0x77 = U+0077
+    0x68 = U+0068
+    0x65 = U+0065
+    0x72 = U+0072
+    0x65 = U+0065
+    0x20 = U+0020
+    0x7C = U+007C
+    0x20 = U+0020
+    0x77 = U+0077
+    0x63 = U+0063
+    0x20 = U+0020
+    0x2D = U+002D
+    0x6C = U+006C
+    $
+
+The Unicode EN DASH U+2013 was why that `grep` command was failing with
+an error about being unable to find the file `where`.
+
 Jonathan Leffler
-Sunday 1st March 2020
+Wednesday 18th March 2020
diff --git a/packages/strspan-1.03.tgz b/packages/strspan-1.03.tgz