-
Notifications
You must be signed in to change notification settings - Fork 339
SPL:Text Handling
Text handling is an integral and indispensable part of any programming language. SPL (Structured Process Language) provides abundant functions for handling text. There are four categories – character handling, simple string handling, sequence-related handling, and complex analysis. They cover extracting a substring from a string, type judgment and conversion, character set handling, data type parsing, search and replacement, format handling, string generation/splitting/combination, the use of regular expression matching, extracting words and numbers, SQL statement analysis, a HTML analysis, etc.
Task | Code | Return value |
---|---|---|
Check if all characters are numbers | isdigit("12345") | true |
Check if all characters are letters | isalpha("abcde") | true |
Check if all characters are lower-case letters | islower("aBcd") | false |
Check if all characters are upper-case letters | isupper("ABCD") | true |
Task | Code | Return value |
---|---|---|
Convert all characters to lower-case | lower("aB’cD’e") | ab’cd’e |
Skip a string put in quotes when converting to lower-case | lower@q("aB’cD’e") | ab’cD’e |
Convert all characters to upper-case | upper("aB’cD’e") | AB’CD’E |
Skip a string put in quotes when converting to upper-case | upper@q("aB’cD’e") | AB’cD’E |
Convert row number and column number to an Excel-style cell name | cellname(8,32) | AF8 |
Task | Code | Return value |
---|---|---|
Get the Unicode value for the 2^nd^ character | asc("abc",2) | 98 |
Get the Unicode value for the 1^st^ character | asc("中国") | 20013 |
Get the corresponding character for the Unicode value | char(98) | B |
Get the corresponding character for the Unicode value | char(22269) | 国 |
Encode a URL string | urlencode("a=b+c>0?1:-1","UTF-8") | a%3Db%2Bc%3E0%3F1%3A-1 |
Decode an encoded URL string | urlencode@r("a%3Db%2Bc","UTF-8") | a=b+c |
Base64-encode a byte array | base64(bytes) | ixb7x1dxfbxefxff |
Convert to an MD5-encrypted string | md5("abcde") | AB56B4D92B40713ACC5AF89985D4B786 |
Parse a string value into one of the corresponding data type.
Task | Code | Return value |
---|---|---|
Parse a numeric string into a number | number("1234.56") | 1234.56 |
Parse a currency string into a number | number("$10,234.55","$#,###.#") | 10234.55 |
Parse a time format string into a time object | parse("10:20:30") | A time object |
Parse a sequence type string into a sequence | parse("[1,2,3]") | Sequence [1,2,3] |
Remove quotes and escape characters at parsing | parse@e(""a\tb"") | a b |
Only parse the numeric characters beginning from the left-most one | parse@n("123adsdx234") | 123 |
Parse until the second half the quotation marks appears when a given string begins with the quotation marks | parse@q("'sadsdxc'+23") | sadsdxc |
Task | Code | Return value |
---|---|---|
Convert a numeric value to a corresponding format string | string(12345.678,"#,##0.00") | 12,345.68 |
Convert a date value to a corresponding format string | string(date("2020-03-15"),"MM/dd/yyyy") | 03/15/2020 |
Convert a time value to a corresponding format string | string(time("16:18:54"),"h:mm a") | 4:18 PM |
Convert a datetime value to a corresponding format string | string(now(),"MM/dd/yyyy HH:mm") | 2020-05-28 14:06 |
Enclose a string with quotation marks | string@q("ab cd") | "ab cd" |
Escape a character | string@e("ab"c d") | ab"c\td |
Escape a character, and convert characters in a large character set to Unicode values | string@u("ab"c 中国") | ab"c\t\u4E2D\u56FD |
Parse a format specifier substring of a string according to the second parameter | format("The price is $%.2f",8.5) | The price is $8.50 |
Get a substring from a source string. Related functions are mid(), left() and right(). Below are some examples:
Task | Code | Return value |
---|---|---|
Get a substring from the 3^rd^ character | mid("abcdef",3) | cdef |
Get a substring of two characters from the 3^rd^ character | mid("abcdef",3,2) | cd |
Get a substring of the leftmost three characters | left("abcdef",3) | abc |
Get a substring by removing the last two characters | left("abcdef",-2) | abcd |
Get a substring of the rightmost three characters | right("abcdef",3) | def |
Get a substring by removing the first two characters | right("abcdef",-2) | cdef |
substr(s1,s2)
The function searches source string s1 for the target substring s2, and returns the substring before or after s2 or null if s2 cannot be found.
Task | Code | Return value |
---|---|---|
Get the substring after a given substring | substr("abCDcdef","cd") | ef |
Get the substring before a given substring | substr@l("abCDcdef","cd") | abCD |
Get the case-insensitive substring after a given substring | substr@c("abCDcdef","cd") | cdef |
Get a substring with a string put in quotes skipped | substr@q("ab"acd"cdef","cd") | ef |
pos(s1,s2{,begin})
The function finds the position of the target substring s2 in source string s1, and returns the original position of s2 in s1 or null if s2 cannot be found.
Task | Code | Return value |
---|---|---|
The ordinary search | pos("abcdef","aa") | null |
The ordinary search | pos("abcDedefgh","de") | 6 |
Perform the search from the 5^th^ character | pos("abcdedefgh","de",5) | 6 |
Perform the search backwards | pos@z("abcdedefgh","de") | 6 |
Perform a case-insensitive search | pos@c("abcDedefgh","de") | 4 |
Search for the target string at the beginning | pos@h("abcdefgh","ab") | 1 |
Search for the target string at the tail end | pos@hz("abcdefgh","fgh") | 6 |
Perform the search with a string put in quotes skipped | pos@q("ab"cde"fcd","cd") | 9 |
Check if the source string matches a given format string | like("abc123", "abc*") | true |
Case-insensitive for format string matching | like@c("abc123", "ABC*") | true |
replace(s,a,b)
The function replaces substring a in source string s with string b and returns the new s.
Task | Code | Return value |
---|---|---|
The ordinary replacement | replace("abc’ab’deA","a","ss") | ssbc'ssb’deA |
Replace the first-found target substring only | replace@1("abc’ab’deA","a","ss") | ssbc'ab’deA |
The case-insensitive replacement | replace@c("abc’ab’deA","a","ss") | ssbc'ssb’dess |
Perform the replacement with a string put in quotes skipped | replace@q("abc’ab’deA","a","ss") | ssbc'ab’deA |
Task | Code | Return value |
---|---|---|
Repeat a given string n times to generate a new string | fill("abc",4) | abcabcabcabc |
Delete white spaces at both ends of a given string | trim(" a bc ") | "a bc" |
Delete white spaces on the left of a given string | trim@l(" a bc ") | "a bc " |
Delete white spaces on the right of a given string | trim@r(" a bc ") | " a bc" |
Delete unwanted white spaces from a given string | trim@a(" a bc ") | "a bc" |
Patch a string before an existing one to extend the latter to the desired length | pad("Soth","Miss",10) | MissMiSoth |
Patch a string after an existing one to extend the latter to the desired length | pad@r("Soth","er",8) | Sotherer |
Generate a numeric string with the length of 1 randomly | rands("0123456789",11) | 44238061662 |
s.split(d)
The function splits string s into a sequence using delimiter d, or into a sequence of single characters when d is absent.
Task | Code | Return value |
---|---|---|
Split a given string using the vertical line | "aa | bb |
Split a given string using the comma | "1,[a,b],(2,c),'5,6'".split@c() | ["1","[a,b]","(2,c)",’5,6’] |
Split a given string into two parts by the first-found delimiter | "1,[a,b],(2,c),'5,6'".split@c1() | ["1","[a,b],(2,c),’5,6’"] |
Do not handle parentheses/brackets and quotation marks handling at splitting | "1,[a,b],(2,c),'5,6'".split@cb() | ["1","[a","b]","(2","c)","’5","6’"] |
Parse each member in the result sequence as an object | "1,[a,b],(2,c),'5,6'".split@cp() | [1,[a,b],"(2,c)","5,6"] |
Delete white spaces at both ends of each member in the result sequence | "192.168.0.3".split@t(".") | ["192","168","0","3"] |
Treat parameter d as a regular expression, and split the string, for instance, by digits | "a1b2c57d".split@r("\d") | ["a","b","c","","d"] |
Split a given string by line break, and then perform the ordinary or the option-enabled splitting on each line | "s,a,y\ngood,morning".split@nc() | "s","a","y"],["good","morning" |
A.concat(d)
The function concatenates members of sequence A into a string through delimiter d, during which the sub-sequences are handled in the same way, or joins them up directly when d is absent.
In the following example functions, A1 is [1, ["a","b"],[2,"c"]].
Task | Code | Return value |
---|---|---|
Concatenate members without a delimiter | A1.concat() | 1[ab][2c] |
Concatenate members using the colon | A1.concat(":") | 1:[a:b]:[2:c] |
Concatenate members using the comma | A1.concat@c() | 1,[a,b],[2,c] |
Enclose each member with double quotation marks at concatenation | A1.concat@q() | 1["a""b"][2"c"] |
Enclose each member with single quotation marks at concatenation | A1.concat@i() | 1[‘a’’b’][2’c’] |
With a two-level sequence, concatenate the outer level by line break and then perform concatenation on each sub-sequence | 1,2,3],["a","b"],[2,"c".concat@n("-") | 1-2-3 a-b 2-c |
Concatenate parameters into a string and do not enclose each result members with quotes | concat(2,["a","b"],"cd") | 2abcd |
Task | Code | Return value |
---|---|---|
Match a given string with the regular expression | "4,23,a,test".regex("(\d),([0-9]*),([a-z]),([a-z]*)") | ["4","23","a","test"] |
Match a given string with the regular expression | "4,23,a,test".regex("(\d),([a-z]),([0-9]*),([a-z]*)") | null |
Case-insensitive for the matching | "a:Test:B".regex@c("([a-z]):([a-z])") | ["a","T","t","B"] |
Learn more about regular expressions.
In the following example functions, s is "hi-10 hello!2020-01-01A8,3.14".
Task | Code | Return value |
---|---|---|
Extract English words | s.words() | ["hi","hello","A"] |
Extract digits strings | s.words@d() | ["10","2020","01","01","8","3","14"] |
Extract English words and strings of digits | s.words@a() | ["hi","10","hello","2020","01","01","A","8","3","14"] |
Extract all: words, strings of digits and other character strings | s.words@w() | ["hi","-","10"," ","hello","!","2020","-","01","-","01","A","8",",","3",".","14"] |
Extract all, where strings beginning with digits will be identified as numbers or datetimes as a whole | s.words@wp() | ["hi","-10"," ","hello","!","2020-01-01","A","8",",","3.14"] |
Treat continuous English letters and digits as a word | s.words@i() | ["hi","hello","A8"] |
sql.sqlparse(part)
The function splits a SQL query according to the order of select clause, from clause (including the join), group by clause, having clause and order by clause, make these syntactic units parameters members, and returns a sequence of them. When parameter part is present, replace the corresponding syntactic parameter in the source SQL query with it, and return a new SQL.
The sql in the following example functions is as follows:
select emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
from EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
where emp.EID<100 and dep.DEPT='R&D'
order by emp.SALARY,emp.EID
sql2:select * from dept
Task | Code | Return value |
---|---|---|
Get all syntactic units of a given SQL query | sql.sqlparse() | Omitted; please refer to Function Reference |
Get select clause | sql.sqlparse@s() | emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER |
Get from clause | sql.sqlparse@f() | EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT |
Get where clause | sql.sqlparse@w() | emp.EID < 100 and dep.DEPT='R&D' |
Get group by clause | sql.sqlparse@g() | null |
Get having clause | sql.sqlparse@h() | null |
Get order by clause | sql.sqlparse@o() | emp.SALARY,emp.EID |
Get syntactic units of a given SQL query and split each unit into a sequence of smaller ones | sql.sqlparse@a() | Omitted; please refer to Function Reference |
Replace select clause with the parameter | sql2.sqlparse@s("deptid") | select deptid from dept |
Replace from clause with the parameter | sql2.sqlparse@f("sales") | select * from sales |
Convert a SQL query to the counterpart in a specified database | sql.sqltranslate("MYSQL") | Omitted |
Node strings are key/value strings. In node strings in SPL, a key and its value are connected by the equals sign. Different nodes are separated by the white space.
xs.property(n,v)
The function reads value of the node named n from node string xs and returns the result. When string v is present, make v n’s value; and delete n when v is absent.
In the following example functions, xs is a set of key=value pairs: color=red size=20 price=500.
Task | Code | Return value |
---|---|---|
Get a two-column table sequence made up of all properties | xs.property() | Omitted |
Get value of a given property | xs.property("size") | "20" |
Get value of a given property and enclose it | xs.property@q("size") | ""20"" |
Get value of a given property and parse it into the corresponding data type | xs.property@v("size") | 20 |
Modify value of a given property | xs.property("size","30") | color=red size=30 price=500 |
Delete a given property | xs.property("size",null) | color=red price=500 |
Get value of a given property form node strings separated by semicolon | "color=red;size=30".property@cv("size") | 30 |
Get value of a given property form node strings where the property name and value are comma-separated | "color:red size:30".property@j("size") | "30" |
s.htmlparse(tag:i:j,…)
The function gets the j^th^ text under the i^th^ tag in HTML string s; or gets all text from s when all parameters are absent.
Below is the HTML string in the following example functions:
<div><div>aaa<div>bbb<span>ccc</span>ddd</div></div></div>
<table><tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
</table>
Task | Code | Return value |
---|---|---|
Get the 3^rd^ text under the 3^rd^ div tag | html.htmlparse("div":2:2) | ddd |
Get the 1^st^ text under the 2^nd^ div tag and the 1^st^ text under the 1^st^ span tag | html.htmlparse("div":1:0,"span":0:0) | [aaa,ccc] |
Get the content of the 1^st^ table tag | html.htmlparse("table":0) | 1,2,3],[a,b,c |
Learn more about text handling through string() functions.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code