Skip to content

SPL:Text Handling

esProcSPL edited this page Jul 17, 2024 · 1 revision

Text handling is an integral and indispensable part of any programming language. SPL (Structured Process Language) provides abundant functions for handling text. There are four categories – character handling, simple string handling, sequence-related handling, and complex analysis. They cover extracting a substring from a string, type judgment and conversion, character set handling, data type parsing, search and replacement, format handling, string generation/splitting/combination, the use of regular expression matching, extracting words and numbers, SQL statement analysis, a HTML analysis, etc.

1. character handling

1.1 Type judgment

Task Code Return value
Check if all characters are numbers isdigit("12345") true
Check if all characters are letters isalpha("abcde") true
Check if all characters are lower-case letters islower("aBcd") false
Check if all characters are upper-case letters isupper("ABCD") true

1.2 Conversion

Task Code Return value
Convert all characters to lower-case lower("aB’cD’e") ab’cd’e
Skip a string put in quotes when converting to lower-case lower@q("aB’cD’e") ab’cD’e
Convert all characters to upper-case upper("aB’cD’e") AB’CD’E
Skip a string put in quotes when converting to upper-case upper@q("aB’cD’e") AB’cD’E
Convert row number and column number to an Excel-style cell name cellname(8,32) AF8

1.3 Character set handling

Task Code Return value
Get the Unicode value for the 2^nd^ character asc("abc",2) 98
Get the Unicode value for the 1^st^ character asc("中国") 20013
Get the corresponding character for the Unicode value char(98) B
Get the corresponding character for the Unicode value char(22269)
Encode a URL string urlencode("a=b+c>0?1:-1","UTF-8") a%3Db%2Bc%3E0%3F1%3A-1
Decode an encoded URL string urlencode@r("a%3Db%2Bc","UTF-8") a=b+c
Base64-encode a byte array base64(bytes) ixb7x1dxfbxefxff
Convert to an MD5-encrypted string md5("abcde") AB56B4D92B40713ACC5AF89985D4B786

1.4 Data type parsing

Parse a string value into one of the corresponding data type.

Task Code Return value
Parse a numeric string into a number number("1234.56") 1234.56
Parse a currency string into a number number("$10,234.55","$#,###.#") 10234.55
Parse a time format string into a time object parse("10:20:30") A time object
Parse a sequence type string into a sequence parse("[1,2,3]") Sequence [1,2,3]
Remove quotes and escape characters at parsing parse@e(""a\tb"") a b
Only parse the numeric characters beginning from the left-most one parse@n("123adsdx234") 123
Parse until the second half the quotation marks appears when a given string begins with the quotation marks parse@q("'sadsdxc'+23") sadsdxc

1.5 Format handling

Task Code Return value
Convert a numeric value to a corresponding format string string(12345.678,"#,##0.00") 12,345.68
Convert a date value to a corresponding format string string(date("2020-03-15"),"MM/dd/yyyy") 03/15/2020
Convert a time value to a corresponding format string string(time("16:18:54"),"h:mm a") 4:18 PM
Convert a datetime value to a corresponding format string string(now(),"MM/dd/yyyy HH:mm") 2020-05-28 14:06
Enclose a string with quotation marks string@q("ab cd") "ab cd"
Escape a character string@e("ab"c d") ab"c\td
Escape a character, and convert characters in a large character set to Unicode values string@u("ab"c 中国") ab"c\t\u4E2D\u56FD
Parse a format specifier substring of a string according to the second parameter format("The price is $%.2f",8.5) The price is $8.50

2. Simple string handling

2.1 Getting a substring

Get a substring from a source string. Related functions are mid(), left() and right(). Below are some examples:

Task Code Return value
Get a substring from the 3^rd^ character mid("abcdef",3) cdef
Get a substring of two characters from the 3^rd^ character mid("abcdef",3,2) cd
Get a substring of the leftmost three characters left("abcdef",3) abc
Get a substring by removing the last two characters left("abcdef",-2) abcd
Get a substring of the rightmost three characters right("abcdef",3) def
Get a substring by removing the first two characters right("abcdef",-2) cdef
substr(s1,s2)

The function searches source string s1 for the target substring s2, and returns the substring before or after s2 or null if s2 cannot be found.

Task Code Return value
Get the substring after a given substring substr("abCDcdef","cd") ef
Get the substring before a given substring substr@l("abCDcdef","cd") abCD
Get the case-insensitive substring after a given substring substr@c("abCDcdef","cd") cdef
Get a substring with a string put in quotes skipped substr@q("ab"acd"cdef","cd") ef

2.2 Search and matching

pos(s1,s2{,begin})

The function finds the position of the target substring s2 in source string s1, and returns the original position of s2 in s1 or null if s2 cannot be found.

Task Code Return value
The ordinary search pos("abcdef","aa") null
The ordinary search pos("abcDedefgh","de") 6
Perform the search from the 5^th^ character pos("abcdedefgh","de",5) 6
Perform the search backwards pos@z("abcdedefgh","de") 6
Perform a case-insensitive search pos@c("abcDedefgh","de") 4
Search for the target string at the beginning pos@h("abcdefgh","ab") 1
Search for the target string at the tail end pos@hz("abcdefgh","fgh") 6
Perform the search with a string put in quotes skipped pos@q("ab"cde"fcd","cd") 9
Check if the source string matches a given format string like("abc123", "abc*") true
Case-insensitive for format string matching like@c("abc123", "ABC*") true

2.3 Replacement

replace(s,a,b)

The function replaces substring a in source string s with string b and returns the new s.

Task Code Return value
The ordinary replacement replace("abc’ab’deA","a","ss") ssbc'ssb’deA
Replace the first-found target substring only replace@1("abc’ab’deA","a","ss") ssbc'ab’deA
The case-insensitive replacement replace@c("abc’ab’deA","a","ss") ssbc'ssb’dess
Perform the replacement with a string put in quotes skipped replace@q("abc’ab’deA","a","ss") ssbc'ab’deA

2.4 Others

Task Code Return value
Repeat a given string n times to generate a new string fill("abc",4) abcabcabcabc
Delete white spaces at both ends of a given string trim(" a bc ") "a bc"
Delete white spaces on the left of a given string trim@l(" a bc ") "a bc "
Delete white spaces on the right of a given string trim@r(" a bc ") " a bc"
Delete unwanted white spaces from a given string trim@a(" a bc ") "a bc"
Patch a string before an existing one to extend the latter to the desired length pad("Soth","Miss",10) MissMiSoth
Patch a string after an existing one to extend the latter to the desired length pad@r("Soth","er",8) Sotherer
Generate a numeric string with the length of 1 randomly rands("0123456789",11) 44238061662

3. Sequence-related handling

3.1 Splitting

s.split(d)

The function splits string s into a sequence using delimiter d, or into a sequence of single characters when d is absent.

Task Code Return value
Split a given string using the vertical line "aa bb
Split a given string using the comma "1,[a,b],(2,c),'5,6'".split@c() ["1","[a,b]","(2,c)",’5,6’]
Split a given string into two parts by the first-found delimiter "1,[a,b],(2,c),'5,6'".split@c1() ["1","[a,b],(2,c),’5,6’"]
Do not handle parentheses/brackets and quotation marks handling at splitting "1,[a,b],(2,c),'5,6'".split@cb() ["1","[a","b]","(2","c)","’5","6’"]
Parse each member in the result sequence as an object "1,[a,b],(2,c),'5,6'".split@cp() [1,[a,b],"(2,c)","5,6"]
Delete white spaces at both ends of each member in the result sequence "192.168.0.3".split@t(".") ["192","168","0","3"]
Treat parameter d as a regular expression, and split the string, for instance, by digits "a1b2c57d".split@r("\d") ["a","b","c","","d"]
Split a given string by line break, and then perform the ordinary or the option-enabled splitting on each line "s,a,y\ngood,morning".split@nc() "s","a","y"],["good","morning"

3.2 Concatenation

A.concat(d)

The function concatenates members of sequence A into a string through delimiter d, during which the sub-sequences are handled in the same way, or joins them up directly when d is absent.

In the following example functions, A1 is [1, ["a","b"],[2,"c"]].

Task Code Return value
Concatenate members without a delimiter A1.concat() 1[ab][2c]
Concatenate members using the colon A1.concat(":") 1:[a:b]:[2:c]
Concatenate members using the comma A1.concat@c() 1,[a,b],[2,c]
Enclose each member with double quotation marks at concatenation A1.concat@q() 1["a""b"][2"c"]
Enclose each member with single quotation marks at concatenation A1.concat@i() 1[‘a’’b’][2’c’]
With a two-level sequence, concatenate the outer level by line break and then perform concatenation on each sub-sequence 1,2,3],["a","b"],[2,"c".concat@n("-") 1-2-3
a-b
2-c
Concatenate parameters into a string and do not enclose each result members with quotes concat(2,["a","b"],"cd") 2abcd

4. Complex analysis

4.1 Regular expression matching

Task Code Return value
Match a given string with the regular expression "4,23,a,test".regex("(\d),([0-9]*),([a-z]),([a-z]*)") ["4","23","a","test"]
Match a given string with the regular expression "4,23,a,test".regex("(\d),([a-z]),([0-9]*),([a-z]*)") null
Case-insensitive for the matching "a:Test:B".regex@c("([a-z]):([a-z])") ["a","T","t","B"]

Learn more about regular expressions.

4.2 Word and number extraction

In the following example functions, s is "hi-10 hello!2020-01-01A8,3.14".

Task Code Return value
Extract English words s.words() ["hi","hello","A"]
Extract digits strings s.words@d() ["10","2020","01","01","8","3","14"]
Extract English words and strings of digits s.words@a() ["hi","10","hello","2020","01","01","A","8","3","14"]
Extract all: words, strings of digits and other character strings s.words@w() ["hi","-","10"," ","hello","!","2020","-","01","-","01","A","8",",","3",".","14"]
Extract all, where strings beginning with digits will be identified as numbers or datetimes as a whole s.words@wp() ["hi","-10"," ","hello","!","2020-01-01","A","8",",","3.14"]
Treat continuous English letters and digits as a word s.words@i() ["hi","hello","A8"]

4.3 SQL statement analysis

sql.sqlparse(part)

The function splits a SQL query according to the order of select clause, from clause (including the join), group by clause, having clause and order by clause, make these syntactic units parameters members, and returns a sequence of them. When parameter part is present, replace the corresponding syntactic parameter in the source SQL query with it, and return a new SQL.

The sql in the following example functions is as follows:

select emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
    from EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
    where emp.EID<100 and dep.DEPT='R&D'
    order by emp.SALARY,emp.EID

sql2:select * from dept

Task Code Return value
Get all syntactic units of a given SQL query sql.sqlparse() Omitted; please refer to Function Reference
Get select clause sql.sqlparse@s() emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
Get from clause sql.sqlparse@f() EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
Get where clause sql.sqlparse@w() emp.EID < 100 and dep.DEPT='R&D'
Get group by clause sql.sqlparse@g() null
Get having clause sql.sqlparse@h() null
Get order by clause sql.sqlparse@o() emp.SALARY,emp.EID
Get syntactic units of a given SQL query and split each unit into a sequence of smaller ones sql.sqlparse@a() Omitted; please refer to Function Reference
Replace select clause with the parameter sql2.sqlparse@s("deptid") select deptid from dept
Replace from clause with the parameter sql2.sqlparse@f("sales") select * from sales
Convert a SQL query to the counterpart in a specified database sql.sqltranslate("MYSQL") Omitted

4.4 Node string handling

Node strings are key/value strings. In node strings in SPL, a key and its value are connected by the equals sign. Different nodes are separated by the white space.

xs.property(n,v)

The function reads value of the node named n from node string xs and returns the result. When string v is present, make v n’s value; and delete n when v is absent.

In the following example functions, xs is a set of key=value pairs: color=red size=20 price=500.

Task Code Return value
Get a two-column table sequence made up of all properties xs.property() Omitted
Get value of a given property xs.property("size") "20"
Get value of a given property and enclose it xs.property@q("size") ""20""
Get value of a given property and parse it into the corresponding data type xs.property@v("size") 20
Modify value of a given property xs.property("size","30") color=red size=30 price=500
Delete a given property xs.property("size",null) color=red price=500
Get value of a given property form node strings separated by semicolon "color=red;size=30".property@cv("size") 30
Get value of a given property form node strings where the property name and value are comma-separated "color:red size:30".property@j("size") "30"

4.5 HTML string handling

s.htmlparse(tag:i:j,…)

The function gets the j^th^ text under the i^th^ tag in HTML string s; or gets all text from s when all parameters are absent.

Below is the HTML string in the following example functions:

<div><div>aaa<div>bbb<span>ccc</span>ddd</div></div></div>
<table><tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
</table>
Task Code Return value
Get the 3^rd^ text under the 3^rd^ div tag html.htmlparse("div":2:2) ddd
Get the 1^st^ text under the 2^nd^ div tag and the 1^st^ text under the 1^st^ span tag html.htmlparse("div":1:0,"span":0:0) [aaa,ccc]
Get the content of the 1^st^ table tag html.htmlparse("table":0) 1,2,3],[a,b,c

Learn more about text handling through string() functions.

Clone this wiki locally