Skip to content

Conventional operations of esProc SPL on structured text files

esProcSPL edited this page Feb 20, 2025 · 1 revision

Structured text files refer to TXT files separated by tabs or CSV files separated by commas, which can be seen as data tables. Using esProc SPL makes it easy to perform common SQL-like operations on such files.

Assuming that the first row of the file is the field title.

A B
1 =T("scores.txt")
2 =A1.select(CLASS==2) // filter
3 =A1.avg(Chinese)
4 =A1.max(Math)
5 =A1.sum(English)
6 =A1.derive(English+Chinese+Math:total_score) // derive a total_score column
7 =A1.derive(if(Chinese>=90:"A",Chinese>=80:"B",Chinese>=60:"C";"D"):Chinese_evaluation) // evaluate Chinese score
8 =A1.sort(English)
9 =A1.sort(CLASS,-Math)
10 =A1.groups(CLASS;min(English),max(Chinese),avg(Math))
11 =A1.groups(CLASS;avg(English):avg_En).select(avg_En<85)
12 =A1.top(-3;English) // Top 3 high English students
13 =A1.groups(CLASS;top(3,English)) // Top 3 low English value in CLASS
14 =A1.id(CLASS) // distinct CLASS
15 =A1.icount(CLASS) // count distinct CLASS
16 =A1.groups(CLASS;icount(English)) // count distinct English every CLASS
17
18 =T("sales.csv")
19 =T("product.csv").keys(ID) // ID as key
20 =A18.switch(ProductID,A19:ID) // ProductID join ID
21 =A20.derive(Quantity*ProductID.Price:amount)
22
23 =T("sales.csv")
24 =A23.join(ProductID,A19:ID,Name,Price) // import Name, Price join on ProductID=ID
25 =A24.derive(Quantity*Price:amount)

https://try.esproc.com/splx?2Xx

A1 reads the student score data scores.txt as a table sequence, with column names in the first row and data starting from the second row, as shown in the following figure.

A2 selects the data records of class 2;

A3 calculates the average score of Chinese;

A4 calculates the highest score of mathematics;

A5 calculates the total score of English;

A6 adds a new column called 'total score' in A1, which represents the sum of three columns: English, Chinese, and Mathematics;

A7 adds a column called Chinese_evaluation in A1. When the Chinese score is above 90, it is rated as A; when it is above 80, it is rated as B; when it is above 60, it is rated as C; otherwise, it is rated as D.

A8: Sort in ascending order of English scores;

A9: Sort in ascending order by class ID first, and then sort in descending order by math score within the class;

A10: Group by class and calculate the lowest English score, highest Chinese score, and average math score for each class;

A11: Group by class, calculate the average English score for each class, and select the class with an average score below 85;

A12: After sorting in descending order of English, retrieve the records of top 3 students with the highest English scores;

A13: Group by class, sort the English in ascending order for each class, and get the top 3 lowest English scores after deduplication;

A14: Identify all non-duplicate class IDs;

A15: Identify the number of unique class IDs;

A16: Group by class and find out the number of non-repeating English scores in each class;

A18: Read the sales data table sales.csv as a table sequence.

A19: Read the product data table product.csv as a table sequence and set ID as the primary key.

The structure of the two tables is as follows:

A20 uses the switch function to associate the ProductID in A18 with the ID in A19 (if the ID is the primary key, it can also be omitted). At this point, the ProductID column is converted to point to the corresponding product record. If some data is extracted from A20, it will be as shown in the following figure

In A21, a new column named 'amount' has been added in A20, which is the product of the sales quantity Quantity and the product price Price; the expression ' ProductID.Price ' represents the value of the Price column of the record pointed to by the ProductID column.

A23: Reread the sales data table sales.csv as a table sequence;

A24 uses the join function to associate the ProductID in A23 with the ID in A19 (if the ID is the primary key, it can also be omitted), while introducing data from the Name and Price columns in A19. If some data from A24 is extracted, it will be as shown as in the following figure:

A25: A24 adds a column called 'amount', which is the product of the sales quantity Quantity and the product price Price.

Clone this wiki locally