-
Notifications
You must be signed in to change notification settings - Fork 339
SQL Query over File Examples
【Abstract】 This article illustrates scenarios of querying files directly in SQL and provides sample programs written with esProc SPL.
It’s convenient and efficient to manipulate database data in SQL. It’s natural to load data into the database and handle it in SQL when we trying to compute files. The problem is that data loading itself is not simple. It would be extremely convenient if there was a way to perform SQL query directly on files. Here I’ll bring in such a tool, esProc SPL, list various SQL file query scenarios and offers esProc example programs. esProc is a specialized data computing engine, and SPL, abbreviation of Structured Process Language used by esProc, boasts a complete set of SQL file query methods.
This article uses text files in all examples. In fact the methods also apply in Excel files.
Select records from a text file on a specified condition in SQL.
Example: Select scores of students in class 10 from students score table Students_scores.txt. In the file, the first row contains column names and detailed data begins from the second row:
A | |
---|---|
1 | $select * from E:/txt/Students_scores.txt where CLASS=10 |
Summarize data in a text file in SQL.
Example: Calculate the average Chinese score, the highest math score, and the total of English scores based on the student scores table.
A | |
---|---|
1 | $select avg(Chinese),max(Math),sum(English) from E:/txt/Students_scores.txt |
Perform inter-column calculations in a text file in SQL.
Example: Calculate the total score of each student in the student score table.
A | |
---|---|
1 | $select *,English+Chinese+Math as total_score from E:/txt/students_scores.txt |
Below is A1’s result, where a computed column total_score is added.
We can perform complicated conditional queries in SQL using CASE statement.
Example: Find whether or not the English result of each student in the student scores table is a Pass.
A | |
---|---|
1 | $select *, case when English>=60 then 'Pass' else 'Fail' end as English_evaluation from E:/txt/students_scores.txt |
Below is A1’s result, where a computed column English_evaluation is added.
Sort data in a text file in ascending (or descending) order in SQL.
Example: Sort the student scores table by class in ascending order and by total score in descending order.
A | |
---|---|
1 | $select * from E:/txt/students_scores.txt order by CLASS,English+Chinese+Math desc |
Get Top-N over a text file in SQL.
Example: Get the records of the three students who have the highest English scores.
A | |
---|---|
1 | $select top 3 * from E:/txt/students_scores.txt order by English desc |
Perform grouping & aggregation on data in a text file in SQL.
Example: Find the lowest English score, the highest Chinese score and the total math score in each class.
A | |
---|---|
1 | $select CLASS,min(English),max(Chinese),sum(Math) from E:/txt/students_scores.txt group by CLASS |
Group and summarize data in a text file and then perform filtering in SQL.
Example: Find the class where the average English score is below 70.
A | |
---|---|
1 | $select CLASS,avg(English) as avg_En from E:/txt/students_scores.txt group by CLASS having avg(English)<70 |
A1’s result.
Perform distinct operation over data in a text file in SQL.
Example: Get the IDs of all classes.
A | |
---|---|
1 | $select distinct CLASS from E:/txt/students_scores.txt |
Perform count after distinct operation on data in a text file in SQL.
Example: Based on the product information file, count the number of different kinds of products. Below is part of the file:
A | |
---|---|
1 | $select count(distinct PID) from E:/txt/PRODUCT_SALE.txt |
Group data in text file and perform count distinct on data in text file in SQL.
Example: Based on sales table, count the days when there are sales records for each kind of product.
A | |
---|---|
1 | $select PID,count(distinct DATE) as no_sdate from E:/txt/PRODUCT_SALE.txt group by PID |
Perform a join query over two text files in SQL.
Example: The product information and the sales information are stored in two text files respectively. We want to calculate the total sales amount of the products whose quantity per order is less than 10. Below are the files’ data structures and their relationship:
A | |
---|---|
1 | $select sum(S.quantity*P.Price) as total from E:/txt/Sales.txt as S join E:/txt/Products.txt as P on S.productid=P.ID where S.quantity<=10 |
Perform a join query over more than two text files in SQL.
Example: The state information, department information and employee information are stored in 3 different text files. We want to find the information of employees of California in HR department.
A | |
---|---|
1 | $select e.NAME as NAME from E:/txt/EMPLOYEE_J.txt as e join E:/txt/DEPARTMENT.txt as d on e.DEPTID=d.DEPTID join E:/txt/STATE.txt as s on e.STATEID=s.STATEID where d.NAME='HR' and s.NAME='California' |
Perform a multilevel join query over more than two text files in SQL.
Example: The state information, department information and employee information are stored in 3 different text files. We want to find the information of employees of New York state whose managers come from California.
A | |
---|---|
1 | $select e.NAME as ENAME from E:/txt/EMPLOYEE.txt as e join E:/txt/DEPARTMENT.txt as d on e.DEPT=d.NAME join E:/txt/EMPLOYEE.txt as emp on d.MANAGER=emp.EID where e.STATE='New York' and emp.STATE='California' |
Use very complicated SQL in a nested subquery.
Example: The employee information and the department information are stored in two text files. We want to find the department that has the youngest manager. Below are parts of the two files:
A | |
---|---|
1 | $select emp.BIRTHDAY as BIRTHDAY,emp.DEPT as DEPT from E:/txt/DEPARTMENT.txt as dept join E:/txt/EMPLOYEE.txt emp on dept.MANAGER=emp.EID where emp.BIRTHDAY=(select max(BIRTHDAY) from ( select emp1.BIRTHDAY as BIRTHDAY from E:/txt/DEPARTMENT.txt as dept1 join E:/txt/EMPLOYEE.txt as emp1 on dept1.MANAGER=emp1.EID ) ) |
Compute data in a text file using SQL WITH clause.
Example: Based on the same text files in the previous example, find the specified departments (HR, R&D, Sales) from the department table, and count the female employees and calculate the average salary in each of these departments.
A | |
---|---|
1 | $with A as (select NAME as DEPT from E:/txt/DEPARTMENT.txt where NAME='HR' or NAME='R&D' or NAME='Sales') select A.DEPT DEPT,count(*) NUM,avg(B.SALARY) AVG_SALARY from A left join E:/txt/EMPLOYEE.txt B on A.DEPT=B.DEPT where B.GENDER='F' group by A.DEPT |
Find more examples of agile computations in SPL CookBook.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code