Enable column pruning for the MR execution engine #34

jphalip · 2022-12-30T03:33:27Z

The MR execution engine does not seem to provide a reliable value for the hive.io.file.readcolumn.names property when multiple tables are read in the same query. As a result, we can't properly support column pruning and have to select all the columns instead (i.e. SELECT *).
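For context, Hive hands the columns a query needs to the input format through the hive.io.file.readcolumn.names job property, as a comma-separated list that Hive's ColumnProjectionUtils.getReadColumnNames splits on commas. Below is a minimal, dependency-free sketch of how that list turns into a pruned projection. It assumes a plain Map stands in for the JobConf, and ReadColumns and readColumnNames are hypothetical names, not part of Hive or this connector.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReadColumns {
  // Same key as Hive's ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR.
  static final String READ_COLUMN_NAMES = "hive.io.file.readcolumn.names";

  // Hypothetical helper: split the comma-separated property value, treating a
  // missing or empty value as "no columns listed".
  static List<String> readColumnNames(Map<String, String> conf) {
    String value = conf.get(READ_COLUMN_NAMES);
    if (value == null || value.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(value.split(","));
  }

  public static void main(String[] args) {
    List<String> tableColumns = Arrays.asList("id", "name", "payload", "ts");
    Map<String, String> conf = Map.of(READ_COLUMN_NAMES, "id,name");
    // With pruning, only the listed columns would be requested from BigQuery.
    Set<String> pruned = new LinkedHashSet<>(readColumnNames(conf));
    System.out.println("all columns: " + tableColumns);
    System.out.println("pruned:      " + pruned);
  }
}
```

When the property can't be trusted, as with MR and multiple tables, the only safe choice is the full column list, which is where the inefficiency comes from.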

This is unfortunately quite inefficient. Tez, however, does not have that issue.

See more info here: https://lists.apache.org/thread/g464zybq4g6c7p2h6nd9jmmznq472785

We need to investigate whether we can come up with a workaround, or find a property or variable that exposes the subset of columns actually read.

Relevant part of the codebase:

hive-bigquery-connector/connector/src/main/java/com/google/cloud/hive/bigquery/connector/input/BigQueryInputSplit.java

Lines 176 to 191 in af82dcc

    
    Set<String> selectedFields;
    String engine = HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
    if (engine.equals("mr")) {
      // Unfortunately the MR engine does not provide a reliable value for the
      // "hive.io.file.readcolumn.names" when multiple tables are read in the
      // same query. So we have to select all the columns (i.e. `SELECT *`).
      // This is unfortunately quite inefficient. Tez, however, does not have that issue.
      // See more info here: https://lists.apache.org/thread/g464zybq4g6c7p2h6nd9jmmznq472785
      // TODO: Investigate to see if we can come up with a workaround. Maybe try
      //  using the new MapRed API (org.apache.hadoop.mapreduce) instead of the old
      //  one (org.apache.hadoop.mapred)?
      selectedFields = new HashSet<>(columnNames);
    } else {
      selectedFields =
          new HashSet<>(Arrays.asList(ColumnProjectionUtils.getReadColumnNames(jobConf)));
    }
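The engine branch in BigQueryInputSplit can be exercised without a Hive cluster. The sketch below reproduces its logic under a few assumptions: a plain Map replaces the JobConf, the literal key hive.execution.engine replaces HiveConf.ConfVars.HIVE_EXECUTION_ENGINE, and a comma split replaces ColumnProjectionUtils.getReadColumnNames. SelectedFieldsSketch and selectedFields are hypothetical names, and treating a missing engine setting as "mr" is an assumption of this sketch only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelectedFieldsSketch {
  static final String ENGINE_KEY = "hive.execution.engine";
  static final String READ_COLUMN_NAMES = "hive.io.file.readcolumn.names";

  // Mirrors the connector's branch: on MR, ignore the read-column list and
  // select every column; otherwise use the columns Hive asked for.
  static Set<String> selectedFields(Map<String, String> conf, List<String> columnNames) {
    // Assumption for this sketch: a missing engine setting is treated as "mr".
    String engine = conf.getOrDefault(ENGINE_KEY, "mr");
    if (engine.equals("mr")) {
      return new HashSet<>(columnNames); // equivalent to SELECT *
    }
    String readColumns = conf.getOrDefault(READ_COLUMN_NAMES, "");
    if (readColumns.isEmpty()) {
      return new HashSet<>();
    }
    return new HashSet<>(Arrays.asList(readColumns.split(",")));
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("id", "name", "payload", "ts");
    Map<String, String> mr = Map.of(ENGINE_KEY, "mr", READ_COLUMN_NAMES, "id");
    Map<String, String> tez = Map.of(ENGINE_KEY, "tez", READ_COLUMN_NAMES, "id");
    System.out.println("mr:  " + selectedFields(mr, columns).size() + " columns");
    System.out.println("tez: " + selectedFields(tez, columns).size() + " columns");
  }
}
```

With the same query needing only id, the MR path requests all four columns while the Tez path requests one, which is the overhead this issue is about.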
