Describe the bug
Reading an LZ4-compressed Parquet file fails with Spark 3.5 + Blaze.
To Reproduce
Steps to reproduce the behavior:
1. Take the attached LZ4-compressed Parquet file that reproduces the issue (remove the ".txt" suffix to restore the Parquet file before proceeding).
2. Upload the Parquet file to HDFS.
3. Launch spark-shell with Spark 3.5 + Blaze.
4. Enable the Blaze switch and read the Parquet file; the query fails with the following error:
scala> spark.conf.set("spark.blaze.enable", true)
scala> val df = spark.read.parquet("hdfs://path/o/part-00000-7493e343-a159-4a2f-b69d-77cb68ac525f-c000.lz4.parquet")
scala> df.show()
...
25/01/17 17:01:31 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1) (tjtx16-35-27.58os.org executor 2): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[ParquetScan] error: Execution error: output_with_sender[ParquetScan]: output() returns error: Arrow error: External error: Arrow: Parquet argument error: External: the offset to copy is not contained in the decompressed buffer
at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:95)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
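The report mentions LZO, while the file name and the Arrow decompression error both point to LZ4; the file's actual codec can be confirmed from its footer metadata. A minimal sketch using the parquet-mr API that ships with Spark (the HDFS path is the same placeholder as above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

val file = HadoopInputFile.fromPath(
  new Path("hdfs://path/o/part-00000-7493e343-a159-4a2f-b69d-77cb68ac525f-c000.lz4.parquet"),
  new Configuration())
val reader = ParquetFileReader.open(file)
try {
  // Print the compression codec of every column chunk in the first row group.
  reader.getFooter.getBlocks.get(0).getColumns
    .forEach(col => println(s"${col.getPath} -> ${col.getCodec}"))
} finally reader.close()
```

If this prints LZ4 (the legacy Hadoop-framed codec) rather than LZ4_RAW, that framing difference is a plausible source of the native reader's "offset to copy is not contained in the decompressed buffer" error.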
Disable the Blaze switch and read the same Parquet file; the query succeeds and displays the results, as shown in the screenshot below:
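For reference, the successful control run mirrors the failing session with only the switch flipped (same placeholder path as above):

```scala
scala> spark.conf.set("spark.blaze.enable", false)
scala> val df = spark.read.parquet("hdfs://path/o/part-00000-7493e343-a159-4a2f-b69d-77cb68ac525f-c000.lz4.parquet")
scala> df.show()
```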
The attached file that reproduces the issue: part-00000-7493e343-a159-4a2f-b69d-77cb68ac525f-c000.lz4.parquet.txt (remove the ".txt" suffix to restore the Parquet file).
Expected behavior
With the Blaze switch enabled, reading the LZ4-compressed Parquet file should succeed and return the same results as with Blaze disabled.
Screenshots
![Image](https://private-user-images.githubusercontent.com/15688792/404237404-a392c7fb-c811-4f52-8176-7040eeb013e3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNzExMzIsIm5iZiI6MTczOTE3MDgzMiwicGF0aCI6Ii8xNTY4ODc5Mi80MDQyMzc0MDQtYTM5MmM3ZmItYzgxMS00ZjUyLTgxNzYtNzA0MGVlYjAxM2UzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDA3MDAzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJlMWIzM2Q5MTRlZmQwZDY2Y2YwYmM4MjYxZjQ0MjExNWZlZDM2M2E4MzBjNDc3YmQ3Yjg3MzdjYzE0YTQ2ODcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-N-Stuiy2q2b_3foORpBjs06XJ9xUKDkp-Ivkh4d-D0)
Enable the Blaze switch:
Disable the Blaze switch:
Additional context
Spark version: 3.5