I need to generate these MRF files, which are very large.
All the data is stored in Hive (ORC), and I am using PySpark to generate the files.
Because we need to construct one big JSON element, once all the aggregation is complete the data size exceeds Spark's 2 GB limit for a single column (IllegalArgumentException: Cannot grow BufferHolder, exceeds 2147483632 bytes).
To work around this I had to split the output into smaller chunks, each containing a fixed number of records, roughly as in the sketch below.
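For concreteness, this is a minimal sketch of that chunking workaround, not the actual pipeline. The table name, chunking key (billing_code), chunk size, and output path are all illustrative assumptions:

```python
# Minimal sketch: aggregate the JSON per chunk instead of over the whole
# dataset, so no single aggregated value can hit the 2 GB BufferHolder limit.
# Table, column, and path names below are illustrative.
import math
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mrf-chunked-export").getOrCreate()

records = spark.table("pricing_db.in_network_rates")  # hypothetical Hive (ORC) table

RECORDS_PER_CHUNK = 50_000  # tuned so each chunk's JSON stays well under 2 GB
num_chunks = max(1, math.ceil(records.count() / RECORDS_PER_CHUNK))  # note: extra scan

# Hash-based chunk assignment gives roughly equal-sized chunks without
# forcing the whole dataset through a single partition.
chunked = records.withColumn(
    "chunk_id", F.expr(f"pmod(hash(billing_code), {num_chunks})")  # illustrative key
)

# One JSON array per chunk instead of one giant array for the entire file.
per_chunk_json = chunked.groupBy("chunk_id").agg(
    F.to_json(F.collect_list(F.struct(*records.columns))).alias("in_network_json")
)

# Write each chunk's JSON into its own directory; stitching the MRF header
# onto each chunk happens in a later step (not shown).
(
    per_chunk_json
    .repartition("chunk_id")
    .write
    .partitionBy("chunk_id")
    .mode("overwrite")
    .text("/tmp/mrf/in_network_chunks")
)
```

This keeps each collect_list/to_json value bounded by RECORDS_PER_CHUNK rather than by the full dataset, which is what I meant by splitting the output into subsets of records.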
I wanted to see what other teams are using to generate these MRF files.
Please share any insights on the above, or a reference to a discussion that could be a starting point.
There was this discussion:
#595
But the video referenced in it was taken down. If anyone has a copy of the video, that would be helpful.
Thanks,
Ash