I need to generate these MRF files, which are very large.
All the data is stored in Hive (ORC), and I am using PySpark to generate the files.
Because we need to construct one big JSON element, once all the aggregation is complete the data size exceeds Spark's 2 GB limit for a single column (IllegalArgumentException: Cannot grow BufferHolder, exceeds 2147483632 bytes).
To work around this I had to split the output into smaller chunks, each containing a fixed number of records, roughly as in the sketch below.
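For concreteness, this is a minimal sketch of that chunking workaround, not the actual pipeline. The table name, chunking key (billing_code), chunk size, and output path are all illustrative assumptions:

```python
# Minimal sketch: aggregate the JSON per chunk instead of over the whole
# dataset, so no single aggregated value can hit the 2 GB BufferHolder limit.
# Table, column, and path names below are illustrative.
import math
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mrf-chunked-export").getOrCreate()

records = spark.table("pricing_db.in_network_rates")  # hypothetical Hive (ORC) table

RECORDS_PER_CHUNK = 50_000  # tuned so each chunk's JSON stays well under 2 GB
num_chunks = max(1, math.ceil(records.count() / RECORDS_PER_CHUNK))  # note: extra scan

# Hash-based chunk assignment gives roughly equal-sized chunks without
# forcing the whole dataset through a single partition.
chunked = records.withColumn(
    "chunk_id", F.expr(f"pmod(hash(billing_code), {num_chunks})")  # illustrative key
)

# One JSON array per chunk instead of one giant array for the entire file.
per_chunk_json = chunked.groupBy("chunk_id").agg(
    F.to_json(F.collect_list(F.struct(*records.columns))).alias("in_network_json")
)

# Write each chunk's JSON into its own directory; stitching the MRF header
# onto each chunk happens in a later step (not shown).
(
    per_chunk_json
    .repartition("chunk_id")
    .write
    .partitionBy("chunk_id")
    .mode("overwrite")
    .text("/tmp/mrf/in_network_chunks")
)
```

This keeps each collect_list/to_json value bounded by RECORDS_PER_CHUNK rather than by the full dataset, which is what I meant by splitting the output into subsets of records.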
I wanted to see what other teams are using to generate these MRF files.
Please share any insights on the above, or a reference to a discussion that could be a starting point.
There was this discussion:
#595
But the video referenced in it was taken down. If anyone has a copy of the video, that would be helpful.
Thanks,
Ash