In the paper, you state that there are 50K web_cc OCR samples for each language, which should add up to 500K samples in total. However, the released version of PangeaInstruct contains only 300K samples in total. Is this a size mismatch, or is some of the data being kept private?
Thank you!