Datasets to add #61

KennethEnevoldsen · 2024-01-15T20:37:39Z

KennethEnevoldsen · 2024-01-15T20:40:24Z

@x-tabdeveloping I believe these datasets mostly cover what we want in SEB? Let me know if there is anything specific that we are missing.

x-tabdeveloping · 2024-01-16T10:05:09Z

I'm sure we can get some bitext mining tasks for Swedish from OPUS, such as Swedish-Norwegian or Swedish-Danish. I can do that if need be.

x-tabdeveloping · 2024-01-16T10:06:30Z

Could we perchance use (i.e. scrape or find) DBA entries and categories for Danish clustering?

KennethEnevoldsen · 2024-01-24T15:55:28Z

> Could we perchance use (aka. scrape or find) DBA entries and categories for Danish clustering?

We could, though I am unsure whether we could share the dataset.

KennethEnevoldsen · 2024-01-25T08:02:27Z

A simpler solution might be to use the dagw domains.

KennethEnevoldsen · 2024-01-28T15:52:54Z

I have split all the datasets to add into separate issues for now. Unless we specifically want to add a new dataset, I believe we can close this issue.

KennethEnevoldsen · 2024-01-28T16:23:08Z

KennethEnevoldsen · 2024-01-28T16:28:11Z

KennethEnevoldsen · 2024-02-05T15:56:21Z

It seems like we have everything in the list. We might add a few other datasets, but we can make separate issues for those.

x-tabdeveloping · 2024-02-06T07:11:19Z

Jolly, let's close this.

x-tabdeveloping mentioned this issue Jan 23, 2024

Ensure variety in datasets #32

Closed

KennethEnevoldsen added the dataset new dataset to add label Jan 25, 2024

x-tabdeveloping closed this as completed Feb 6, 2024
