Description
To optimize zero-shot performance, we are taking our MLM models through LM adaptation (see #5). For now, we are considering doing this for ~10% of the pre-training steps (around ~3GT). This matches the T5 setup, but the choice is fairly arbitrary.

Ideally, we should explore what the optimal ratio of MLM to (C)LM training is: 5%, 10%, 20%, 40%? For a fixed number of tokens (~30GT), we should plot end-task performance at different ratios of MLM to CLM training. That will give us an idea of the optimum, if there is one.

Note that this is a nice-to-have that we should only pursue if we have enough compute budget/bandwidth.
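To make the proposed sweep concrete, here is a minimal sketch, assuming a fixed ~30GT total budget and hypothetical `train_mlm` / `adapt_clm` / `evaluate_zero_shot` helpers standing in for the actual training and eval harness (none of these are defined in this repo; only the budget-splitting arithmetic runs as-is):

```python
# Sketch of the MLM -> CLM ratio sweep: for a fixed token budget,
# vary how many tokens go to CLM adaptation vs. MLM pre-training.
TOTAL_TOKENS = 30e9                    # ~30GT fixed pre-training budget (assumption)
CLM_RATIOS = [0.05, 0.10, 0.20, 0.40]  # candidate CLM adaptation fractions

def token_split(total_tokens: float, clm_ratio: float) -> tuple[float, float]:
    """Split the fixed token budget between MLM pre-training and CLM adaptation."""
    clm_tokens = total_tokens * clm_ratio
    mlm_tokens = total_tokens - clm_tokens
    return mlm_tokens, clm_tokens

results = {}
for ratio in CLM_RATIOS:
    mlm_tokens, clm_tokens = token_split(TOTAL_TOKENS, ratio)
    print(f"CLM ratio {ratio:>4.0%}: MLM {mlm_tokens/1e9:.1f}GT, CLM {clm_tokens/1e9:.1f}GT")
    # Placeholder calls for the actual runs (hypothetical helpers, not in this repo):
    # model = train_mlm(mlm_tokens)
    # model = adapt_clm(model, clm_tokens)
    # results[ratio] = evaluate_zero_shot(model)

# Plot results[ratio] vs. ratio to see whether there is an optimum.
```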