
Ablation #2

Open · wants to merge 41 commits into master
Conversation

Jeronymous (Member)

No description provided.

try:
loss = model[0].eval_batch(data_iterator) # average loss per sample per microbatch
# difficult to know if it is the right way to get the total loss
loss = loss * args.micro_batch_size * args.seq_length # losses per token
Jeronymous (Member Author), Jun 26, 2024

Why do you want a total loss, and not an average loss?

I am not sure micro_batch_size is the right one: it is the batch size per GPU; the effective batch size is macro_batch_size.

I would suggest saving the average loss per token AND the total number of tokens in the dataset (separately),
so that we can choose between the stats (average / total) and run checks based on the numbers of tokens.
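A tiny sketch of that bookkeeping idea; the field names (`avg_loss_per_token`, `num_tokens`) are hypothetical, chosen for illustration and not taken from the codebase:

```python
# Save the average loss per token AND the total number of tokens separately
# (hypothetical field names, for illustration only).
stats = {"avg_loss_per_token": 2.31, "num_tokens": 1_000_000}

# Either statistic can be recovered later: the total loss is just the product.
total_loss = stats["avg_loss_per_token"] * stats["num_tokens"]

# And the token count stays available for sanity checks.
assert stats["num_tokens"] > 0
```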

Jeronymous (Member Author), Jun 26, 2024

To clarify, I suggest using

loss = model[0].eval_batch(data_iterator)
loss_dicts = [{'lm loss' : loss, 'num_batches' : 1}]

and aggregating the losses and the numbers of batches where relevant (I think it is around line 417),
then normalizing by the number of batches only at the very end.
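A minimal stand-alone sketch of that scheme, assuming each call to `eval_batch` contributes one dict shaped like the snippet above (the loss values here are made up):

```python
import math

# Hypothetical per-micro-batch results; each dict mimics
# {'lm loss': loss, 'num_batches': 1} from the suggestion above.
loss_dicts = [
    {"lm loss": 2.31, "num_batches": 1},
    {"lm loss": 2.27, "num_batches": 1},
    {"lm loss": 2.35, "num_batches": 1},
]

# Aggregate losses and batch counts where relevant (around line 417).
total_loss = sum(d["lm loss"] for d in loss_dicts)
total_batches = sum(d["num_batches"] for d in loss_dicts)

# Normalize by the number of batches only at the very end.
val_loss = total_loss / total_batches
ppl = math.exp(min(20, val_loss))
```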

Comment on lines 415 to 417
if is_last_rank():

val_loss = total_loss_dict['lm loss'].item() / (num_tokenized_tokens - 1)
Jeronymous (Member Author), Jun 26, 2024

Also, it seems to me (I may be wrong) that "is_last_rank" is only True on one GPU in the multi-GPU case.
That would mean that, in multi-GPU, we would ignore the results on the other "n-1" GPUs?

if is_last_rank():

val_loss = total_loss_dict['lm loss'].item() / (num_tokenized_tokens - 1)
ppl = math.exp(min(20, val_loss))
Collaborator

Suggested change
ppl = math.exp(min(20, val_loss))
dist.all_reduce(val_loss, op=ReduceOp.SUM) # mean reduction is not supported
dist.all_reduce(ppl, op=ReduceOp.SUM)
dist.all_reduce(adjusted_ppl, op=ReduceOp.SUM)
dist.all_reduce(token_ratio, op=ReduceOp.SUM)
val_loss = val_loss / NB_SHARDS
token_ratio = token_ratio / NB_SHARDS
ppl = math.exp(min(20, val_loss))
adjusted_ppl = math.exp(min(20, val_loss * token_ratio))
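Note that in actual multi-GPU code `dist.all_reduce` operates on tensors, so the scalars would first need wrapping (e.g. `torch.tensor(val_loss)`). The averaging logic itself can be checked in plain Python; this sketch simulates a SUM reduction over `NB_SHARDS` ranks with made-up per-shard values:

```python
import math

NB_SHARDS = 4  # hypothetical number of data-parallel shards (world size)

# Made-up per-shard values, as each rank would hold them before reduction.
per_shard_val_loss = [2.30, 2.34, 2.28, 2.32]
per_shard_token_ratio = [1.10, 1.12, 1.09, 1.11]

# all_reduce(SUM) leaves the global sum on every rank; since a mean
# reduction is not supported, we divide by NB_SHARDS to get the average.
val_loss = sum(per_shard_val_loss) / NB_SHARDS
token_ratio = sum(per_shard_token_ratio) / NB_SHARDS

# Perplexities are then recomputed from the averaged quantities.
ppl = math.exp(min(20, val_loss))
adjusted_ppl = math.exp(min(20, val_loss * token_ratio))
```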

Collaborator

Thanks, I'll try it out; I hope it solves the synchronization problem ;)

lucashervier and others added 23 commits August 21, 2024 16:51
- Add datasets: Pile (WIP) and Stac (tiny).
- Improve a bit folder organization.
- Add zstandard in requirements (to read datasets in .jsonl.zst format)