The wikitext2 perplexity calculation method is based on this Huggingface article:

It is calculated with a window size of `max_seq_length = 4096` tokens. At each step, the window shifts by `stride = 512` tokens, and its first `max_seq_length - stride` tokens are treated as context tokens. This means that their logits are not taken into account, allowing this rolling perplexity to be calculated without overlap.
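For reference, here is a minimal sketch of that rolling-window loop, in the spirit of the Huggingface perplexity article. The checkpoint name, dataset loading and dtype are illustrative assumptions; the actual `run_wikitext-2_benchmark.py` script may differ:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint, for illustration only
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
model.eval()

# Tokenize the whole wikitext-2 test split as one long sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_seq_length = 4096  # window size
stride = 512           # how far the window shifts at each step

nlls = []
prev_end_loc = 0
seq_len = encodings.input_ids.size(1)
for begin_loc in range(0, seq_len, stride):
    end_loc = min(begin_loc + max_seq_length, seq_len)
    trg_len = end_loc - prev_end_loc  # only these trailing tokens are scored
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)
    target_ids = input_ids.clone()
    # The first max_seq_length - stride tokens are context only: setting their
    # labels to -100 excludes their logits from the loss, so no token is
    # scored twice across windows.
    target_ids[:, :-trg_len] = -100

    with torch.no_grad():
        # outputs.loss is the mean negative log-likelihood over the scored tokens.
        outputs = model(input_ids, labels=target_ids)
        nlls.append(outputs.loss * trg_len)

    prev_end_loc = end_loc
    if end_loc == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end_loc)
print(f"wikitext-2 perplexity: {ppl.item():.2f}")
```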
I benchmarked llama2-7B with this config:

By running `python3 run_wikitext-2_benchmark.py -config` with this config and fp16 precision, I got a perplexity of 5.02. With fp16 precision I got a perplexity of 5.15.

It is close to the score reported here: ggml-org/llama.cpp#2352