KBERT (Keller BERT) is an open-source project for training transformer encoder models on the quality/compute Pareto frontier. Specifically, the goal is to produce a model that beats ModernBERT and DeBERTaV3 on both SQuAD v2.0 and MNLI with minimal compute.
KBERT builds on:

- modded-nanogpt: Introduces substantial architectural and performance improvements to GPT-2 training.
- SpeedRunningESM2: Adapts modded-nanogpt to an encoder model for protein structure prediction.
Set up the environment, dependencies, and data:
```bash
git clone https://github.com/lapp0/kbert && cd kbert
pip install -r requirements.txt
pip install --pre torch==2.7.0.dev20250110+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124 --upgrade
python data/download_fineweb_edu.py --num_chunks 120  # ~100M tokens / chunk
python data/download_mnli.py
export N_GPU=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
```
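For a rough sense of scale, the FineWeb-Edu download above yields on the order of 12B tokens, assuming the ~100M tokens per chunk noted in the command comment holds:

```python
# Back-of-the-envelope size of the FineWeb-Edu download above.
# Assumption: each chunk holds roughly 100M tokens, per the comment on the command.
num_chunks = 120
tokens_per_chunk = 100_000_000
print(f"~{num_chunks * tokens_per_chunk / 1e9:.0f}B tokens")  # ~12B tokens
```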
Save Hugging Face credentials (you only need to do this once):

```bash
huggingface-cli login
```
Pretrain:

```bash
torchrun --standalone --nproc_per_node=$N_GPU pretrain.py \
    --train.hf_model_name PRETRAINED_HF_URI
```
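Once pretraining has pushed a checkpoint to the Hub, you can pull it back down for a quick sanity check. This is only a sketch: it assumes the repo named by `--train.hf_model_name` ends up in a standard Hugging Face format loadable via `AutoModel`, and the repo id below is a hypothetical placeholder for PRETRAINED_HF_URI.

```python
# Minimal sanity-check sketch; assumes a standard Hugging Face-format checkpoint.
from transformers import AutoModel, AutoTokenizer

repo_id = "your-username/kbert-base"  # hypothetical placeholder for PRETRAINED_HF_URI
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)  # may need trust_remote_code=True for a custom architecture

inputs = tokenizer("KBERT is a transformer encoder.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```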
Finetune:

```bash
torchrun --standalone --nproc_per_node=$N_GPU finetune.py \
    --train.base_model PRETRAINED_HF_URI \
    --train.hf_model_name FINETUNED_HF_URI
```
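As a usage illustration, here is a sketch of MNLI-style inference with the finetuned checkpoint. It assumes finetune.py produces a standard Hugging Face sequence-classification checkpoint with an MNLI label mapping; the repo id is a hypothetical placeholder for FINETUNED_HF_URI.

```python
# MNLI inference sketch; assumes a standard sequence-classification checkpoint
# whose id2label covers the MNLI classes (entailment / neutral / contradiction).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "your-username/kbert-base-mnli"  # hypothetical placeholder for FINETUNED_HF_URI
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```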
| | KBERT-base | DeBERTa-v3-base | ModernBERT-base |
|---|---|---|---|
| Training Tokens | ? | 400 billion* | 1.7 trillion |
| Metrics | | | |
| MNLI | ? | ? | ? |
| SQuAD v2.0 | ? | ? | ? |
| Parameters | 150M | 185M | 150M |
| Encoder Parameters | 106M | 87M | 111M |
| Embedding Parameters | 39M | 98M | 39M |
\*Estimate based on the paper's stated "160GB of data" over 10 epochs, assuming roughly 4 bytes per token.
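The footnote's arithmetic, reproduced as a quick check (all inputs are the rough figures stated above):

```python
# Reproduce the DeBERTa-v3 training-token estimate from the footnote:
# 160 GB of raw data at ~4 bytes per token, seen for 10 epochs.
bytes_of_data = 160e9
bytes_per_token = 4      # rough assumption from the footnote
epochs = 10
tokens = bytes_of_data / bytes_per_token * epochs
print(f"~{tokens / 1e9:.0f}B tokens")  # ~400B tokens
```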