This tutorial walks you through setting up collaborative training with the ALBERT-large-v2 model on the WikiText-103 dataset.
It uses the Hugging Face datasets and transformers libraries to compute local updates and hivemind.CollaborativeOptimizer
to exchange information between peers.
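In short, every trainer computes gradients on its local batches and hands them to a CollaborativeOptimizer, which accumulates contributions from all peers through the DHT and applies an averaged update once the collaboration has processed a target number of samples. The sketch below illustrates this wiring; the hyperparameters and constructor arguments shown (prefix, target_batch_size, batch_size_per_step) are assumptions based on the CollaborativeOptimizer API of the hivemind version this tutorial targets, so check them against your installed release and the actual run_trainer.py.

```python
# Minimal sketch of the training-side wiring; argument names and values are
# assumptions, not the exact configuration used by run_trainer.py.
import torch
import hivemind
from transformers import AlbertConfig, AlbertForMaskedLM

model = AlbertForMaskedLM(AlbertConfig())  # default config as a stand-in; the tutorial trains albert-large-v2
local_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Join the experiment's DHT through the coordinator (or any running peer).
dht = hivemind.DHT(initial_peers=["COORDINATOR_IP:COORDINATOR_PORT"], start=True)

# Wrap the local optimizer: calling step() contributes this peer's gradients,
# and an averaged update happens once the collaboration reaches target_batch_size.
opt = hivemind.CollaborativeOptimizer(
    opt=local_opt,
    dht=dht,
    prefix="SOME_NAME",          # must match --experiment_prefix
    target_batch_size=4096,      # global samples per collaborative step (assumed value)
    batch_size_per_step=4,       # samples this peer processes per step() (assumed value)
    verbose=True,
    start=True,
)

for _ in range(100):             # stand-in training loop with random token ids
    batch = torch.randint(0, 30000, (4, 128))
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```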
- Install hivemind (see the main README)
- Install the dependencies:
pip install -r requirements.txt
- Preprocess data:
python tokenize_wikitext103.py
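For reference, this step boils down to downloading WikiText-103 with the datasets library, tokenizing it with the ALBERT tokenizer, and saving the result to disk. The snippet below is an illustrative sketch rather than the exact contents of tokenize_wikitext103.py; the output path and sequence-handling details are assumptions.

```python
# Illustrative sketch of the preprocessing step; the real tokenize_wikitext103.py
# may group sequences differently and write to a different path.
from datasets import load_dataset
from transformers import AlbertTokenizerFast

tokenizer = AlbertTokenizerFast.from_pretrained("albert-large-v2")
wikitext = load_dataset("wikitext", "wikitext-103-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = wikitext.map(tokenize, batched=True, remove_columns=["text"])
tokenized.save_to_disk("./data/albert_tokenized_wikitext")  # hypothetical output location
```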
- Run the coordinator
HIVEMIND_THREADS=128 python ./run_first_peer.py --dht_listen_on [::]:SOME_PORT \
--experiment_prefix SOME_NAME --wandb_project YOUR_PROJECT
The coordinator will then print Running DHT root at COORDINATOR_IP:COORDINATOR_PORT; you will need these values to launch additional peers.
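Under the hood, the coordinator's main job is to host the DHT node that every other peer uses as its entry point; the actual run_first_peer.py additionally aggregates training metrics from the DHT and reports them to the wandb project you specify. Below is a stripped-down sketch, assuming the host:port DHT interface used by this hivemind version.

```python
# Stripped-down sketch of the coordinator: host a DHT node and stay alive so
# other peers can bootstrap from it. Metric reporting to wandb is omitted.
import time
import hivemind

# 31337 stands in for SOME_PORT; use whatever port you pass to --dht_listen_on.
dht = hivemind.DHT(listen_on="[::]:31337", start=True)
print("Running DHT root at COORDINATOR_IP:31337")  # peers pass this as --initial_peers

while True:
    time.sleep(60)  # keep the process (and thus the DHT node) running
```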
- To start a GPU-enabled trainer, run
HIVEMIND_THREADS=128 python run_trainer.py \
--output_dir ./outputs_trainer --overwrite_output_dir --logging_dir ./logs_trainer \
--logging_first_step --logging_steps 100 --initial_peers COORDINATOR_IP:COORDINATOR_PORT \
--experiment_prefix SOME_NAME --seed 42 --averaging_timeout 120 --bandwidth 1000
Instead of COORDINATOR_IP:COORDINATOR_PORT, you can specify the address of any existing trainer in the same format, as shown in the sketch below.
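This works because --initial_peers is just a list of known DHT endpoints used to bootstrap into the network; once connected, a peer discovers the rest of the collaboration on its own. A sketch of the equivalent call, assuming the host:port endpoint format of this hivemind version:

```python
import hivemind

# A new peer can bootstrap from the coordinator, an already-running trainer, or
# both; listing several endpoints just gives it more ways to join the network.
dht = hivemind.DHT(
    initial_peers=["COORDINATOR_IP:COORDINATOR_PORT", "EXISTING_TRAINER_IP:PORT"],
    start=True,
)
```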
- To start an auxiliary CPU peer, run
HIVEMIND_THREADS=128 python run_aux.py --output_dir ./outputs_aux \
--overwrite_output_dir --logging_dir ./logs_aux --logging_first_step --logging_steps 100 \
--initial_peers COORDINATOR_IP:COORDINATOR_PORT --experiment_prefix SOME_NAME --seed 42 \
--averaging_timeout 120 --fp16 False --bandwidth 1000
These peers do not contribute gradients of their own, but assist the other peers in gradient averaging (see the illustration below).
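To make the aux peer's role concrete: gradient averaging here means that a group of peers runs an all-reduce and everyone ends up with the mean of the contributed gradients. An auxiliary peer joins those averaging rounds, contributing bandwidth and compute for the reduction, without adding gradients of its own. The snippet below is a purely conceptual illustration in plain PyTorch, not the hivemind averaging protocol itself.

```python
# Conceptual illustration only: what one averaging round produces. In hivemind,
# this reduction is performed collectively by the group, and an auxiliary peer
# helps carry it out even though it has no gradient tensor to contribute.
import torch

trainer_grads = [torch.randn(4) for _ in range(3)]   # gradients from 3 GPU trainers
averaged = torch.stack(trainer_grads).mean(dim=0)    # every participant receives this
print(averaged)
```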