Improve HF download speed #1899

rasbt · 2025-01-07T18:53:24Z

Addresses the speed issue discussed in #1886. I think the problem is that the environment variable needs to be set before the Hugging Face packages are imported, @Andrei-Aksionov.
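
For illustration, the ordering idea could look roughly like the sketch below. It assumes, as the comment above suggests, that the Hugging Face library reads `HF_HUB_ENABLE_HF_TRANSFER` at import time; the helper name `enable_hf_transfer` is hypothetical and not taken from the litgpt code base.

```python
import os
import sys


def enable_hf_transfer() -> bool:
    """Set HF_HUB_ENABLE_HF_TRANSFER=1 unless the user already chose a value.

    Returns True if huggingface_hub has not been imported yet in this
    process, i.e. the variable is in place before the library could read it.
    (Hypothetical helper, for illustration only.)
    """
    # setdefault keeps an explicit user setting such as "0" untouched.
    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")
    return "huggingface_hub" not in sys.modules


set_in_time = enable_hf_transfer()
if not set_in_time:
    print("huggingface_hub was already imported; the setting may be ignored.")

# Hugging Face packages should only be imported after this point, e.g.:
# from huggingface_hub import snapshot_download
```

The `sys.modules` check is only a diagnostic: it reports whether something else in the process imported `huggingface_hub` before the variable was set.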

Andrei-Aksionov · 2025-01-07T19:03:50Z

I think I tested the same approach and it didn't work.
I think the import happens somewhere even earlier.

⚡ rasbt-patch-4 ~/litgpt export repo_id=Qwen/Qwen2.5-0.5B-Instruct
⚡ rasbt-patch-4 ~/litgpt litgpt download $repo_id                 
Setting HF_HUB_ENABLE_HF_TRANSFER=1
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 659/659 [00:00<00:00, 7.81MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 242/242 [00:00<00:00, 1.32MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 988M/988M [01:35<00:00, 10.4MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.03M/7.03M [00:00<00:00, 12.0MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.30k/7.30k [00:00<00:00, 69.2MB/s]
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': PosixPath('checkpoints/Qwen/Qwen2.5-0.5B-Instruct'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Loading weights: model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 00:01<00:00, 76.08it/s
Saving converted checkpoint to checkpoints/Qwen/Qwen2.5-0.5B-Instruct
⚡ rasbt-patch-4 ~/litgpt export HF_HUB_ENABLE_HF_TRANSFER=1                   
⚡ rasbt-patch-4 ~/litgpt rm -rf checkpoints                
⚡ rasbt-patch-4 ~/litgpt litgpt download $repo_id          
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 659/659 [00:00<00:00, 9.15MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 242/242 [00:00<00:00, 1.54MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 988M/988M [00:02<00:00, 459MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.03M/7.03M [00:00<00:00, 22.1MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.30k/7.30k [00:00<00:00, 64.1MB/s]
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': PosixPath('checkpoints/Qwen/Qwen2.5-0.5B-Instruct'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Loading weights: model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 00:01<00:00, 75.90it/s
Saving converted checkpoint to checkpoints/Qwen/Qwen2.5-0.5B-Instruct

2 seconds (variable exported in the shell) vs. 1 minute 35 seconds (variable set by the script).

litgpt/scripts/download.py

rasbt · 2025-01-07T20:58:18Z

@Andrei-Aksionov Thanks for checking. I will poke around a bit more.

rasbt · 2025-01-07T22:38:30Z

Hm, this is weird:

With the code in main:

If I set HF_HUB_ENABLE_HF_TRANSFER=1, it complains that hf_transfer is not installed; after installing it, the download is fast, as advertised.
If I change back to HF_HUB_ENABLE_HF_TRANSFER=0, it's still fast.
When running with HF_HUB_ENABLE_HF_TRANSFER=0 but with hf_transfer uninstalled, it's back to slow.
When reinstalling hf_transfer but keeping HF_HUB_ENABLE_HF_TRANSFER=0, it's still fast.

So I think all one needs to do is add hf_transfer to the dependencies, and print a message suggesting installation if users don't already have it installed?
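
A minimal sketch of the "print a message" part, using only the standard library. The function name `hf_transfer_hint` and the message wording are hypothetical; this is not the code that was merged.

```python
import importlib.util
from typing import Optional


def hf_transfer_hint(package: str = "hf_transfer") -> Optional[str]:
    """Return an install hint if `package` cannot be found, else None.

    find_spec looks the package up without importing it, so the check
    itself has no side effects on the download code.
    """
    if importlib.util.find_spec(package) is None:
        return (
            f"Downloads may be slow because '{package}' is not installed. "
            f"Consider running: pip install {package}"
        )
    return None


hint = hf_transfer_hint()
if hint is not None:
    print(hint)
```

Returning the message instead of printing inside the function keeps the check easy to test and lets the caller decide how to surface it.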

Improve HF download speed

7c27a1e

rasbt requested a review from lantiga as a code owner January 7, 2025 18:53

rasbt requested a review from Andrei-Aksionov January 7, 2025 18:53

Update litgpt/scripts/download.py

ce5d0fe

update pr

d1103db

rasbt merged commit a439338 into main Jan 8, 2025
8 of 9 checks passed

rasbt deleted the rasbt-patch-4 branch January 8, 2025 19:56

This was referenced Jan 8, 2025

Slow download from HuggingFace Hub (capped at 10.5 MB/s) #1886

Open

How to disable hf_transfer? #1835

Closed