Actions: microsoft/DeepSpeed

nv-accelerate-v100

4,277 workflow run results

Fix no-torch workflow and update real_accelerator
nv-accelerate-v100 #12533: Pull request #6885 opened by loadams
December 17, 2024 22:25 3h 10m 13s loadams/fix-real-accelerator-no-torch
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-accelerate-v100 #12531: Pull request #6803 synchronize by loadams
December 17, 2024 20:22 19m 35s nelyahu:zero2_param_idx
Add arctic model support by adding w2 to all_reduce
nv-accelerate-v100 #12530: Pull request #6856 synchronize by loadams
December 17, 2024 19:58 12m 54s pi314ever:arctic-enabling-upstream
Cleanup ops/transformer/inference tests
nv-accelerate-v100 #12529: Pull request #6830 synchronize by loadams
December 17, 2024 19:55 19m 49s loadams/transformers-inference
Inference ops unit test failures/fixes
nv-accelerate-v100 #12528: Pull request #6879 synchronize by loadams
December 17, 2024 19:54 11m 10s loadams/inference-ops-test-repro
Update transformers ops unit tests to use required_torch_version
nv-accelerate-v100 #12527: Pull request #6884 synchronize by loadams
December 17, 2024 18:22 11m 41s loadams/fix-transformers-inference
Inference ops unit test failures/fixes
nv-accelerate-v100 #12524: Pull request #6879 synchronize by loadams
December 17, 2024 18:00 11m 20s loadams/inference-ops-test-repro
[inf] Add config var to enable keeping module on host
nv-accelerate-v100 #12522: Pull request #6846 synchronize by oelayan7
December 17, 2024 07:46 3m 51s oelayan7:keep_module_on_host
Fix error caused by all_reduce call in domino
nv-accelerate-v100 #12520: Pull request #6880 synchronize by hwchen2017
December 17, 2024 01:46 15m 34s hongwei/fix_domino_allreduce
Add arctic model support by adding w2 to all_reduce
nv-accelerate-v100 #12519: Pull request #6856 synchronize by tjruwase
December 17, 2024 01:35 11m 46s pi314ever:arctic-enabling-upstream
Fix checkpointable_layers Logic
nv-accelerate-v100 #12517: Pull request #6881 opened by Quentin-Anthony
December 17, 2024 00:11 13m 37s Quentin-Anthony:qanthony/fix-act-recomp
nv-accelerate-v100
nv-accelerate-v100 #12516: Scheduled
December 17, 2024 00:07 25m 32s master
Fix error caused by all_reduce call in domino
nv-accelerate-v100 #12515: Pull request #6880 synchronize by hwchen2017
December 16, 2024 23:50 12m 57s hongwei/fix_domino_allreduce
Fix error caused by all_reduce call in domino
nv-accelerate-v100 #12514: Pull request #6880 opened by hwchen2017
December 16, 2024 23:47 2m 45s hongwei/fix_domino_allreduce
Inference ops unit test failures/fixes
nv-accelerate-v100 #12513: Pull request #6879 opened by loadams
December 16, 2024 23:08 32m 59s loadams/inference-ops-test-repro
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-accelerate-v100 #12512: Pull request #6803 synchronize by loadams
December 16, 2024 22:52 12m 22s nelyahu:zero2_param_idx
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-accelerate-v100 #12511: Pull request #6863 synchronize by loadams
December 16, 2024 22:49 12m 27s akeshet:akeshet/pdsh_rank_log
Add the missing view operations from sequence parallel(async).
nv-accelerate-v100 #12510: Pull request #6750 synchronize by loadams
December 16, 2024 22:49 14m 38s inkcherry:ds_overlap_fix
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-accelerate-v100 #12509: Pull request #6803 synchronize by loadams
December 16, 2024 22:15 6m 22s nelyahu:zero2_param_idx
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-accelerate-v100 #12508: Pull request #6863 synchronize by loadams
December 16, 2024 21:28 11m 20s akeshet:akeshet/pdsh_rank_log
Support pure meta model lm_head tp
nv-accelerate-v100 #12505: Pull request #6812 synchronize by loadams
December 16, 2024 19:34 11m 36s Yejing-Lai:lyj/lm_head_replace