Actions: microsoft/DeepSpeed

nv-lightning-v100

4,374 workflow run results

Fix no-torch workflow and update real_accelerator
nv-lightning-v100 #13795: Pull request #6885 opened by loadams
December 17, 2024 22:25 6m 43s loadams/fix-real-accelerator-no-torch
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-lightning-v100 #13793: Pull request #6803 synchronize by loadams
December 17, 2024 20:22 7m 7s nelyahu:zero2_param_idx
Add arctic model support by adding w2 to all_reduce
nv-lightning-v100 #13792: Pull request #6856 synchronize by loadams
December 17, 2024 19:58 5m 47s pi314ever:arctic-enabling-upstream
Cleanup ops/transformer/inference tests
nv-lightning-v100 #13791: Pull request #6830 synchronize by loadams
December 17, 2024 19:55 6m 57s loadams/transformers-inference
Inference ops unit test failures/fixes
nv-lightning-v100 #13790: Pull request #6879 synchronize by loadams
December 17, 2024 19:54 5m 46s loadams/inference-ops-test-repro
Inference ops unit test failures/fixes
nv-lightning-v100 #13786: Pull request #6879 synchronize by loadams
December 17, 2024 18:00 6m 46s loadams/inference-ops-test-repro
[inf] Add config var to enable keeping module on host
nv-lightning-v100 #13784: Pull request #6846 synchronize by oelayan7
December 17, 2024 07:46 6m 29s oelayan7:keep_module_on_host
Fix error caused by all_reduce call in domino
nv-lightning-v100 #13782: Pull request #6880 synchronize by hwchen2017
December 17, 2024 01:46 7m 9s hongwei/fix_domino_allreduce
Add arctic model support by adding w2 to all_reduce
nv-lightning-v100 #13781: Pull request #6856 synchronize by tjruwase
December 17, 2024 01:35 7m 2s pi314ever:arctic-enabling-upstream
nv-lightning-v100
nv-lightning-v100 #13779: Scheduled
December 17, 2024 00:22 1h 18m 55s master
Fix checkpointable_layers Logic
nv-lightning-v100 #13778: Pull request #6881 opened by Quentin-Anthony
December 17, 2024 00:11 10m 41s Quentin-Anthony:qanthony/fix-act-recomp
Fix error caused by all_reduce call in domino
nv-lightning-v100 #13777: Pull request #6880 synchronize by hwchen2017
December 16, 2024 23:50 18m 46s hongwei/fix_domino_allreduce
Fix error caused by all_reduce call in domino
nv-lightning-v100 #13776: Pull request #6880 opened by hwchen2017
December 16, 2024 23:47 3m 2s hongwei/fix_domino_allreduce
Inference ops unit test failures/fixes
nv-lightning-v100 #13775: Pull request #6879 opened by loadams
December 16, 2024 23:08 8m 20s loadams/inference-ops-test-repro
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-lightning-v100 #13774: Pull request #6803 synchronize by loadams
December 16, 2024 22:52 5m 48s nelyahu:zero2_param_idx
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-lightning-v100 #13773: Pull request #6863 synchronize by loadams
December 16, 2024 22:49 6m 48s akeshet:akeshet/pdsh_rank_log
Add the missing view operations from sequence parallel(async).
nv-lightning-v100 #13772: Pull request #6750 synchronize by loadams
December 16, 2024 22:49 8m 31s inkcherry:ds_overlap_fix
Zero2: avoid graph breaks in torch.compile by using param_idx
nv-lightning-v100 #13771: Pull request #6803 synchronize by loadams
December 16, 2024 22:15 5m 33s nelyahu:zero2_param_idx
Fix --enable_each_rank_log when used with PDSH multi-node runner
nv-lightning-v100 #13770: Pull request #6863 synchronize by loadams
December 16, 2024 21:28 6m 35s akeshet:akeshet/pdsh_rank_log
Support pure meta model lm_head tp
nv-lightning-v100 #13767: Pull request #6812 synchronize by loadams
December 16, 2024 19:34 6m 44s Yejing-Lai:lyj/lm_head_replace