Actions: deepspeedai/DeepSpeed

nv-lightning-v100

4,348 workflow run results

nv-lightning-v100
nv-lightning-v100 #13897: Scheduled
January 3, 2025 00:21 43m 47s master
Cleanup ops/transformer/inference tests
nv-lightning-v100 #13895: Pull request #6830 synchronize by loadams
January 2, 2025 18:47 2h 3m 36s loadams/transformers-inference
Autotp training
nv-lightning-v100 #13893: Pull request #6922 synchronize by inkcherry
January 2, 2025 03:54 6m 34s inkcherry:autotp_training
nv-lightning-v100
nv-lightning-v100 #13892: Scheduled
January 2, 2025 00:20 5m 31s master
nv-lightning-v100
nv-lightning-v100 #13891: Scheduled
January 1, 2025 00:23 6m 51s master
Add fp8_gemm fallback for non-triton systems
nv-lightning-v100 #13890: Pull request #6916 synchronize by oelayan7
December 31, 2024 12:01 3m 3s oelayan7:fp8_gemm_no_triton
[inf] Add config var to enable keeping module on host
nv-lightning-v100 #13889: Pull request #6846 synchronize by oelayan7
December 31, 2024 07:32 6m 32s oelayan7:keep_module_on_host
nv-lightning-v100
nv-lightning-v100 #13888: Scheduled
December 31, 2024 00:20 55m 3s master
Use ds-specific module id to avoid conflicts
nv-lightning-v100 #13887: Pull request #6847 synchronize by loadams
December 30, 2024 21:04 2m 49s olruwase/pr_6772
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13886: Pull request #6909 synchronize by loadams
December 30, 2024 21:02 6m 45s hj-wei:dev_hjwei
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13885: Pull request #6773 synchronize by loadams
December 30, 2024 18:54 5m 55s deepcharm:stage3-use-new-grad-acc-api
Fix checkpointable_layers Logic
nv-lightning-v100 #13884: Pull request #6881 synchronize by loadams
December 30, 2024 18:53 40m 16s Quentin-Anthony:qanthony/fix-act-recomp
Add fp8_gemm fallback for non-triton systems
nv-lightning-v100 #13883: Pull request #6916 synchronize by loadams
December 30, 2024 17:57 51m 51s oelayan7:fp8_gemm_no_triton
fix: RuntimeError for UCP large DP
nv-lightning-v100 #13882: Pull request #6918 synchronize by loadams
December 30, 2024 17:17 8m 26s saforem2/ucp-bug
nv-lightning-v100
nv-lightning-v100 #13879: Scheduled
December 30, 2024 00:22 7m 2s master
fix: RuntimeError for UCP large DP
nv-lightning-v100 #13878: Pull request #6918 opened by saforem2
December 29, 2024 18:23 3m 13s saforem2/ucp-bug
nv-lightning-v100
nv-lightning-v100 #13877: Scheduled
December 29, 2024 00:23 5m 51s master
Use ds-specific module id to avoid conflicts
nv-lightning-v100 #13876: Pull request #6847 synchronize by tjruwase
December 28, 2024 19:44 6m 47s olruwase/pr_6772
nv-lightning-v100
nv-lightning-v100 #13875: Scheduled
December 28, 2024 00:20 5m 47s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13874: Pull request #6909 synchronize by hj-wei
December 27, 2024 03:06 6m 54s hj-wei:dev_hjwei
nv-lightning-v100
nv-lightning-v100 #13871: Scheduled
December 27, 2024 00:20 5m 47s master
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13870: Pull request #6773 synchronize by loadams
December 26, 2024 20:09 5m 56s deepcharm:stage3-use-new-grad-acc-api