TST: Skip test on multi-GPU as DataParallel fails (#2234)
This test fails in a multi-GPU setting because transformers.Trainer
switches to DataParallel. As this is not a commonly used parallelization
strategy, it should be okay to just skip it.
BenjaminBossan authored Nov 26, 2024
1 parent ca1b3b1 commit d13d7a4
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions tests/test_decoder_models.py
@@ -538,6 +538,12 @@ def test_prompt_learning_with_gradient_checkpointing(self, test_name, model_id,
# Test prompt learning methods with gradient checkpointing in a semi realistic setting.
# Prefix tuning does not work if the model uses the new caching implementation. In that case, a helpful error
# should be raised.

# skip if multi GPU, since this results in DataParallel usage by Trainer, which fails with "CUDA device
# assertion", breaking subsequent tests
if torch.cuda.device_count() > 1:
pytest.skip("Skip prompt_learning_with_gradient_checkpointing test on multi-GPU setups")

peft_config = config_cls(
base_model_name_or_path=model_id,
**config_kwargs,
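For context, here is a minimal, self-contained sketch of the guard pattern used in the diff above. It is not part of this commit; the helper name skip_if_multi_gpu is hypothetical. It skips a Trainer-based test whenever more than one CUDA device is visible, since transformers.Trainer would otherwise wrap the model in torch.nn.DataParallel.

    # Minimal sketch (assumed helper, not from this commit): skip a test when
    # more than one CUDA device is visible, because transformers.Trainer falls
    # back to torch.nn.DataParallel in that case.
    import pytest
    import torch

    def skip_if_multi_gpu():
        # torch.cuda.device_count() reports the number of visible CUDA devices;
        # restricting CUDA_VISIBLE_DEVICES changes what is counted here.
        if torch.cuda.device_count() > 1:
            pytest.skip("multi-GPU run would trigger DataParallel in transformers.Trainer")

An equivalent collection-time alternative would be a pytest.mark.skipif decorator with the same torch.cuda.device_count() > 1 condition; the commit opts for an inline runtime skip inside the test body.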
