Bug Report: PyTorchModel Defaults to CUDA instead of CPU #104

warisgill · 2024-10-24T20:12:05Z

Description:
The PyTorchModel class in pfl.model.pytorch automatically moves the model to CUDA whenever CUDA is available on the system, regardless of whether the user intends to run FL training on CPU. This behavior is unexpected and deviates from standard PyTorch practice, where a model stays on CPU unless it is explicitly moved to another device.

Steps to Reproduce:

Ensure CUDA is available on your system.

Initialize a PyTorchModel instance:

from torchvision.models import resnet18
from pfl.model.pytorch import PyTorchModel
pytorch_model = resnet18(weights=None)  # untrained; `pretrained=` is deprecated since torchvision 0.13
pytorch_model.loss = None
pytorch_model.metrics = None
print(f"Original model device: {next(pytorch_model.parameters()).device}")
# Initialize the PFL PyTorch model
pfl_pt_model = PyTorchModel(pytorch_model, local_optimizer_create=None, central_optimizer=None)
print(f"PFL model device: {next(pfl_pt_model.pytorch_model.parameters()).device}")

Output of the above code:

Original model device: cpu
PFL model device: cuda:0

Observe that pytorch_model has been moved to CUDA even though no device was requested and the intent was to use the CPU.

Expected Behavior:
As with centralized training in PyTorch, the model should remain on CPU by default and move to CUDA only when the developer explicitly requests it.

Proposed Solution:
Introduce a device parameter to the PyTorchModel class, allowing users to specify the desired device ('cpu' or 'cuda'). Additionally, provide a .to(device) method to facilitate moving the model as needed.

Affected Code:
In pfl/model/pytorch.py:

self._model = model.to(pytorch_ops.get_default_device())

In pfl/internal/ops/pytorch_ops.py:

def get_default_device():
    manual_device = os.environ.get('PFL_PYTORCH_DEVICE', None)
    if manual_device:
        default_device = torch.device(manual_device)
    elif is_pytest_running():
        default_device = torch.device('cpu')
    elif torch.cuda.is_available():
        default_device = torch.device('cuda')
    elif (hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()):
        default_device = torch.device('mps')
    else:
        default_device = torch.device('cpu')
    return default_device
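
The branch order above can be restated as a pure function, which makes it easier to see why a CUDA machine never stays on CPU unless the environment variable is set. This is only an illustrative sketch: pick_device and its parameters are hypothetical stand-ins for the os.environ lookup, is_pytest_running(), and the torch availability checks, and are not part of pfl.

```python
from typing import Optional


def pick_device(env_value: Optional[str], under_pytest: bool,
                cuda_available: bool, mps_available: bool) -> str:
    """Mirror the branch order of get_default_device using plain values."""
    if env_value:        # PFL_PYTORCH_DEVICE wins when set and non-empty
        return env_value
    if under_pytest:     # test runs are pinned to CPU
        return "cpu"
    if cuda_available:   # CUDA is chosen whenever it is present
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# CUDA machine, no override: the model ends up on CUDA.
print(pick_device(None, False, True, False))
# Same machine with PFL_PYTORCH_DEVICE=cpu: the override takes effect.
print(pick_device("cpu", False, True, False))
```

Nothing in this chain consults the device the model was already on, which is why the constructor call in the reproduction moves a CPU model to cuda:0.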

Suggested Fix: Modify PyTorchModel to accept a device parameter, and adjust get_default_device so that an explicit user choice takes precedence over automatic CUDA/MPS detection.
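
One possible shape for this fix, sketched under assumptions: an explicit device argument takes priority, then PFL_PYTORCH_DEVICE, then a CPU default. resolve_device is a hypothetical helper, not an existing pfl function; it returns a device string that a constructor could pass on to model.to(...).

```python
import os
from typing import Mapping, Optional


def resolve_device(device: Optional[str] = None,
                   environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: explicit argument > env var > 'cpu'."""
    if device is not None:
        return device
    # An unset or empty variable falls through to the CPU default.
    return environ.get("PFL_PYTORCH_DEVICE") or "cpu"


# No argument and no environment variable: stay on CPU.
print(resolve_device(environ={}))
# An explicit argument overrides the environment variable.
print(resolve_device("cuda", environ={"PFL_PYTORCH_DEVICE": "cpu"}))
```

Defaulting to CPU would change behavior for users who currently rely on automatic GPU selection, so falling back to the existing get_default_device() when no device is passed may be the less disruptive variant.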

Additional Information:

pfl version: 0.2.0
PyTorch version: 2.0.1

I am willing to work on a Pull Request to implement this fix if the maintainers agree.


congzheng-song · 2024-11-13T17:06:42Z

You can set the environment variable PFL_PYTORCH_DEVICE=cpu to force the model onto the CPU.
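
A minimal sketch of that workaround from inside Python rather than the shell. It assumes the variable is read when get_default_device is called, as in the code quoted in the issue; setting it before importing pfl is the cautious choice in case the value is read earlier elsewhere.

```python
import os

# Set this before PyTorchModel is constructed; placing it ahead of the
# pfl import is the conservative option.
os.environ["PFL_PYTORCH_DEVICE"] = "cpu"

print(os.environ["PFL_PYTORCH_DEVICE"])
```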
