
feat: add apple silicon support #469

Open · wants to merge 4 commits into main
Conversation

NripeshN

No description provided.


@IFaTaK left a comment


Comment on memory management for CUDA, MPS, and CPU

In fp8_cast_bf16.py at line 97:

if len(loaded_files) > 2:
    oldest_file = next(iter(loaded_files))
    del loaded_files[oldest_file]
    torch.cuda.empty_cache()

The current implementation calls torch.cuda.empty_cache() for memory management, but this only works on CUDA devices. For MPS (Metal Performance Shaders) and CPU, we should handle memory differently. Here's an updated version that works across all devices:

import gc

...

# Memory management: keep only the 2 most recently used files
if len(loaded_files) > 2:
    oldest_file = next(iter(loaded_files))
    del loaded_files[oldest_file]

    # Check if CUDA is available, then free memory on CUDA device
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    # Check if MPS is available, then free memory on MPS device
    elif torch.backends.mps.is_available():
        torch.mps.empty_cache()
    # Otherwise, clean up memory for CPU
    else:
        gc.collect()

Why this change is important:

  • CUDA: If CUDA is available, torch.cuda.empty_cache() will be called to free unused GPU memory.
  • MPS: If MPS is available on macOS, torch.mps.empty_cache() will be used for memory management.
  • CPU: For CPU or other non-GPU devices, we use gc.collect() to trigger garbage collection and free memory.

This ensures memory is managed correctly across all devices—whether you're using CUDA, MPS, or CPU. Would you consider updating this logic for better compatibility across environments?
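If it helps keep the loading loop tidy, the same checks could also live in a small helper (just a sketch; the function name free_device_memory is mine, not something in the repo):

import gc

import torch


def free_device_memory() -> None:
    """Release cached memory on whichever backend is in use (sketch, not repo code)."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif torch.backends.mps.is_available():
        torch.mps.empty_cache()
    else:
        gc.collect()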

Thanks! 🚀

Review comment on inference/model.py (outdated, resolved)
@NripeshN (Author) commented Feb 2, 2025

@IFaTaK
Thanks for the suggestion, just made all the changes.

p.s. Sorry for the silly error😅

@NripeshN NripeshN requested a review from IFaTaK February 2, 2025 10:43
@@ -1,4 +1,4 @@
-torch==2.4.1
+torch==2.6.0
 triton==3.0.0
@IFaTaK Feb 2, 2025

Suggested Change to requirements.txt
It seems that torch 2.6.0 specifically requires triton 3.2.0. Currently, requirements.txt lists triton==3.0.0, which causes a conflict with torch==2.6.0.

I recommend updating the triton version in requirements.txt to 3.2.0 to match the requirement from torch 2.6.0. This should resolve the version compatibility issue.

However, please note that triton==3.2.0 is not currently installable via pip on macOS or Windows. Users on those platforms will need to install Triton manually by cloning the repository and building it from source. I suggest adding documentation for that in case someone needs to follow those steps; the installation instructions in the official Triton repository cover this.

This leads to updating requirements.txt as follows:

torch==2.6.0
triton==3.2.0; platform_system == "Linux"
transformers==4.46.3
safetensors==0.4.5

🚀

@NripeshN (Author)

Actually, looking further into this, Triton does not have a pip package at all for Mac or Windows for any version. It currently only supports Linux (https://github.com/triton-lang/triton?tab=readme-ov-file#compatibility).

Maybe we can avoid using Triton altogether and use pure PyTorch, but that would come with its own issues. Not sure how we can proceed.
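For what it's worth, if we go the optional-Triton route, the fallback could be gated with a simple import guard (a minimal sketch, not code from this PR; the HAS_TRITON flag is illustrative):

# Hedged sketch: keep Triton optional so the same script can run on platforms
# where it cannot be pip-installed (e.g. macOS, Windows).
try:
    import triton  # expected to succeed only on Linux
    HAS_TRITON = True
except ImportError:
    HAS_TRITON = False

print(f"Triton available: {HAS_TRITON}")  # downstream code can branch on this flag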

@NripeshN (Author) Feb 2, 2025

One potential solution is to have requirements.txt look something like this:

torch==2.6.0
git+https://github.com/triton-lang/triton@main#subdirectory=python
transformers==4.46.3
safetensors==0.4.5

This way we install Triton from source, which should work on every OS, but Triton would still not support GPU acceleration on Macs as of now.

@IFaTaK

While installing Triton from source using git+https://github.com/triton-lang/triton@main#subdirectory=python ensures compatibility across all OSes, I am concerned that Triton may still require additional build steps.

That said, if we want to fully support Metal on Apple Silicon, we would need to use MLX.
However, MLX is not as mature as Triton and lacks many of the optimizations and utilities that Triton provides for CUDA. Using MLX would require significant custom implementation, particularly for matrix operations and kernel optimizations. We'd need to manually write custom Metal shaders and utilize the Python API to replicate functionality that Triton handles out of the box.
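To make the comparison concrete, here is a tiny illustration of MLX's PyTorch-like Python API (a hedged sketch assuming mlx is installed on Apple silicon; anything beyond basic ops, such as block-wise dequantization kernels, would need custom Metal work on top of this):

# Minimal MLX example (assumes `pip install mlx` on Apple silicon).
# MLX arrays live in unified memory and ops run on the GPU by default.
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = mx.matmul(a, b)  # lazily recorded
mx.eval(c)           # force evaluation
print(c.shape)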

This would involve a substantial amount of work to make MLX work like Triton. It could be a great project, but it'll definitely take some time to get it right. 🚀

@NripeshN (Author)


Triton is currently built around NVIDIA's CUDA (NVPTX) and AMD's ROCm backends. It would not work on other backends unless one is added (via Metal for Apple, maybe). MLX is an array framework similar to PyTorch. We could implement this entire kernel using PyTorch's native tensor operations, JIT, or even TorchScript, which would enable cross-platform support, but this could come with trade-offs such as performance.
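As a rough illustration of the pure-PyTorch route, a block-wise FP8-to-BF16 dequantization could look something like this (a hedged sketch only; the function name, the 128x128 block size, and the tensor layout are assumptions, not the repo's actual kernel):

import torch


def weight_dequant_torch(weight_fp8: torch.Tensor, scale: torch.Tensor,
                         block_size: int = 128) -> torch.Tensor:
    """Block-wise FP8 -> BF16 dequantization in plain PyTorch (illustrative only)."""
    m, n = weight_fp8.shape
    w = weight_fp8.to(torch.float32)
    # Expand each per-block scale so it covers its block_size x block_size tile.
    scale_rows = scale.repeat_interleave(block_size, dim=0)[:m]
    scale_full = scale_rows.repeat_interleave(block_size, dim=1)[:, :n]
    return (w * scale_full).to(torch.bfloat16)

On Apple silicon this would run on the MPS backend (or CPU), avoiding the Triton dependency entirely, though likely slower than a fused kernel.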

@mowentian (Contributor)

Thanks, this change is a good fit for an apple-silicon branch, because we have no Apple device to test it...
