
[Feature Request]: prepare host for specific gpu (needed for containers) without starting api/ui #16762

Open
AlBundy33 opened this issue Dec 30, 2024 · 6 comments
Labels
asking-for-help-with-local-system-issues (This issue is asking for help related to local system; please offer assistance)

Comments


AlBundy33 commented Dec 30, 2024

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

It would be nice to have an option to initialize the current installation for a specific GPU.
Right now I exploit the script by "simulating" my AMD GPU:

root@ollama:~/ollama# cat Dockerfile.automatic1111
FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.4.0

WORKDIR /automatic1111

# install packages
RUN apt update \
 && apt install google-perftools bc -y --no-install-recommends \
 && apt clean

# clone repo
RUN mkdir -p /data \
 && git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui .

# prepare environment (exploits webui.sh)
RUN --mount=type=cache,target=/root/.cache/pip \
    sed 's/start()/# start()/g' launch.py >_init.py \
 && bash -ec "function lspci { echo '03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] XXXXXX'; } \
 && export -f lspci \
 && LAUNCH_SCRIPT=_init.py ./webui.sh -f --data-dir /data --skip-torch-cuda-test" \
 && rm -rf /data/* _init.py

EXPOSE 7860

VOLUME /data

CMD ["/automatic1111/webui.sh", "-f", "--api", "--listen", "--skip-prepare-environment", "--data-dir", "/data", "--precision", "full", "--no-half" ]

Proposed workflow

Maybe a separate script or argument would be nice.

Additional information

If someone is interested, this is my docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:rocm
    ports:
      - "11434:11434"
    volumes:
      #- ./ollama:/root/.ollama
      - /usr/share/ollama/.ollama:/root/.ollama
    restart: unless-stopped
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - 44
      - 993

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    volumes:
      - ./open-webui:/app/backend/data
    ports:
      - "8080:8080"
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      AUTOMATIC1111_BASE_URL: http://automatic1111:7860
      ENABLE_OLLAMA_API: true
      ENABLE_OPENAI_API: false
      YOUTUBE_LOADER_LANGUAGE: de
      ENABLE_RAG_WEB_SEARCH: true
      RAG_WEB_SEARCH_ENGINE: duckduckgo
      ENABLE_IMAGE_GENERATION: true
      IMAGE_GENERATION_ENGINE: automatic1111
      IMAGE_GENERATION_MODEL: v1-5-pruned-emaonly.safetensors [6ce0161689]
      SCARF_NO_ANALYTICS: true
      DO_NOT_TRACK: true
      ANONYMIZED_TELEMETRY: false

  automatic1111:
    build:
      context: .
      dockerfile: Dockerfile.automatic1111
    volumes:
      - ./automatic1111:/data
    restart: unless-stopped
    ports:
      - "7860:7860"
    group_add:
      - 44
      - 993
    devices:
      - /dev/kfd
      - /dev/dri
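
With this compose file and Dockerfile.automatic1111 in the same directory, the whole stack should come up with:

docker compose up -d --build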
AlBundy33 added the enhancement label Dec 30, 2024
w-e-w (Collaborator) commented Dec 30, 2024

I think what you're asking for is already supported.

I don't use an AMD GPU so I'm not exactly familiar with the situation, but from my understanding there should be no need to simulate a GPU. Unless there's something I'm not aware of, the only difference between GPUs is which version of torch gets installed.

The environment variable TORCH_COMMAND is what actually defines which version of torch gets installed, so you can just set the appropriate command for your desired platform without the hack/simulation stuff. If the user already defines TORCH_COMMAND, the automatic platform detection is effectively disabled:

gpu_info=$(lspci 2>/dev/null | grep -E "VGA|Display")
case "$gpu_info" in
    *"Navi 1"*)
        export HSA_OVERRIDE_GFX_VERSION=10.3.0
        if [[ -z "${TORCH_COMMAND}" ]]
        then
            pyv="$(${python_cmd} -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]:02d}")')"
            # Using an old nightly compiled against rocm 5.2 for Navi1, see https://github.com/pytorch/pytorch/issues/106728#issuecomment-1749511711
            if [[ $pyv == "3.8" ]]
            then
                export TORCH_COMMAND="pip install https://download.pytorch.org/whl/nightly/rocm5.2/torch-2.0.0.dev20230209%2Brocm5.2-cp38-cp38-linux_x86_64.whl https://download.pytorch.org/whl/nightly/rocm5.2/torchvision-0.15.0.dev20230209%2Brocm5.2-cp38-cp38-linux_x86_64.whl"
            elif [[ $pyv == "3.9" ]]
            then
                export TORCH_COMMAND="pip install https://download.pytorch.org/whl/nightly/rocm5.2/torch-2.0.0.dev20230209%2Brocm5.2-cp39-cp39-linux_x86_64.whl https://download.pytorch.org/whl/nightly/rocm5.2/torchvision-0.15.0.dev20230209%2Brocm5.2-cp39-cp39-linux_x86_64.whl"
            elif [[ $pyv == "3.10" ]]
            then
                export TORCH_COMMAND="pip install https://download.pytorch.org/whl/nightly/rocm5.2/torch-2.0.0.dev20230209%2Brocm5.2-cp310-cp310-linux_x86_64.whl https://download.pytorch.org/whl/nightly/rocm5.2/torchvision-0.15.0.dev20230209%2Brocm5.2-cp310-cp310-linux_x86_64.whl"
            else
                printf "\e[1m\e[31mERROR: RX 5000 series GPUs python version must be between 3.8 and 3.10, aborting...\e[0m"
                exit 1
            fi
        fi
        ;;
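
So instead of faking the lspci output, setting the variable up front should be enough - a minimal sketch, assuming a ROCm 6.2 setup (pick the index URL matching your platform):

# webui.sh leaves a pre-set TORCH_COMMAND alone, so no GPU detection hack is needed
export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2"
./webui.sh --skip-torch-cuda-test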

Alternatively you could always install torch manually. If torch is already installed, webui won't auto-install a different version for you (unless you pass --reinstall-torch, in which case it will try to install whatever is defined by TORCH_COMMAND).
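
A rough sketch of that route, assuming the default venv/ directory inside the webui checkout (again, adjust the index URL to your platform):

# if torch is already importable from the venv, webui.sh won't reinstall it
python3 -m venv venv
venv/bin/pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
./webui.sh --skip-torch-cuda-test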


Also, there is no need to modify launch.py to prevent webui from starting; just pass --exit and it will exit after installing the requirements:

if "--exit" in sys.argv:
print("Exiting because of --exit argument")
exit(0)
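
In a Dockerfile, the environment-preparation step then boils down to something like:

RUN ./webui.sh -f --data-dir /data --skip-torch-cuda-test --exit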


Wiki: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings

w-e-w added the asking-for-help-with-local-system-issues label and removed the enhancement label Dec 30, 2024
AlBundy33 (Author) commented Dec 30, 2024

--exit seems to work - I completely overlooked it.

Setting TORCH_COMMAND also works - I had already spotted it, but then I need to know the correct packages, versions and URLs myself.
My goal was to reuse the "suggested" settings from webui.sh,
e.g. in my case export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7" is the command to use, but if webui.sh changes the rocm version, adds a package or pins a specific version, I would have to mirror that change in my Dockerfile.

It would be nice to have an option or environment variable to define the GPU type, which then results in the "correct" TORCH_COMMAND.

This is my current Dockerfile:

FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.4.0

WORKDIR /automatic1111

# install packages
RUN apt update \
 && apt install google-perftools bc -y --no-install-recommends \
 && apt clean

# clone repo
RUN mkdir -p /data \
 && git clone --depth 1 https://github.com/AUTOMATIC1111/stable-diffusion-webui .

# prepare environment (TORCH_COMMAND from webui.sh)
RUN --mount=type=cache,target=/root/.cache/pip \
    export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7" \
 && bash -e ./webui.sh -f --data-dir /data --skip-torch-cuda-test --exit \
 && rm -rf /data/*

EXPOSE 7860

VOLUME /data

CMD ["/automatic1111/webui.sh", "-f", "--api", "--listen", "--skip-prepare-environment", "--data-dir", "/data", "--precision", "full", "--no-half", "--medvram" ]

w-e-w (Collaborator) commented Dec 30, 2024

So are you asking for something like: if gpu_info is predefined, then don't run the GPU detection (lspci 2>/dev/null | grep -E "VGA|Display", grep -q "NVIDIA", grep -q "AMD"), but still set TORCH_COMMAND based on the predefined value of gpu_info?

AlBundy33 (Author) commented

Something like this would be nice.
Or maybe a new variable, or arguments like --amd-gpu and --nvidia-gpu (maybe more, depending on how many TORCH_COMMAND variants there are).
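
A hypothetical sketch of the flag idea (neither flag exists in webui.sh today; the injected strings just have to match the patterns the existing case statement looks for):

# hypothetical: let a flag stand in for the string normally returned by lspci
case "$1" in
    --amd-gpu)    gpu_info="VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]" ;;
    --nvidia-gpu) gpu_info="VGA compatible controller: NVIDIA Corporation" ;;
    *)            gpu_info=$(lspci 2>/dev/null | grep -E "VGA|Display") ;;
esac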

w-e-w (Collaborator) commented Dec 30, 2024

If you have a clear idea of what you want, it's better to open a PR and see whether others agree with your addition.

AlBundy33 (Author) commented

I think the easiest change would be something like this:

if [[ -z "${TORCH_GPU}" ]]
then
    gpu_info=$(lspci 2>/dev/null | grep -E "VGA|Display")
else
    gpu_info=$TORCH_GPU
fi
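
Usage would then be something like this (TORCH_GPU is hypothetical; its value just has to match the patterns of the existing case statement, e.g. contain "AMD" or "Navi 3"):

TORCH_GPU="Advanced Micro Devices, Inc. [AMD/ATI] Navi 31" ./webui.sh -f --data-dir /data --skip-torch-cuda-test --exit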
