View output from Comfy in Runpod logs #76

Open
jeffcrouse opened this issue Oct 30, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@jeffcrouse

Is your feature request related to a problem? Please describe.
Background: My endpoint is running super slow on RunPod (NVIDIA A100-SXM4-80GB), and I don't know why. When I run my Docker container locally (3080 Ti), it runs at the expected speed. As a benchmark, the comfyui_face_parsing custom node loads in 0.8 seconds locally but takes 5.9 seconds on RunPod, so I'm trying to figure out what is wrong.

Problem: When I run the Docker image locally, I can see both the output of rp_handler.py AND the output of Comfy. But when I watch the logs of the Serverless worker on RunPod, I can see the output of rp_handler.py and none of the output of Comfy, which I imagine would be super helpful in diagnosing the slowness.

Describe the solution you'd like
I'd like to be able to see output from Comfy in the Worker logs as workflows are running.

Describe alternatives you've considered
I see two possible solutions:

  1. In src/start.sh, create a named pipe and redirect Comfy's output to /tmp/comfy_pipe. Then I can read from this pipe asynchronously in rp_handler.py and print each line to stdout.
  2. I can use a third-party service like Sentry or CloudWatch and have both rp_handler.py and Comfy send their output to it.
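Option 1 can be tried locally before touching the worker. The script below is a minimal, self-contained sketch that assumes a POSIX system (`os.mkfifo` is not available on Windows): a temporary FIFO stands in for /tmp/comfy_pipe, a short child Python process stands in for ComfyUI, and the names `forward_pipe` and `run_demo` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import threading


def forward_pipe(path, emit):
    """Read lines from the named pipe at `path` and pass each one, prefixed, to `emit`."""
    # Opening a FIFO for reading blocks until some process opens it for writing.
    with open(path, "r") as pipe:
        for line in pipe:  # ends at EOF, i.e. once every writer has closed the pipe
            emit("[comfy] " + line.rstrip("\n"))


def run_demo():
    """Create a FIFO, let a child process write to it, and return the forwarded lines."""
    collected = []
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "comfy_pipe")
        os.mkfifo(fifo)  # plays the role of `mkfifo /tmp/comfy_pipe` in start.sh

        reader = threading.Thread(target=forward_pipe, args=(fifo, collected.append), daemon=True)
        reader.start()

        # Stand-in for ComfyUI: stdout and stderr both go into the FIFO, like `> pipe 2>&1`.
        # `-u` makes the child's output unbuffered so the lines arrive in program order.
        with open(fifo, "w") as sink:
            subprocess.run(
                [sys.executable, "-u", "-c",
                 "import sys; print('Starting server'); print('warning', file=sys.stderr)"],
                stdout=sink,
                stderr=subprocess.STDOUT,
                check=True,
            )
        reader.join(timeout=5)
    return collected


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Without `-u`, a Python process writing to a pipe buffers stdout in blocks, so its lines can show up late or after its stderr lines. If Comfy's output appears delayed in the worker logs, running it with `python -u` or setting `PYTHONUNBUFFERED=1` may help.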

Additional context
I am not at all confident that I am diagnosing this logging problem correctly. Should I be seeing the Comfy output in the RunPod worker logs? Are these two solutions the only options, or am I missing something obvious?

@jeffcrouse
Author

FWIW, my pipe idea worked: I can now see the Comfy output in the RunPod Serverless logs.

You have to add `mkfifo /tmp/comfy_pipe` to the top of start.sh and then redirect Comfy's output to that pipe with `> /tmp/comfy_pipe 2>&1`.

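Then, in rp_handler.py:

```python
import threading

import runpod

COMFY_PIPE = "/tmp/comfy_pipe"


def read_pipe():
    """Forward every line ComfyUI writes to the named pipe to stdout (the worker log)."""
    # open() blocks until start.sh has opened the other end of the FIFO for writing
    with open(COMFY_PIPE, "r") as pipe:
        for line in pipe:  # ends at EOF, i.e. once every writer (ComfyUI) has closed the pipe
            # flush=True so lines reach the log immediately instead of sitting in a buffer
            print("[comfy]", line.rstrip(), flush=True)


# Start the handler only if this script is run directly
if __name__ == "__main__":
    # daemon=True so the reader thread never keeps the worker process alive on shutdown
    pipe_thread = threading.Thread(target=read_pipe, daemon=True)
    pipe_thread.start()
    # `handler` is the job handler defined elsewhere in rp_handler.py
    runpod.serverless.start({"handler": handler})
```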

However, I am still confused as to why the endpoint is running so slowly. It takes 10-20x longer on the NVIDIA A100-SXM4-80GB Runpod server than it does on my local 3080 Ti.
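One way to narrow down where the 10-20x goes is to log the wall-clock time of every request next to the forwarded Comfy output, then compare the numbers from the local container and the RunPod worker. A minimal sketch, assuming a synchronous handler that receives the RunPod job dict; the `timed` decorator is a hypothetical helper, and the placeholder `handler` body only sleeps where the real workflow would run.

```python
import functools
import time


def timed(func):
    """Wrap a RunPod-style handler so every call logs its wall-clock duration."""

    @functools.wraps(func)
    def wrapper(job):
        start = time.perf_counter()
        try:
            return func(job)
        finally:
            elapsed = time.perf_counter() - start
            # flush=True so the timing line reaches the worker log right away
            print(f"[timing] job {job.get('id', '?')} took {elapsed:.2f}s", flush=True)

    return wrapper


@timed
def handler(job):
    """Placeholder for the real handler in rp_handler.py (hypothetical body)."""
    time.sleep(0.1)  # stands in for queuing the workflow and waiting on ComfyUI
    return {"status": "ok"}


if __name__ == "__main__":
    handler({"id": "local-test", "input": {}})
```

In rp_handler.py the decorator would go on the existing handler, which is then passed to `runpod.serverless.start` unchanged.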

@jeffcrouse
Author

I realize that I should probably move this out of Issues, but I just realized that the Docker image is based on nvidia/cuda:11.8.0, but then PyTorch is installing from the 12.1 repo. Is this a problem?
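A quick way to see which CUDA build actually ends up in the container is to ask PyTorch at runtime and compare the output on the local machine and on a RunPod worker. This is a minimal sketch; `describe_cuda_setup` is a hypothetical helper, and it only reads attributes PyTorch exposes: `torch.version.cuda` (the CUDA version the installed wheel was built with), `torch.cuda.is_available()`, `torch.cuda.get_device_name()` and `torch.backends.cudnn.version()`.

```python
def describe_cuda_setup():
    """Collect the CUDA details PyTorch reports at runtime (returns a dict)."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch wheel was built against, or None for a CPU-only build
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["cudnn_version"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```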

@limeberri

> I realize that I should probably move this out of Issues, but I just realized that the Docker image is based on nvidia/cuda:11.8.0, but then PyTorch is installing from the 12.1 repo. Is this a problem?

I noticed that too and changed it to:

```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 AS base
```

@TimPietrusky
Member

TimPietrusky commented Nov 18, 2024

Yeah, the next step here is to actually update to CUDA 12.1, as this should improve overall performance.

And yes, seeing the output of ComfyUI would also be nice.

@TimPietrusky TimPietrusky self-assigned this Nov 18, 2024
@TimPietrusky TimPietrusky added the enhancement New feature or request label Nov 18, 2024
@jelling

jelling commented Nov 30, 2024

> However, I am still confused as to why the endpoint is running so slowly. It takes 10-20x longer on the NVIDIA A100-SXM4-80GB Runpod server than it does on my local 3080 Ti.

A 10-20x slowdown makes me think the endpoint is downloading the models on every inference. The most likely cause would be FlashBoot being disabled and the worker spinning down between inferences.
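One way to test this theory is to have each worker log whether a request is the first one its process has served: if almost every request is logged as cold, the worker is being spun down between jobs and pays the startup cost each time. A minimal sketch; `log_worker_state`, `_PROCESS_START` and `_requests_served` are hypothetical names, and it relies only on module-level state living as long as the worker's Python process.

```python
import time

# Module-level state lives as long as the worker's Python process: it carries over
# between requests served by the same worker and resets whenever a new worker starts.
_PROCESS_START = time.monotonic()
_requests_served = 0


def log_worker_state():
    """Log and return "cold" for the first request in this process, "warm" afterwards."""
    global _requests_served
    _requests_served += 1
    uptime = time.monotonic() - _PROCESS_START
    state = "cold" if _requests_served == 1 else "warm"
    print(f"[worker] {state} request #{_requests_served}, process up for {uptime:.1f}s", flush=True)
    return state


def handler(job):
    """Placeholder handler (hypothetical): log the worker state, then run the workflow."""
    log_worker_state()
    # ... the real rp_handler.py logic would run the ComfyUI workflow here ...
    return {"status": "ok"}
```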

4 participants