Allow Timeout Configuration for web_server_proxy #2752

Open
Dragoy opened this issue Jan 12, 2025 · 3 comments

@Dragoy

Dragoy commented Jan 12, 2025

The web_server_proxy function in Modal, designed to proxy requests to a local web server, currently uses a hardcoded timeout of 3600 seconds (1 hour) within its internal aiohttp.ClientSession. This fixed timeout presents challenges when proxying requests to services that might require longer processing times, such as those involving machine learning model training or extensive data operations.

This lack of configurability can lead to premature termination of requests that exceed the one-hour limit, disrupting long-running tasks and negatively impacting user experience.

Proposed Solution:

Introduce a mechanism to allow customization of the aiohttp.ClientTimeout used within web_server_proxy.
Introduce a global or app-level configuration option: This would allow setting a default timeout for all web_server_proxy instances within an application.

For reference, the hardcoded value currently appears as:

timeout=aiohttp.ClientTimeout(total=3600),
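A minimal sketch of how the two proposals could fit together, assuming an explicit per-call value takes precedence over an app-level default read from the environment. The helper `resolve_proxy_timeout` and the variable name `MODAL_WEB_SERVER_PROXY_TIMEOUT` are hypothetical and are not part of Modal today; only the 3600-second default comes from this issue.

```python
import os

# Value currently hardcoded in the proxy, per this issue.
DEFAULT_PROXY_TIMEOUT_S = 3600.0

# Hypothetical environment variable name; Modal does not define it today.
ENV_VAR = "MODAL_WEB_SERVER_PROXY_TIMEOUT"


def resolve_proxy_timeout(explicit=None, environ=None):
    """Return the total timeout in seconds: explicit arg > env var > default."""
    env = os.environ if environ is None else environ
    if explicit is not None:
        value = float(explicit)
    elif ENV_VAR in env:
        value = float(env[ENV_VAR])
    else:
        value = DEFAULT_PROXY_TIMEOUT_S
    if value <= 0:
        raise ValueError(f"proxy timeout must be positive, got {value}")
    return value


# Inside the proxy, the resolved value would replace the literal 3600:
#     timeout=aiohttp.ClientTimeout(total=resolve_proxy_timeout(user_timeout)),
```

Keeping 3600 as the fallback would leave existing apps unaffected while letting long-running servers opt in to a larger value.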

@Dragoy
Author

Dragoy commented Jan 20, 2025

bump

@mwaskom
Contributor

mwaskom commented Jan 21, 2025

Hi, it's not quite clear what you're doing when you're running into this timeout. A simple example would be helpful.

Have you seen our docs on request timeouts with web endpoint functions? They have some suggested patterns for putting a long-running process behind a web endpoint interface.

Also we're typically faster to respond on modal.com/slack if you'd like to continue the conversation there. The GitHub issues are infrequently monitored by a small team, whereas others will see the question on Slack and can weigh in.
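For readers unfamiliar with that kind of pattern, here is a generic submit-then-poll sketch. It is an assumption about the sort of approach such docs describe, not a quote from them: the client starts the job with one short request and then checks on it with further short requests, so no single request stays open for hours. It uses only the Python standard library; `submit_job`, `job_status`, and the endpoint paths in the docstrings are hypothetical names, and a real Modal app would use Modal's own primitives instead of a thread pool.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}  # job_id -> Future


def long_running_task(seconds):
    """Stand-in for training; sleeps, then returns a result."""
    time.sleep(seconds)
    return f"finished after {seconds}s"


def submit_job(seconds):
    """Handler for a hypothetical POST /jobs: start work, return immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(long_running_task, seconds)
    return {"job_id": job_id}


def job_status(job_id):
    """Handler for a hypothetical GET /jobs/<id>: report state without blocking."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "running"}
    if future.exception() is not None:
        return {"status": "failed", "error": repr(future.exception())}
    return {"status": "done", "result": future.result()}


# Client side: one short request to start, then short polling requests.
job = submit_job(0.2)
while job_status(job["job_id"])["status"] == "running":
    time.sleep(0.05)
final = job_status(job["job_id"])
_executor.shutdown()
print(final)
```

This only helps when the application's own request flow can be restructured; it does not apply directly to a third-party UI that holds a single streaming request open.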

@Dragoy
Author

Dragoy commented Jan 22, 2025

> Hi, it's not quite clear what you're doing when you're running into this timeout. A simple example would be helpful.
>
> Have you seen our docs on request timeouts with web endpoint functions? They have some suggested patterns for putting a long-running process behind a web endpoint interface.
>
> Also we're typically faster to respond on modal.com/slack if you'd like to continue the conversation there. The GitHub issues are infrequently monitored by a small team, whereas others will see the question on Slack and can weigh in.

I am posting here because I was not able to get my problem solved on Slack. Here is my message there:
https://modallabscommunity.slack.com/archives/C06GPR10K9V/p1735061064018099

Having TimeoutError issues with FluxGym training on Modal (H100). App crashes after ~1hr with:
1. TimeoutError in proxy_http_request (duration: 3600.3s)
2. Event loop closure errors
3. Queue timeouts (GET /queue/data)
Using Python 3.11, CUDA 12.4.0, timeout=7200.
Any recommendations for handling long-running Gradio sessions and preventing these timeouts?

Error logs:
   POST /queue/join -> 200 OK  (duration: 167.7 ms, execution: 70.0 ms)
   POST /queue/join -> 200 OK  (duration: 161.1 ms, execution: 62.5 ms)
   POST /queue/join -> 200 OK  (duration: 189.7 ms, execution: 62.2 ms)
Task exception was never retrieved
future: <Task finished name='Task-8873' coro=<_proxy_http_request.<locals>.send_response() done, defined at /pkg/modal/_runtime/asgi.py:370> exception=TimeoutError()>
Traceback (most recent call last):
  File "/pkg/modal/_runtime/asgi.py", line 377, in send_response
    async for data in proxy_response.content.iter_any():
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 35, in __anext__
    rv = await self.read_func()
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 397, in readany
    await self._wait("readany")
  File "/usr/local/lib/python3.11/site-packages/aiohttp/streams.py", line 303, in _wait
    with self._timer:
  File "/usr/local/lib/python3.11/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
TimeoutError
   GET /queue/data -> 200 OK  (duration: 3600.3 s, execution: 3600.2 s)
   POST /queue/join -> 200 OK  (duration: 162.4 ms, execution: 64.1 ms)
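The traceback is consistent with a total-duration timeout: in aiohttp, `ClientTimeout(total=...)` covers the whole request, including reading the response body, so a streaming response is cut off at the deadline even while data is still arriving. The sketch below simulates that behaviour with the standard library only (`asyncio.wait_for` standing in for the proxy's timer). The function names and the scaled-down durations are illustrative assumptions, not Modal's code.

```python
import asyncio


async def stream_chunks(interval_s, count):
    """Simulate a long-lived response that keeps emitting data."""
    for i in range(count):
        await asyncio.sleep(interval_s)
        yield f"chunk-{i}"


async def consume(total_timeout_s):
    """Read the stream under a single deadline for the whole response."""
    received = []

    async def read_all():
        # The full stream would take about 10 s (1000 chunks, 0.01 s apart).
        async for chunk in stream_chunks(interval_s=0.01, count=1000):
            received.append(chunk)

    try:
        await asyncio.wait_for(read_all(), timeout=total_timeout_s)
        return "completed", len(received)
    except asyncio.TimeoutError:
        # The deadline fires although chunks were still flowing.
        return "timed out", len(received)


status, n_chunks = asyncio.run(consume(total_timeout_s=0.2))
print(status, n_chunks)
```

Here the stream is healthy and still producing chunks when the deadline hits, which mirrors `GET /queue/data` ending at 3600.3 s in the logs above.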

Advai Pal tried his best to help, but the only solution he could offer was another Slack user's workaround: opening the app in two windows and clicking in them alternately.

"""
When running a long-running process in a web app launched by `@modal.web_server` (such as model training),
even if you set a long `timeout` value in `@app.cls` or `@app.function`, it will be forcibly interrupted after exactly 1 hour.
However, if you open the web app in two windows and keep sending requests periodically from both windows, you can maintain the process for more than 1 hour.
This simple script keeps clicking specified points alternately in two windows and stops after specified time.

Usage:
    1. Launch the sleep test app with `modal serve sleep_test_gradio_webserver.py` and open it in 2 windows.
    2. Send a long sleep request in one window.
    3. Run this script with `python click_loop_simple.py`.
       Specify one point to click periodically (such as health check button) in each of the two open windows by pressing the space key.
    4. The script will click the recorded coordinates alternately.
    5. After specified time, the script will stop.
"""

import asyncio
import pyautogui
from pynput.keyboard import Key, Listener


click_interval = 25  # (seconds) Set to less than half of `container_idle_timeout`


def get_coordinates_on_space():
    """Get mouse coordinates when the space key is pressed"""
    coords: list[tuple[float, float]] = []

    def on_press(key):
        if key == Key.space:
            x, y = pyautogui.position()
            coords.append((x, y))
            print(f"Recorded coordinates: ({x}, {y})")
            if len(coords) == 2:  # Stop the listener when 2 points are recorded
                listener.stop()
                return

    with Listener(on_press=on_press) as listener:
        listener.join()
    return coords


async def click_loop(points, stop_event: asyncio.Event):
    """
    Click the specified list of coordinates in an infinite loop.
    Exit the loop when the stop signal comes.
    """
    while True:
        for point in points:
            pyautogui.click(point)
            print(f"Clicked at {point}")
            for i in range(click_interval, 0, -1):
                print(
                    f"{i} seconds until the next click... (Stop with Ctrl+C)",
                    end="\r",
                    flush=True,
                )
                await asyncio.sleep(1)
                if stop_event.is_set():
                    return


async def sleep_and_stop(stop_event: asyncio.Event):
    """
    A simple function to send a stop signal.
    You should change the conditions to send the signal according to the actual usage situation.
    """
    seconds = 60 * 60 * 1.5  # 1.5 hours
    await asyncio.sleep(seconds)
    print(f"{seconds} seconds have passed.")
    stop_event.set()


async def main():
    print(
        "Specify 2 points to click periodically. Move the mouse to the desired position and press the space key."
    )
    points = get_coordinates_on_space()

    stop_event = asyncio.Event()
    click_task = asyncio.create_task(click_loop(points, stop_event))
    sleep_task = asyncio.create_task(sleep_and_stop(stop_event))

    await asyncio.gather(click_task, sleep_task)


if __name__ == "__main__":
    asyncio.run(main())

Ultimately, I would like a proper solution rather than a workaround, such as a configuration option for this timeout.
