System Info
Sagemaker Realtime Inference endpoints
TGI Version 2.4.1
p4d: 4 A100, 96 CPU, 1152 GB mem
MAX_INPUT_LENGTH: '16128'
MAX_TOTAL_TOKENS: '16384'
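For context on the two limits above: TGI requires that prompt tokens plus generated tokens stay within MAX_TOTAL_TOKENS, so a prompt at the full MAX_INPUT_LENGTH leaves only 256 tokens for generation. This may well be unrelated to the hang, but it is worth ruling out when judging whether a generation was cut short by the length limit. A minimal sketch of the arithmetic follows; the helper name `new_token_budget` is hypothetical and not part of TGI.

```python
# Limits copied from the System Info section of this report.
MAX_INPUT_LENGTH = 16128
MAX_TOTAL_TOKENS = 16384


def new_token_budget(input_tokens: int,
                     max_input_length: int = MAX_INPUT_LENGTH,
                     max_total_tokens: int = MAX_TOTAL_TOKENS) -> int:
    """Return how many tokens can still be generated for a prompt of this size.

    Hypothetical helper for illustration only; it mirrors the constraint
    input_tokens + max_new_tokens <= MAX_TOTAL_TOKENS.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > max_input_length:
        raise ValueError(
            f"prompt of {input_tokens} tokens exceeds MAX_INPUT_LENGTH={max_input_length}"
        )
    return max_total_tokens - input_tokens


if __name__ == "__main__":
    print(new_token_budget(16128))  # 256
    print(new_token_budget(4000))   # 12384
```

If an incomplete response ends exactly at this budget, it was probably truncated by the length limit rather than by the fault described here.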
Information
Tasks
Reproduction
Unsure. It happens sporadically, with no clear correlation to payload/prompts, traffic, or anything else. After an incomplete generation occurs, all subsequent requests time out, and the only way to recover is to restart the endpoint. This has mainly occurred with Llama3.1 70b.
Example Logs:
Success
Custom Attributes
(headers returned from TGI, printed out by the calling app)
Incomplete generation
Expected behavior
An incomplete generation should not block the entire model.