How do I use a quantized model? Also, I wrote a for loop:
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
model = 'OpenGVLab/InternVL2_5-8B-MPO'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, load_in_8bit=True))
# 100 sequential requests, each sent as a single (prompt, image) pair
for i in range(0, 100):
    response = pipe(('describe this image', image))
    print(response.text)
    print("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
However, GPU utilization only spikes to 100% for an instant; the rest of the time it sits at 0% (for roughly 6 to 7 s). The card is an RTX 4090.
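One plausible (unconfirmed) reason for the idle GPU: each `pipe(...)` call in the loop above submits a single request and blocks until it finishes, so the engine never has more than one sequence to schedule, and any CPU-side work between calls (image preprocessing, Python overhead) leaves the GPU waiting. LMDeploy's pipeline also accepts a list of prompts and returns one response per prompt, which lets it process many requests concurrently. The sketch below assumes that list behaviour; `run_in_batches` and `describe_tiger` are hypothetical helper names, and the quantization option from the original config is left out to keep the example minimal.

```python
from typing import Any, Callable, List, Sequence


def run_in_batches(pipe: Callable[[list], list], requests: Sequence[Any],
                   batch_size: int = 16) -> List[Any]:
    """Send `requests` to `pipe` in chunks of `batch_size`, one call per chunk.

    `pipe` is any callable that takes a list of requests and returns a list of
    results in the same order (lmdeploy's pipeline does this for list input).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[Any] = []
    for start in range(0, len(requests), batch_size):
        chunk = list(requests[start:start + batch_size])
        # One call per chunk: the engine can schedule every request in it together.
        results.extend(pipe(chunk))
    return results


def describe_tiger(n: int = 100, batch_size: int = 16) -> None:
    """Batched version of the loop above (requires lmdeploy and a GPU)."""
    from lmdeploy import pipeline, TurbomindEngineConfig
    from lmdeploy.vl import load_image

    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    pipe = pipeline('OpenGVLab/InternVL2_5-8B-MPO',
                    backend_config=TurbomindEngineConfig(session_len=8192))
    requests = [('describe this image', image)] * n
    for response in run_in_batches(pipe, requests, batch_size):
        print(response.text)
```

Passing all 100 prompts in one `pipe(...)` call should also work; chunking only means results are printed batch by batch instead of all at the end.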
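On the first question: as far as I know, `load_in_8bit` is a Hugging Face transformers/bitsandbytes option rather than a `TurbomindEngineConfig` field, so it may not be quantizing anything here. LMDeploy's documented route for weight quantization is AWQ: quantize offline with `lmdeploy lite auto_awq OpenGVLab/InternVL2_5-8B-MPO --work-dir ./InternVL2_5-8B-MPO-AWQ` (the work-dir name is arbitrary; check that your LMDeploy version supports AWQ for InternVL2.5), then load that directory with `model_format='awq'`. Separately, `quant_policy=8` (or `4`) turns on online int8 (int4) KV-cache quantization. The sketch below assumes those two options; `awq_engine_kwargs` and `build_awq_pipeline` are hypothetical helper names.

```python
from typing import Any, Dict


def awq_engine_kwargs(session_len: int = 8192, kv_cache_bits: int = 0) -> Dict[str, Any]:
    """Keyword arguments for TurbomindEngineConfig when loading AWQ 4-bit weights.

    kv_cache_bits: 0 keeps the KV cache unquantized; 8 or 4 is passed as quant_policy.
    """
    if kv_cache_bits not in (0, 4, 8):
        raise ValueError("kv_cache_bits must be 0, 4 or 8")
    kwargs: Dict[str, Any] = {"model_format": "awq", "session_len": session_len}
    if kv_cache_bits:
        kwargs["quant_policy"] = kv_cache_bits
    return kwargs


def build_awq_pipeline(work_dir: str, **engine_kwargs: Any):
    """Load the directory written by `lmdeploy lite auto_awq` (requires lmdeploy and a GPU)."""
    from lmdeploy import pipeline, TurbomindEngineConfig

    return pipeline(work_dir, backend_config=TurbomindEngineConfig(**engine_kwargs))
```

Usage would then be `pipe = build_awq_pipeline('./InternVL2_5-8B-MPO-AWQ', **awq_engine_kwargs(kv_cache_bits=8))`, after which `pipe(('describe this image', image))` is called exactly as before.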