I am trying to run a gated model through the pipeline API, but I get a gated-model access error despite having the `HF_TOKEN` environment variable set.
```python
>>> from optimum.nvidia.pipelines import pipeline as optimum_pipeline
>>> fast_pipe = optimum_pipeline('text-generation', 'meta-llama/Llama-2-70b-chat-hf', tp=2, use_fp8=True)
[...]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/optimum/nvidia/pipelines/__init__.py", line 93, in pipeline
    raise RuntimeError(
RuntimeError: Failed to instantiate the pipeline inferring the task for model meta-llama/Llama-2-70b-chat-hf: 401 Client Error. (Request ID: Root=1-66859d97-068675f635fb5d9f4e0b36b6;10764411-e918-4f91-8a97-ca114e65ea79)
Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Llama-2-70b-chat-hf.
Access to model meta-llama/Llama-2-70b-chat-hf is restricted. You must be authenticated to access it.
>>> import os
>>> os.environ["HF_TOKEN"]
'***'  # this is my access token, which has access to the model
```
Running the same model through the `transformers` pipeline succeeds in downloading the checkpoint.
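As a possible workaround until the `optimum.nvidia` pipeline picks up `HF_TOKEN` on its own, the token can be resolved from the environment and passed to `huggingface_hub.login` explicitly before constructing the pipeline. This is a minimal sketch, not a confirmed fix; the helper name `resolve_hf_token` is mine, and I am assuming the underlying hub calls honor an explicit login:

```python
import os

def resolve_hf_token():
    """Return the Hugging Face access token from the environment,
    checking the current variable name first and the legacy one as a
    fallback. Returns None if neither is set."""
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")

# Explicit login before building the pipeline (assumption: the pipeline's
# hub requests then reuse the cached credentials).
token = resolve_hf_token()
```

Usage would then be something like `from huggingface_hub import login; login(token=token)` followed by the `optimum_pipeline(...)` call from the transcript above.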