
RuntimeError: Input is too long for context length 77. No truncation passed #468

Open
hessaAlawwad opened this issue Oct 21, 2024 · 2 comments

Comments

@hessaAlawwad

Hello,
I am trying to embed text using CLIP, and I got an error saying that my text is too long. From the Hugging Face documentation, however, I see that there is a variable I could change:

max_position_embeddings (int, optional, defaults to 77) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

Any ideas on how I can do this?

My code is as follows:

from openai import OpenAI
from PIL import Image
import torch
import clip

client = OpenAI(api_key="")

# Load the model onto the device used for inference/training (CPU or GPU).
device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_get_features_from_single_image(image_path):
    # preprocess() already returns a tensor, so just add a batch dimension
    image = preprocess(Image.open(image_path).convert("RGB"))
    image_input = image.unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image_input).float()
    return image_features

def clip_get_single_text_embedding(text):
    # inputs = clip.tokenize(text, context_length=77, truncate=True)
    inputs = clip.tokenize(text).to(device)
    with torch.no_grad():
        text_features = model.encode_text(inputs)
    return text_features

And do I need to adjust the image embedding size too, in order to calculate the similarity with the text?

Thanks in advance

@DeadLineChaser

Hello, I ran into the same problem as you did. Do you know how to change 'max_position_embeddings' now? Waiting for your reply, thanks in advance.

@99991 commented Jan 23, 2025

Your code example is for openai/CLIP, but max_position_embeddings is for HuggingFace CLIP, which are different implementations.

Either way, there simply are no positional embeddings for tokens in longer texts. If you change context_length, you need to retrain the model (or at least train the new positional_embedding parameters).

self.positional_embedding = nn.Parameter(torch.empty(self.context_length, transformer_width))
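
To make that concrete, here is a minimal, hypothetical sketch (ViT-B/32 uses context length 77 and text transformer width 512): if you simply enlarged this table, the appended rows would be freshly initialized and carry no learned information until you fine-tune the model.

import torch
import torch.nn as nn

context_length, transformer_width = 77, 512      # ViT-B/32 text transformer
new_context_length = 256                         # hypothetical longer context

# Stand-in for the trained table above (model.positional_embedding, shape (77, 512)).
old_pos_emb = torch.randn(context_length, transformer_width)

# Extend the table: the appended rows are randomly initialized, i.e. untrained.
extra = torch.empty(new_context_length - context_length, transformer_width)
nn.init.normal_(extra, std=0.01)
new_pos_emb = nn.Parameter(torch.cat([old_pos_emb, extra], dim=0))  # (256, 512)
# Positions 77..255 now exist, but they only add noise until they are trained.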

You have a few options:

  • Make your text shorter. Small LLMs are probably good enough for this task.
  • Train a new model.
  • Split your text into chunks of fewer than 77 tokens, compute an embedding for each chunk, and then average the embeddings, as described here for example (see the sketch after this list). Note that the embeddings get worse the more you average them, but it might be good enough for short texts.
  • Use a different model.
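
For the chunk-and-average option, here is a minimal sketch with openai/CLIP, assuming a rough word-level split; the 50-words-per-chunk value and the helper name clip_text_embedding_chunked are illustrative, and truncate=True is only a safety net for chunks that still exceed 77 tokens.

import clip
import torch

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_text_embedding_chunked(text, words_per_chunk=50):
    # Split the text into word-level chunks that usually stay under 77 tokens.
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)] or [""]
    tokens = clip.tokenize(chunks, truncate=True).to(device)   # (num_chunks, 77)
    with torch.no_grad():
        features = model.encode_text(tokens).float()           # (num_chunks, 512)
    # Average the per-chunk embeddings into a single text embedding.
    return features.mean(dim=0, keepdim=True)                  # (1, 512)

For ViT-B/32, encode_image also returns 512-dimensional features, so you can compute the similarity between this averaged text embedding and your image embeddings without resizing anything.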
