Suffix Activation Storing #5

chuyishang · 2025-01-03T06:35:33Z

When taking the suffix activations in steering_vector.py, I noticed that the length of the tokenized suffix is calculated from suffixes[0][0] only, even though the suffixes in the list may tokenize to different lengths. Is this intended behavior?

elif accumulate_last_x_tokens == "suffix-only":
    if suffixes:
        # Tokenize the suffix
        suffix_tokens = tokenizer.encode(suffixes[0][0], add_special_tokens=False)
        # Get the hidden states for the suffix tokens
        suffix_hidden = batch_hidden[-len(suffix_tokens):, :]
        accumulated_hidden_state = torch.mean(suffix_hidden, dim=0)

Thanks in advance!


brucewlee · 2025-01-09T18:26:12Z

Thanks for finding this! The code was implemented this way mostly because having a fixed length seemed more stable, and suffixes mostly had similar lengths for our setup.
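
For anyone who wants per-example lengths instead, here is a minimal sketch of the idea, not the repository's implementation. It uses plain Python lists in place of tensors, and both `mean_suffix_hidden` and `toy_encode` are hypothetical names made up for illustration (a real setup would pass something like the tokenizer's `encode` with `add_special_tokens=False`). It also assumes the suffix tokens occupy the final positions of the sequence, i.e. no right padding.

```python
from typing import Callable, List, Sequence


def mean_suffix_hidden(
    hidden: Sequence[Sequence[float]],
    suffix: str,
    encode: Callable[[str], List[int]],
) -> List[float]:
    """Average the hidden states over the last len(encode(suffix)) positions.

    hidden is a (seq_len x hidden_dim) nested sequence for ONE example, so the
    suffix length is computed per example rather than from the first suffix.
    """
    n = len(encode(suffix))
    # Guard against n == 0: hidden[-0:] would silently select every position.
    if n == 0:
        raise ValueError("suffix tokenizes to zero tokens")
    if n > len(hidden):
        raise ValueError("suffix is longer than the sequence")
    rows = hidden[-n:]
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return list(range(len(text.split())))


# Toy hidden states: 4 positions, hidden_dim = 2.
hidden = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [5.0, 5.0]]

two_token_mean = mean_suffix_hidden(hidden, "yes please", toy_encode)  # last 2 rows
one_token_mean = mean_suffix_hidden(hidden, "no", toy_encode)          # last row only
```

With a fixed length taken from the first suffix, both calls would average the same number of positions. Here the one-token suffix averages only the final row. Whether that is preferable to the fixed-length behavior depends on how much the suffix lengths vary in a given setup.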
