Customizable `bad_token_ids` policies #8

dxoigmn · 2025-01-27T21:24:37Z

llmart can ban "bad" tokens from the adversarial optimization.

Right now, `bad_token_ids` implements a static policy for what counts as a "bad" token: any token that is non-printable, non-ASCII, whitespace-only, or an added token:

LLMart/src/llmart/tokenizer.py

Lines 428 to 444 in c7bbef3

    
    def bad_token_ids(self) -> torch.Tensor:
        added_tokens = self.added_tokens_encoder.keys()
        tokens = [
            self.convert_tokens_to_string([token])
            for token in self.convert_ids_to_tokens(list(range(self.__vocab_size)))
        ]
        printable_tokens = torch.tensor(
            [
                token.isprintable()
                and token.isascii()
                and token not in added_tokens
                and len(token.strip()) > 0
                for token in tokens
            ],
        )
        return torch.where(~printable_tokens)[0]

Making this policy configurable would help with non-ASCII languages, where the `isascii()` check bans most of the useful vocabulary. It would also be useful to be able to ban an explicit, user-supplied set of tokens.
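As a rough illustration of what configurable policies could look like, here is a minimal, tokenizer-independent sketch. It is not llmart's API: `TokenPolicy`, `non_printable`, `non_ascii`, `blank`, `banned_set`, and the standalone `bad_token_ids` function below are all hypothetical names, and the sketch works on a plain list of decoded token strings and returns a Python list rather than a `torch.Tensor`.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical type: a policy receives a decoded token string and
# returns True if that token should be banned.
TokenPolicy = Callable[[str], bool]


def non_printable(token: str) -> bool:
    """Ban tokens containing non-printable characters."""
    return not token.isprintable()


def non_ascii(token: str) -> bool:
    """Ban tokens containing any non-ASCII character."""
    return not token.isascii()


def blank(token: str) -> bool:
    """Ban tokens that are empty or whitespace-only."""
    return len(token.strip()) == 0


def banned_set(banned: Iterable[str]) -> TokenPolicy:
    """Build a policy that bans an explicit set of token strings."""
    frozen = frozenset(banned)
    return lambda token: token in frozen


def bad_token_ids(tokens: Sequence[str], policies: Sequence[TokenPolicy]) -> List[int]:
    """Return the indices of tokens rejected by at least one policy."""
    return [i for i, tok in enumerate(tokens) if any(p(tok) for p in policies)]


# Toy vocabulary standing in for the decoded tokenizer vocabulary.
vocab = ["hello", " ", "é", "\x00", "<pad>", "world"]

# Roughly mirrors the current static behaviour (added tokens modelled as a banned set).
default_policies = [non_printable, non_ascii, blank, banned_set(["<pad>"])]
print(bad_token_ids(vocab, default_policies))

# Drops the ASCII restriction and bans a user-chosen token instead.
relaxed_policies = [non_printable, blank, banned_set(["world"])]
print(bad_token_ids(vocab, relaxed_policies))
```

With the toy vocabulary, the first policy list bans the whitespace, non-ASCII, non-printable, and `<pad>` entries, while the second keeps `é` and bans `world` instead. Expressing each rule as a small predicate keeps the current behaviour available as a default while letting users drop or add rules independently.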


mariusarvinte added the hackathon label Feb 6, 2025


dxoigmn commented Jan 27, 2025 • edited by mariusarvinte
