TransformerModelArguments is just a wrapper for the Hugging Face names or paths of the model, tokenizer and config. Some models work out of the box; others need adaptations. I cannot cover this 100%, since the transformers library imposes few restrictions on the individual models, and the newest ones can always deviate from the rest.
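Since the wrapper carries names rather than a loaded model, a factory can build a fresh classifier whenever it is asked. The plain-Python sketch below illustrates that pattern under stated assumptions: `ModelArgs`, `SketchFactory` and `stub_loader` are hypothetical stand-ins, not small-text's API, and the stub replaces a real Hugging Face download. It assumes the active learner requests a new classifier from the factory at each iteration, which is why capturing one pre-loaded `base_model` object would let trained weights carry over between iterations; custom loading logic would go inside the loader instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelArgs:
    """Hypothetical stand-in for a names/paths wrapper (model, tokenizer, config)."""
    model: str
    tokenizer: Optional[str] = None
    config: Optional[str] = None

    def resolved(self) -> "ModelArgs":
        # Fall back to the model name when tokenizer/config are not given.
        return ModelArgs(self.model, self.tokenizer or self.model, self.config or self.model)


class SketchFactory:
    """Hypothetical factory: stores *how* to build a model, not a model instance."""

    def __init__(self, args: ModelArgs, num_classes: int,
                 load_model: Callable[[ModelArgs, int], dict]):
        self.args = args.resolved()
        self.num_classes = num_classes
        self.load_model = load_model

    def new(self) -> dict:
        # Every call builds a fresh, independent model object.
        return self.load_model(self.args, self.num_classes)


def stub_loader(args: ModelArgs, num_classes: int) -> dict:
    # Placeholder for a real loading call; no download happens here.
    return {"name": args.model, "tokenizer": args.tokenizer,
            "num_labels": num_classes, "weights": [0.0] * num_classes}


factory = SketchFactory(ModelArgs("meta-llama/Llama-2-7b-chat-hf"), 2, stub_loader)
first, second = factory.new(), factory.new()
first["weights"][0] = 1.0   # "training" the first model...
print(second["weights"])    # ...does not touch the second one: [0.0, 0.0]
```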
I briefly tried a smaller Llama model (1B):
<...>
File /path/to/site-packages/transformers/models/llama/modeling_llama.py:1371, in LlamaForSequenceClassification.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1369 batch_size = inputs_embeds.shape[0]
1371 if self.config.pad_token_id is None and batch_size != 1:
-> 1372 raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
1373 if self.config.pad_token_id is None:
1374 sequence_lengths = -1
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
The error seems to be a known issue, but the usual workaround is difficult to apply with the current API. I will keep this in mind for v2.0.0; for now, I would recommend copying or subclassing TransformerBasedClassification and adapting it to your needs.
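The error comes from how the sequence-classification head pools its input: it uses the hidden state of the last non-padding token in each row, and it needs `config.pad_token_id` to tell padding apart from real tokens. The plain-Python sketch below mirrors that index computation on token-id lists; the function name and token ids are hypothetical, right padding is assumed, and the actual transformers code works on tensors and differs between versions. A workaround commonly used with transformers directly is to reuse the EOS token for padding (`tokenizer.pad_token = tokenizer.eos_token`, then `model.config.pad_token_id = tokenizer.pad_token_id`).

```python
from typing import List, Optional


def last_token_indices(batch: List[List[int]], pad_token_id: Optional[int]) -> List[int]:
    """Index of the token whose hidden state would be pooled, per row (right padding assumed)."""
    if pad_token_id is None:
        if len(batch) != 1:
            # Same condition as in the traceback: without a pad id, padded rows are ambiguous.
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        return [len(batch[0]) - 1]
    indices = []
    for row in batch:
        non_pad = [i for i, tok in enumerate(row) if tok != pad_token_id]
        # All-padding row: fall back to index 0 (edge case, illustration only).
        indices.append(non_pad[-1] if non_pad else 0)
    return indices


PAD = 2  # hypothetical ids; the EOS id is reused as padding id
batch = [[1, 15, 27, 99], [1, 42, PAD, PAD]]
print(last_token_indices(batch, PAD))  # [3, 1]
```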
Regarding your second question: this is currently not supported. You could write your own Classifier implementation to do that. A PyTorch Lightning integration, which could help with distributed training, is somewhere further down my list of ideas; however, I think for Llama 2 you would still need other repos as well.
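To give a rough sense of the model size involved, the back-of-the-envelope calculation below estimates GPU memory for a 7B-parameter model: once for the weights alone, and once for full fine-tuning with Adam in mixed precision. It assumes a round figure of 7 × 10^9 parameters and the commonly cited 16 bytes per parameter for mixed-precision Adam (fp16 weights and gradients, plus fp32 master weights and two fp32 optimizer moments). Activations, batch size and sequence length are ignored, so real usage is higher.

```python
def gigabytes(num_params: float, bytes_per_param: float) -> float:
    """Memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model

# fp16/bf16 weights only (inference, or a frozen backbone).
weights_only = gigabytes(NUM_PARAMS, 2)

# Mixed-precision Adam: 2 (fp16 weights) + 2 (fp16 grads)
# + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments) = 16 bytes/param.
full_finetune = gigabytes(NUM_PARAMS, 2 + 2 + 4 + 4 + 4)

print(f"weights only: {weights_only:.0f} GB")              # weights only: 14 GB
print(f"full fine-tuning state: {full_finetune:.0f} GB")   # full fine-tuning state: 112 GB
```

Under these assumptions the full training state alone is far beyond a single typical GPU, which is why the training would have to be sharded across devices rather than run in one classifier loop.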
Referring to the active learning for text classification example given here.
In the given example, we have:
In my case, I would like to use the language model meta-llama/Llama-2-7b-chat-hf as a sequence classifier by calling it as
Then, I would like to perform supervised training with active learning of the Llama sequence-classifier transformer model on the dataset Birchlabs/openai-prm800k-stepwise-critic.
Questions:
How do I modify the example in the repository to get a clf_factory which uses the above base_model instead of providing TransformerModelArguments?
How do I use small-text to handle the large model size of Llama and potentially distribute its training over multiple GPUs?
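For context on the active learning part: each iteration trains a classifier on the labeled subset, scores the unlabeled pool, and asks an annotator to label the most informative samples. The plain-Python sketch below shows one such query step using prediction entropy over predicted class probabilities; the function names and probabilities are illustrative, and the query strategy used in the small-text example may differ.

```python
import math
from typing import List, Sequence, Set


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_by_entropy(pool_probs: List[Sequence[float]], k: int, labeled: Set[int]) -> List[int]:
    """Return indices of the k unlabeled pool samples with the highest entropy."""
    candidates = [i for i in range(len(pool_probs)) if i not in labeled]
    candidates.sort(key=lambda i: entropy(pool_probs[i]), reverse=True)
    return candidates[:k]


# Illustrative predicted probabilities for a binary task.
pool_probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.50, 0.50],  # maximally uncertain, but already labeled below
    [0.90, 0.10],
]
print(query_by_entropy(pool_probs, k=2, labeled={2}))  # [1, 3]
```

The queried indices would then be labeled and added to the training set before the classifier is retrained for the next iteration.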