TorchScript format problem: ai.djl.engine.EngineException: forward() is missing value for argument 'attention_mask' #3538

JeeDevUser · 2024-11-18T10:33:41Z


Scenario:

- Downloaded all files listed under the 'Files and versions' section to a local drive, from:
https://huggingface.co/google/pegasus-xsum/tree/main
- Converted `pytorch_model.bin` to TorchScript format (via a Python script) and got `pegasus_xsum_torchscript.pt`

- Initialized the tokenizer as follows:

HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance(Paths.get("/pathToTheLocation/with_tokenizer_json_etc"));

- Loaded the model as follows:

      Model model = Model.newInstance(modelPath, "PyTorch");
      model.load(Paths.get(modelPath));

- Created the Translator:

      // 4. Create Translator:
      Translator<long[], long[]> translator = new Translator<>() {
         @Override
         public NDList processInput(TranslatorContext ctx, long[] input) {
            // Create a LongBuffer from long[]
            LongBuffer buffer = LongBuffer.wrap(input);
            // Create an NDArray of shape (1, seq_len) from the buffer
            NDArray array = ctx.getNDManager().create(buffer, new Shape(1, input.length), DataType.INT64);

            NDList result =  new NDList(array);
            System.out.println("processInput() returns: " + result);
            return result;
         }


         @Override
         public long[] processOutput(TranslatorContext ctx, NDList list) {
            System.out.println("processOutput(), NDList in processOutput: " + list);
            // Getting NDArray from NDList:
            NDArray outputArray = list.singletonOrThrow();
            // Convert NDArray (output from the Model) to the long[]:
            long[] outputTokens = outputArray.toLongArray();
            System.out.println("processOutput() returns: " + Arrays.toString(outputTokens));
            return outputTokens;
         }
      };

Finally, generated a summary for the input text:

    // 5. Predictor:
     try (Predictor<long[], long[]> predictor = model.newPredictor(translator)) {
        // 6. Generating summary from the tokenized input:
        System.out.println("Tokenized input: " + Arrays.toString(tokenizedInput));
        long[] summaryTokens = predictor.predict(tokenizedInput);

        // 7. decode generated tokens:
        String summaryText = tokenizer.decode(summaryTokens);
        System.out.println("Summary: " + summaryText);
     } catch (Exception ex) {
        System.err.println("error : " + ex.getMessage());
        ex.printStackTrace();
     }

All this causes the following error:

 ai.djl.engine.EngineException: forward() is missing value for argument 'attention_mask'. Declaration: forward(__torch__.ModelWrapper self, Tensor input_ids, Tensor attention_mask, Tensor decoder_input_ids) -> Tensor
ai.djl.translate.TranslateException: ai.djl.engine.EngineException: forward() is missing value for argument 'attention_mask'. Declaration: forward(__torch__.ModelWrapper self, Tensor input_ids, Tensor attention_mask, Tensor decoder_input_ids) -> Tensor
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:197)
	at ai.djl.inference.Predictor.predict(Predictor.java:133)
	at djl.TestDJL.main(TestDJL.java:144)
Caused by: ai.djl.engine.EngineException: forward() is missing value for argument 'attention_mask'. Declaration: forward(__torch__.ModelWrapper self, Tensor input_ids, Tensor attention_mask, Tensor decoder_input_ids) -> Tensor
	at ai.djl.pytorch.jni.PyTorchLibrary.moduleRunMethod(Native Method)
	at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:57)
	at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:146)
	at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:79)
	at ai.djl.nn.Block.forward(Block.java:127)
	at ai.djl.inference.Predictor.predictInternal(Predictor.java:147)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:188)

What could be the problem?

frankfliu · 2024-11-18T16:19:00Z


@JeeDevUser

Can you provide the script you used to trace the model?

The model takes multiple inputs, but your processInput only provides one NDArray. Your input should be long[][]; it should contain at least token_ids and attention_mask.
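A minimal sketch of what the positional inputs look like, assuming the traced signature `forward(input_ids, attention_mask, decoder_input_ids)` from the error message and a decoder start token id of 0 (the value the Java code later in this thread uses). Plain nested lists stand in for `long[][]`/NDArray; `build_forward_inputs` is a hypothetical helper and the token ids are made up. Inputs appear to be matched by position, so a single array only fills the first parameter.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def build_forward_inputs(
    token_ids: List[int], decoder_start_token_id: int = 0
) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (input_ids, attention_mask, decoder_input_ids), each with batch size 1.

    The order mirrors forward(input_ids, attention_mask, decoder_input_ids);
    a single array would only fill the first parameter.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    input_ids = [list(token_ids)]                   # shape (1, seq_len)
    attention_mask = [[1] * len(token_ids)]         # 1 = real token (no padding here)
    decoder_input_ids = [[decoder_start_token_id]]  # shape (1, 1)
    return input_ids, attention_mask, decoder_input_ids


def shape(matrix: Matrix) -> Tuple[int, int]:
    return len(matrix), len(matrix[0])


ids, mask, dec = build_forward_inputs([182, 117, 1])  # made-up token ids
print(shape(ids), shape(mask), shape(dec))  # (1, 3) (1, 3) (1, 1)
```

In DJL terms this corresponds to an NDList of three 2-D INT64 arrays in that order, for example obtained by calling `expandDims(0)` on 1-D arrays, as the Java code later in the thread does.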


JeeDevUser Nov 19, 2024
Author

@frankfliu,
Do you mean the Python script I used to convert pytorch_model.bin to TorchScript format?
Here it is:

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_path = 'd:/IDEJE_ZA_AI_APP/Hugging Face Modeli/pegasus-xsum'
tokenizer_path = 'd:/IDEJE_ZA_AI_APP/Hugging Face Modeli/pegasus-xsum'

model = PegasusForConditionalGeneration.from_pretrained(model_path)
tokenizer = PegasusTokenizer.from_pretrained(tokenizer_path)

# Prepare inputs
input_text = "This is an example input text for summarization."
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)

# Extract necessary inputs
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Create dummy decoder inputs (typically starts with the decoder start token)
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# Wrap the model for TorchScript trace
class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super(ModelWrapper, self).__init__()
        self.model = model

    def forward(self, input_ids, attention_mask, decoder_input_ids):
        outputs = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids
        )
        return outputs.logits

wrapped_model = ModelWrapper(model)

# Trace the wrapped model
dummy_inputs = (input_ids, attention_mask, decoder_input_ids)
traced_model = torch.jit.trace(wrapped_model, dummy_inputs)

# Save the TorchScript model
traced_model.save("pegasus_xsum_torchscript.pt")
print("TorchScript model saved successfully!")

Here is the complete Java code I am using for text summarization:

   public static void main(String[] args) throws IOException, ModelException {
      String modelPath = "d:\\IDEJE_ZA_AI_APP\\Hugging Face Modeli\\pegasus-xsum\\pegasus_xsum_torchscript.pt";

      String inputText = "Sample text to be summarized";

      // 1. Tokenizer:
      String tokenizerPath = "d:\\IDEJE_ZA_AI_APP\\Hugging Face Modeli\\pegasus-xsum";
      HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance(Paths.get(tokenizerPath));
      Encoding encoding = tokenizer.encode(inputText);
      long[] tokenizedInput = encoding.getIds(); 
      long[] attentionMask = encoding.getAttentionMask(); 
      long[] decoderInputIds = new long[]{0}; // BOS token for the Pegasus model?

      Model model = Model.newInstance(modelPath, "PyTorch");
      model.load(Paths.get(modelPath));

      // 4. Translator
      Translator<long[][], long[]> translator = new Translator<>() {
         @Override
         public NDList processInput(TranslatorContext ctx, long[][] inputs) {
            NDManager manager = ctx.getNDManager();
            NDArray inputIdsArray = manager.create(inputs[0]);
            NDArray attentionMaskArray = manager.create(inputs[1]);
            NDArray decoderInputIdsArray = manager.create(inputs[2]);

            NDList result = new NDList(inputIdsArray, attentionMaskArray, decoderInputIdsArray);
            System.out.println("processInput() returns: " + result);
            return result;
         }

         @Override
         public long[] processOutput(TranslatorContext ctx, NDList list) {
            NDArray outputArray = list.singletonOrThrow();
            NDArray tokenIds = outputArray.argMax(1); 
            long[] outputTokens = tokenIds.toLongArray(); 

            return outputTokens;    
         }
      };



      // 5. Predictor
      try (Predictor<long[][], long[]> predictor = model.newPredictor(translator)) {

         long[][] input = new long[][]{tokenizedInput, attentionMask, decoderInputIds};

         // 6. summary generation:
         long[] summaryTokens = predictor.predict(input);

         String summaryText = tokenizer.decode(summaryTokens);
         System.out.println("Summary: " + summaryText);
      } catch (Exception ex) {
         System.err.println("Error: " + ex.getMessage());
         ex.printStackTrace();
      }
   }

This code _does not cause any error_, but the summary text is not generated correctly. For example, I get the following output:

Summary:

Can you help somehow?

JeeDevUser Nov 19, 2024
Author

Just to add: for some inputText values, I get the following summary:

Summary: The

It looks as if something interrupts the generation of the summary, but what?

frankfliu · 2024-11-20T16:22:30Z


@JeeDevUser

This is a summarization model, which is a bit complicated: you need to write a loop to generate all tokens. Take a look at our TextGeneration example: https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/nlp/TextGeneration.java
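A minimal sketch of such a loop in plain Python, assuming a hypothetical `next_token_logits(prefix)` callable that stands in for one model forward (here a hard-coded toy table) and the ids used in the Java example further down (decoder start 0, EOS 1). Each forward contributes exactly one token, which is why a single call yields only the first word. This version re-feeds the whole decoder prefix on every step; it does not use a kv cache.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    decoder_start_token_id: int = 0,
    eos_token_id: int = 1,
    max_new_tokens: int = 64,
) -> List[int]:
    """Greedy decoding: one forward call per generated token."""
    decoder_ids = [decoder_start_token_id]
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(decoder_ids)
        # argmax over the vocabulary for the last decoder position
        token = max(range(len(logits)), key=lambda i: logits[i])
        if token == eos_token_id:
            break
        generated.append(token)
        decoder_ids.append(token)  # feed the new token back in on the next call
    return generated


# Toy "model" over a 5-id vocabulary: the favourite next id depends on the last id.
TOY_TABLE = {
    0: [0.0, 0.1, 0.2, 0.9, 0.0],  # after start -> 3
    3: [0.0, 0.2, 0.8, 0.1, 0.0],  # after 3 -> 2
    2: [0.0, 0.9, 0.0, 0.0, 0.5],  # after 2 -> 1 (EOS)
}


def toy_model(prefix: List[int]) -> List[float]:
    return TOY_TABLE[prefix[-1]]


print(greedy_generate(toy_model))  # [3, 2]
```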

Here is how I traced the model:

from typing import Tuple, List

import torch
from torch import nn
from transformers import pipeline


class ModelWrapper(nn.Module):

    def __init__(self, model) -> None:
        super().__init__()
        self.model = model

    def encode(
            self,
            input_ids: torch.Tensor,
            attention_mask: torch.Tensor,
    ) -> Tuple[torch.Tensor]:
        return self.model.get_encoder()(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_attentions=False,
            output_hidden_states=False,
            return_dict=False,
        )

    def forward(
            self,
            attention_mask: torch.Tensor,
            decoder_input_ids: torch.Tensor,
            encoder_outputs: torch.Tensor,
            past_key_values: List[torch.Tensor],
    ) -> Tuple[torch.Tensor]:
        past_kv_list = []
        for i in range(16):
            layers = []
            for j in range(4):
                layers.append(past_key_values[i * 4 + j])
            past_kv_list.append(layers)

        return self.model(
            decoder_input_ids=decoder_input_ids,
            encoder_outputs=(encoder_outputs,),
            attention_mask=attention_mask,
            past_key_values=tuple(past_kv_list),
            use_cache=True,
            output_attentions=False,
            output_hidden_states=False,
            return_dict=False,
        )

    def forward_init(
            self,
            attention_mask: torch.Tensor,
            decoder_input_ids: torch.Tensor,
            encoder_outputs: torch.Tensor,
    ) -> Tuple[torch.Tensor]:
        return self.model(
            decoder_input_ids=decoder_input_ids,
            encoder_outputs=(encoder_outputs,),
            attention_mask=attention_mask,
            use_cache=True,
            output_attentions=False,
            output_hidden_states=False,
            return_dict=False,
        )


def generate_dummy_past_key_values(num_heads=16, num_layers=16, kv_dims=64, batch_size=1):
    past_key_values = []
    for _ in range(num_layers):
        past_key_values.append(torch.zeros(batch_size, num_heads, 1, kv_dims))
        past_key_values.append(torch.zeros(batch_size, num_heads, 1, kv_dims))
        past_key_values.append(torch.zeros(batch_size, num_heads, 12, kv_dims))
        past_key_values.append(torch.zeros(batch_size, num_heads, 12, kv_dims))

    return past_key_values


def main():
    model_id = "google/pegasus-xsum"
    pipe = pipeline(model=model_id, framework="pt")
    pipe.model.config.num_beams = 1  # use greedy search
    model = ModelWrapper(pipe.model)

    input_text = "This is an example input text for summarization."
    # output = pipe(input_text, use_cache=False)
    # print(output)

    encoding = pipe.tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
    input_ids = encoding["input_ids"]
    attention_mask = encoding["attention_mask"]

    past_key_values = generate_dummy_past_key_values()
    encoder_outputs = model.encode(input_ids, attention_mask)
    # print(encoder_outputs)

    traced_decoder = torch.jit.trace_module(
        model,
        {
            "encode": [input_ids, attention_mask],
            "forward_init": [attention_mask, torch.tensor([[0]]), encoder_outputs[0]],
            "forward": [attention_mask, torch.tensor([[0, 0]]), encoder_outputs[0], past_key_values],
        }
    )
    torch.jit.save(traced_decoder, "model.pt")


if __name__ == '__main__':
    main()
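The traced `forward` receives the cache as one flat list and regroups it into per-layer groups of four. A stdlib sketch of that indexing, assuming (as the shapes in `generate_dummy_past_key_values` suggest) that each decoder layer contributes a self-attention key/value pair followed by a cross-attention key/value pair. Strings stand in for tensors, and `regroup`/`flatten` are hypothetical helpers. With 16 layers this gives the 64 entries that the Java code below takes after the logits via `output.subNDList(1, 65)`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

NUM_LAYERS = 16        # as hard-coded in the traced script
TENSORS_PER_LAYER = 4  # self-attn key, self-attn value, cross-attn key, cross-attn value


def regroup(flat: Sequence[T], num_layers: int = NUM_LAYERS,
            per_layer: int = TENSORS_PER_LAYER) -> List[List[T]]:
    """Same indexing as ModelWrapper.forward: layer i owns flat[i*per_layer:(i+1)*per_layer]."""
    if len(flat) != num_layers * per_layer:
        raise ValueError(f"expected {num_layers * per_layer} entries, got {len(flat)}")
    return [[flat[i * per_layer + j] for j in range(per_layer)] for i in range(num_layers)]


def flatten(nested: Sequence[Sequence[T]]) -> List[T]:
    """Inverse of regroup: the flat order passed as past_key_values inputs."""
    return [t for layer in nested for t in layer]


names = [f"L{i}.{kind}" for i in range(NUM_LAYERS)
         for kind in ("self_k", "self_v", "cross_k", "cross_v")]
layers = regroup(names)
print(len(names), len(layers), layers[1])
# 64 16 ['L1.self_k', 'L1.self_v', 'L1.cross_k', 'L1.cross_v']
```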

And here is a naive implementation in Java. I use a simple greedy search and hard-code everything (max_token, eos_token_id, bos_token_id); these values should be read from the config.json file. You have to implement beam search if you want exactly the same default output as Python. My implementation is equivalent to:

model = pipeline(model="google/pegasus-xsum", framework="pt")
output = model(input_text, num_beams=1, do_sample=False)

Here is the Java code:

    public static void main(String[] args)
            throws ModelException,
            IOException,
            TranslateException {
        String inputText = "This is an example input text for summarization.";
        int maxNewToken = 64;

        Path path = Paths.get("summarization/model.pt");

        HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance("google/pegasus-xsum");
        Encoding encoding = tokenizer.encode(inputText);
        List<Long> outputIds = new ArrayList<>();

        Criteria<NDList, NDList> criteria =
                Criteria.builder()
                        .setTypes(NDList.class, NDList.class)
                        .optModelPath(path)
                        .optEngine("PyTorch")
                        .build();
        try (ZooModel<NDList, NDList> model = criteria.loadModel();
             Predictor<NDList, NDList> predictor = model.newPredictor();
             NDManager manager = NDManager.newBaseManager("PyTorch")) {

            NDArray inputIds = manager.create(encoding.getIds()).expandDims(0);
            NDArray attentionMask = manager.create(encoding.getAttentionMask()).expandDims(0);

            NDArray encodeMethod = manager.create("");
            encodeMethod.setName("module_method:encode");
            NDArray initMethod = manager.create("");
            initMethod.setName("module_method:forward_init");

            NDList encodeInput = new NDList(inputIds, attentionMask, encodeMethod);
            NDList encoderOutputs = predictor.predict(encodeInput);

            NDArray decoderOutputIds = manager.create(new long[][]{{0}});

            NDList initInputs =
                    new NDList(attentionMask, decoderOutputIds, encoderOutputs.get(0), initMethod);
            NDList pastKeyValues = null;
            for (int i = 0; i < maxNewToken; ++i) {
                NDList output;
                if (i == 0) {
                    output = predictor.predict(initInputs);
                } else {
                    NDList decoderInputs =
                            new NDList(attentionMask, decoderOutputIds, encoderOutputs.get(0));
                    decoderInputs.addAll(pastKeyValues);
                    output = predictor.predict(decoderInputs);
                }
                NDArray logits = output.get(0);
                pastKeyValues = output.subNDList(1, 65);
                for (NDArray pastKeyValue : pastKeyValues) {
                    pastKeyValue.setName("past_key_values[]");
                }

                logits = logits.get(":, -1, :");
                NDArray nextTokenIds = logits.argMax(-1);
                long tokenId = nextTokenIds.getLong();
                if (tokenId == 1) {
                    // found eos
                    break;
                }

                outputIds.add(tokenId);
                decoderOutputIds = decoderOutputIds.concat(nextTokenIds.expandDims(0), -1);
            }
        }

        String text = tokenizer.decode(outputIds.stream().mapToLong(Long::longValue).toArray());
        System.out.println(text);
    }
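The Java code above hard-codes `maxNewToken = 64`, EOS id 1 and start id 0. A minimal sketch of reading such values from `config.json` instead, assuming the standard Hugging Face config keys `decoder_start_token_id`, `eos_token_id`, `max_length` and `num_beams`; missing keys fall back to the hard-coded values. The JSON string is an illustrative stand-in, not the real pegasus-xsum config.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    decoder_start_token_id: int
    eos_token_id: int
    max_length: int  # assumption: used as a cap on generated tokens, like the Java loop
    num_beams: int


def load_generation_settings(config_text: str) -> GenerationSettings:
    """Parse config.json text; missing keys fall back to the values hard-coded in Java."""
    cfg = json.loads(config_text)
    return GenerationSettings(
        decoder_start_token_id=cfg.get("decoder_start_token_id", 0),
        eos_token_id=cfg.get("eos_token_id", 1),
        max_length=cfg.get("max_length", 64),
        num_beams=cfg.get("num_beams", 1),  # 1 = greedy search
    )


example = '{"decoder_start_token_id": 0, "eos_token_id": 1, "max_length": 64}'
print(load_generation_settings(example))
# GenerationSettings(decoder_start_token_id=0, eos_token_id=1, max_length=64, num_beams=1)
```

In practice the text would come from the model directory, for example `Path(model_dir, "config.json").read_text()` with `pathlib.Path`.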


JeeDevUser Nov 20, 2024
Author

@frankfliu, thanks for the response. I'm new to all this; there's a lot of material and I'm still getting the hang of it, I hope you understand.
So first, let me ask: do you see a problem with the Python code I used to generate the TorchScript model?

frankfliu Nov 21, 2024

Yes. The problem with your Python script is a limitation of jit trace: if the model code contains an if/else branch, jit trace only records one branch (the one taken for the example inputs). You should see a warning when you trace the model.
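A toy analogy in plain Python of what "records only one branch" means. This is not how `torch.jit.trace` is implemented; `trace` and `OPS` are hypothetical names. The tracer runs the function once on an example input, records the operations that actually executed, and later replays that recording for every input, so the branch decision is frozen.

```python
from typing import Callable, List


def model(x: int, ops: List[str]) -> int:
    # Data-dependent branch: which path runs depends on the value of x.
    if x > 0:
        ops.append("double")
        return x * 2
    ops.append("negate")
    return -x


OPS = {"double": lambda v: v * 2, "negate": lambda v: -v}


def trace(fn: Callable[[int, List[str]], int], example: int) -> Callable[[int], int]:
    """Run fn once on `example` and return a function that replays the recorded ops."""
    recorded: List[str] = []
    fn(example, recorded)

    def replay(x: int) -> int:
        for name in recorded:  # no branch here: the recorded path is reused as-is
            x = OPS[name](x)
        return x

    return replay


traced = trace(model, example=3)
print(traced(5), model(5, []))    # 10 10  (same path, same answer)
print(traced(-4), model(-4, []))  # -8 4   (traced copy took the wrong branch)
```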

Now let's look at this model. It is an encoder-decoder model: it effectively contains two models, each working at a different inference stage. As I mentioned earlier, summarization is a text generation task. Each model forward produces only one token, so you need a loop to generate a full sentence. Your Java code outputs only a single word, "The"; this is the expected behavior. To speed up the repeated model forward calls, this model uses a kv_cache: the cache is generated at the prefill stage, and each following forward call takes an extra past_key_values parameter.
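A toy, single-head illustration of the idea behind kv_cache, in plain Python with made-up 2-d vectors and projection matrices `WQ`/`WK`/`WV` (nothing here is Pegasus code). The newest token's query attends over the keys and values of all tokens so far. Recomputing them on every step and reusing cached ones give the same output, but with the cache each step computes a key and value for one new token only.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vec], x: Vec) -> Vec:
    return [dot(row, x) for row in m]


# Toy projection matrices (2x2); real models learn these.
WQ = [[1.0, 0.0], [0.0, 1.0]]
WK = [[0.5, 0.1], [0.2, 0.9]]
WV = [[1.0, 1.0], [0.0, 2.0]]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Softmax(q.k / sqrt(d))-weighted sum of the values."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def step_without_cache(prefix: List[Vec]) -> Vec:
    # Recompute keys and values for every token on every step.
    keys = [matvec(WK, x) for x in prefix]
    values = [matvec(WV, x) for x in prefix]
    return attend(matvec(WQ, prefix[-1]), keys, values)


class KVCache:
    def __init__(self) -> None:
        self.keys: List[Vec] = []
        self.values: List[Vec] = []

    def step(self, x: Vec) -> Vec:
        # Only the new token's key/value is computed; earlier ones are reused.
        self.keys.append(matvec(WK, x))
        self.values.append(matvec(WV, x))
        return attend(matvec(WQ, x), self.keys, self.values)


tokens = [[1.0, 0.0], [0.3, 0.7], [0.9, 0.2]]
cache = KVCache()
for t in range(len(tokens)):
    cached = cache.step(tokens[t])
    full = step_without_cache(tokens[: t + 1])
    assert all(abs(a - b) < 1e-12 for a, b in zip(cached, full))
print("cached and uncached outputs match")
```

In an encoder-decoder model, the cross-attention keys and values come from the encoder output, which does not change during generation, so they can be cached as well; the encoder-length entries in `generate_dummy_past_key_values` appear to correspond to that part of the cache.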

So my approach is to trace the 3 code paths individually: encode, initial prefill, and decoder forward. You can then invoke each code path separately from Java.

Finally, you need to understand the different token selection algorithms:

- greedy search: always pick the token with the highest probability (argmax())
- sampling: pick a random sample from the top N tokens
- beam search: keep N beams and pick the best overall output
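The first two strategies operate on a single step's logits and can be sketched in plain Python over a made-up logits vector (numbers are illustrative only): greedy search is an argmax, and top-k sampling draws at random from the k best tokens after renormalising their probabilities.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits: Sequence[float]) -> int:
    """Greedy search: always the single most probable token (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits: Sequence[float], k: int, rng: random.Random) -> int:
    """Sampling: draw at random from the k most probable tokens, renormalised."""
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in candidates])
    return rng.choices(candidates, weights=probs, k=1)[0]


logits = [2.0, 1.5, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
print(greedy_pick(logits))  # 0
print([top_k_sample(logits, k=2, rng=rng) for _ in range(10)])  # only ids 0 and 1 can appear
```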

In my example, I just use greedy search to show you how a text generation model works.
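For comparison with the greedy loop in the Java code, a minimal beam search sketch in plain Python. `step_log_probs` is a hypothetical stand-in for one model forward returning next-token log-probabilities, and the ids (start 0, EOS 1) follow the Java example. It keeps the `num_beams` best partial sequences by total log-probability. It leaves out refinements such as the length penalty that Hugging Face `generate()` applies, so it should not be expected to reproduce the Python pipeline output exactly.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[float, List[int]]  # (sum of log-probs, generated ids without start/EOS)


def beam_search(
    step_log_probs: Callable[[List[int]], Sequence[float]],
    num_beams: int = 2,
    decoder_start_token_id: int = 0,
    eos_token_id: int = 1,
    max_new_tokens: int = 64,
) -> List[int]:
    beams: List[Hypothesis] = [(0.0, [])]
    finished: List[Hypothesis] = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every vocabulary token (one forward per beam).
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            log_probs = step_log_probs([decoder_start_token_id] + tokens)
            for token, lp in enumerate(log_probs):
                candidates.append((score + lp, tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Walk candidates best-first: EOS ends a hypothesis, others refill the beams.
        beams = []
        for score, tokens in candidates:
            if tokens[-1] == eos_token_id:
                finished.append((score, tokens[:-1]))
            else:
                beams.append((score, tokens))
                if len(beams) == num_beams:
                    break
        # Log-probs are <= 0, so live beams can only get worse: stop once a
        # finished hypothesis scores at least as well as the best live one.
        if not beams or (finished and max(f[0] for f in finished) >= beams[0][0]):
            break
    return max(finished + beams, key=lambda c: c[0])[1]


# Made-up next-token probabilities over ids {0: start/pad, 1: EOS, 2, 3}, keyed by prefix.
TOY_PROBS = {
    (0,): [0.001, 0.001, 0.598, 0.4],
    (0, 2): [0.001, 0.4, 0.3, 0.299],
    (0, 3): [0.001, 0.9, 0.05, 0.049],
}
FALLBACK = [0.001, 0.997, 0.001, 0.001]  # any other prefix: almost surely EOS


def toy_step(prefix: List[int]) -> List[float]:
    return [math.log(p) for p in TOY_PROBS.get(tuple(prefix), FALLBACK)]


print(beam_search(toy_step, num_beams=1))  # [2]  same pick as greedy, P = 0.598 * 0.4
print(beam_search(toy_step, num_beams=2))  # [3]  higher total P = 0.4 * 0.9
```

Ported to the traced model, each beam would need its own decoder ids and its own past_key_values, and each step costs one forward per live beam (or one batched forward).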

JeeDevUser · 2024-11-21T11:01:24Z

Author

@frankfliu, thanks for the great explanation.
I tried your Python TorchScript script and the Java code.

I got the following summary from Java:

This example input text is example example example text text text.

But from Python, I got:
'summary_text': 'This example input text is a summarization of the following:'

Question: does this mean that the TorchScript version (used in Java code) is still unable to generate the proper output?

...or is this because of what you said:

> And here is a naive implementation in java, I use a simple greedy search and hard-code everything (max_token, eos_token_id, bos_token_id), they should read from config.json file. You have to implement beam search if you want have to exactly the same default output as python

I looked at https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/inference/nlp/TextGeneration.java but I am still confused about how to implement beam search. Can you help?


JeeDevUser Nov 21, 2024
Author

To be honest, I need a quick result (the same output as from Python) to convince my company's management to use DJL in future development.

JeeDevUser · 2024-11-24T09:00:54Z

Author

Anyone?


frankfliu · 2024-11-25T02:58:51Z


@JeeDevUser

Text generation is pretty complicated, and we don't have built-in Java code that can handle common models. My recommendation is to take a look at our model server solution: https://docs.djl.ai/master/docs/serving/serving/docs/lmi/index.html

