Error with different padding size in batch #65

Closed
ryantwolf opened this issue Jul 30, 2024 · 1 comment

Comments

@ryantwolf

I am encountering an error when using crossfit with Llama 2 7B to generate a continuation of a sequence. I am using an HFModel (see here for the exact implementation). The error occurs with a large dataset, but not with a small (2-document) dataset.

Traceback (most recent call last):
  File "/home/rywolf/Documents/aegis-tests/new_aegis.py", line 61, in <module>
    main()
  File "/home/rywolf/Documents/aegis-tests/new_aegis.py", line 49, in main
    write_to_disk(
  File "/home/rywolf/.local/lib/python3.10/site-packages/nemo_curator/utils/distributed_utils.py", line 505, in write_to_disk
    output = output.compute()
  File "/home/rywolf/.local/lib/python3.10/site-packages/dask/base.py", line 375, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/rywolf/.local/lib/python3.10/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/rywolf/.local/lib/python3.10/site-packages/crossfit/op/base.py", line 94, in __call__
    output = self.call(data, *args, partition_info=partition_info, **kwargs)
  File "/home/rywolf/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/rywolf/.local/lib/python3.10/site-packages/crossfit/backend/torch/op/base.py", line 86, in call
    outputs = cp.asarray(torch.cat(all_outputs_ls, dim=0))
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 3690 but got size 3249 for tensor number 1 in the list.

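@VibhuJawa thinks this is caused by a mismatch in the output size between batches: each batch is padded to a different length, so the per-batch outputs no longer agree in their non-batch dimension and cannot be concatenated along dimension 0.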
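
For anyone hitting the same error, here is a minimal sketch that reproduces the mismatch and shows one possible workaround. It is not the change made in #66 and not crossfit's code. It assumes the per-batch outputs are 2-D tensors of shape (batch, seq_len); the helper name `cat_with_padding` and the pad value of 0 are hypothetical, and real code would more likely pad with the tokenizer's pad token id.

```python
import torch
import torch.nn.functional as F


def cat_with_padding(outputs, pad_value=0):
    """Concatenate 2-D (batch, seq_len) tensors along dim 0.

    Hypothetical helper: right-pads every tensor on dim 1 to the longest
    seq_len in the list so that torch.cat sees matching sizes.
    """
    max_len = max(t.shape[1] for t in outputs)
    padded = [
        # F.pad takes (left, right) pad amounts for the last dimension.
        F.pad(t, (0, max_len - t.shape[1]), value=pad_value)
        for t in outputs
    ]
    return torch.cat(padded, dim=0)


# Two batches padded to different lengths, using the sizes from the traceback.
batch_a = torch.ones(2, 3690, dtype=torch.long)
batch_b = torch.ones(3, 3249, dtype=torch.long)

try:
    torch.cat([batch_a, batch_b], dim=0)
except RuntimeError as err:
    # Same class of error as in the traceback above.
    print("plain torch.cat fails:", err)

merged = cat_with_padding([batch_a, batch_b])
print(merged.shape)
```

An alternative with the same effect would be to pad every batch to one fixed maximum length at tokenization time, at the cost of extra compute on short batches.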

@VibhuJawa
Member

Fixed by #66
