The current DataloaderIter class assumes that the Dataset's __getitem__ method can handle slicing and return batches of samples. This assumption causes compatibility issues with datasets that return a single sample, which is the usual case. To resolve this, I think it's better to handle slicing within DataloaderIter's __next__ method and keep __getitem__'s concern limited to returning a single sample.
The line causing errors is:

```python
batch = self.dataset[self.current_index:end]
```
The proposed solution is to modify __next__ to call __getitem__ multiple times to create a batch, maybe something that looks like this:
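A minimal sketch of the proposed change, assuming a DataloaderIter that tracks current_index and batch_size (the class internals here are illustrative, not the project's actual implementation):

```python
class DataloaderIter:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size
        self.current_index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current_index >= len(self.dataset):
            raise StopIteration
        end = min(self.current_index + self.batch_size, len(self.dataset))
        # Call __getitem__ once per index instead of slicing,
        # so the dataset only ever needs to return a single sample.
        batch = [self.dataset[i] for i in range(self.current_index, end)]
        self.current_index = end
        return batch
```

With this, any dataset that supports integer indexing works, whether or not it implements slicing.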
Yep, I was able to reproduce the issue on my side, and it looks like the dataloader expects a __getitem__ method that supports slicing. That said, I think it's better to shift the responsibility for slicing to the dataset class, since:
Most Python objects that implement __getitem__ do have slicing capabilities
It's not that difficult to implement slicing on the dataset, since it has access to the underlying data
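To illustrate the second point, here is a hypothetical dataset class that gets slicing almost for free by delegating to the underlying container (the ListDataset name and structure are my own, not from the project):

```python
class ListDataset:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Delegating to the underlying list means both integer indexing
        # and slicing work: ds[2] returns one sample, ds[0:3] a batch.
        return self.data[index]
```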
Thank you for looking into this issue and providing a solution. I've actually been relying on the same approach, and it works really well.
However, I initially thought this was just a temporary fix and not the way to go, as it didn't feel like the conventional approach used in deep learning frameworks such as PyTorch, where users typically find the __getitem__ definition more intuitive since it returns a single sample, and then use other mechanisms, such as a collate_fn function, to handle slicing and batching.
That being said, I now see the benefits of implementing slicing within the __getitem__ method, such as making use of the slicing capabilities of most Python objects and keeping the dataset class more self-contained. I think this is a minor issue, and both approaches work just fine.
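For comparison, the PyTorch-style convention mentioned above can be sketched like this: __getitem__ returns one sample, and a separate collate function assembles a batch (the default_collate here is a simplified stand-in, not PyTorch's actual implementation):

```python
def default_collate(samples):
    # Turn a list of (x, y) sample tuples into a single
    # (list_of_xs, list_of_ys) batch tuple.
    return tuple(map(list, zip(*samples)))

def batches(dataset, batch_size, collate_fn=default_collate):
    # The iterator still calls __getitem__ per sample; only the
    # collate_fn decides how those samples are combined.
    for start in range(0, len(dataset), batch_size):
        end = min(start + batch_size, len(dataset))
        yield collate_fn([dataset[i] for i in range(start, end)])
```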