[BUG] Incomplete and risky bias statistics #4424

Open
iProzd opened this issue Nov 26, 2024 · 0 comments · May be fixed by #4496

iProzd commented Nov 26, 2024

Bug summary

When the data statistics are computed, the sampled frames may not contain every atom type present in the dataset, which leaves the bias statistics incomplete for the missing types. This can cause training problems, especially with mixed-type data formats.

The PyTorch DataLoader could be enhanced in two steps:

  1. Count the number of atoms of each type in the dataset and cache a few frame indices for each type.
  2. If the sampled frames lack types that exist in the dataset, use the cached indices to add frames containing the missing types to the samples before computing the bias statistics.

This approach will ensure comprehensive bias statistics.
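
The two steps above could be sketched roughly as follows. This is only an illustration in plain Python, not DeePMD-kit code: the function names `build_type_cache` and `complete_sample`, the `max_cached` parameter, and the representation of the dataset as a list of per-frame atom type lists are all assumptions made for the example.

```python
from collections import defaultdict


def build_type_cache(frame_types, max_cached=2):
    """Step 1 (hypothetical helper): count atoms per type and cache frame indices.

    frame_types: list where frame_types[i] is the list of atom types in frame i.
    Returns (counts, cache): atoms per type, and up to `max_cached` frame
    indices per type.
    """
    counts = defaultdict(int)
    cache = defaultdict(list)
    for idx, types in enumerate(frame_types):
        for t in types:
            counts[t] += 1
        # Remember this frame for every type it contains, up to the cap.
        for t in set(types):
            if len(cache[t]) < max_cached:
                cache[t].append(idx)
    return dict(counts), dict(cache)


def complete_sample(sampled, frame_types, cache):
    """Step 2 (hypothetical helper): add cached frames for types the sample lacks."""
    sampled = list(sampled)
    seen = set()
    for idx in sampled:
        seen.update(frame_types[idx])
    for t in sorted(cache):
        if t not in seen:
            idx = cache[t][0]  # first cached frame that contains type t
            sampled.append(idx)
            seen.update(frame_types[idx])
    return sampled


# Toy dataset: type 2 appears only in frame 2.
frame_types = [[0, 0, 1], [0, 1, 1], [0, 2], [1, 1]]
counts, cache = build_type_cache(frame_types)
sampled = complete_sample([0, 3], frame_types, cache)
print(counts)   # atoms per type across the dataset
print(sampled)  # original sample plus a frame containing type 2
```

In this toy case the initial sample (frames 0 and 3) contains only types 0 and 1, so frame 2 is appended before any bias statistics would be computed. A real implementation would need to decide how many indices to cache per type and how to weight the added frames.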

DeePMD-kit Version

3.0.0

Backend and its version

Both PyTorch and TensorFlow

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

See above

Steps to Reproduce

See above

Further Information, Files, and Links

No response

@iProzd iProzd added the bug label Nov 26, 2024
@njzjz njzjz linked a pull request Jan 3, 2025 that will close this issue