Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Used center of f_1 as an additional storage and also fixed some bugs #99

Merged
merged 1 commit into from
Jan 2, 2025

Conversation

hsalehipour
Copy link
Collaborator

@hsalehipour hsalehipour commented Dec 22, 2024

Contributing Guidelines

Description

Added feature:

  1. Added central location of output population (f_1) as another storage space for storing BC auxilary data since that's also available.
  2. Used this newly added space for allocating single auxilary data of ZouHe/Regularized which simplified the code and removed for-loops over lattice direction for reading the prescribed values.

Fixed the following bugs:

  1. Flow over sphere was not working with prescribed pressure in JAX
  2. Flow over sphere was not working with FP64FP64 in WARP
  3. Regularized or ZouHe were not working with prescribed velocity of ZERO vector associated with no-slip BC (straight wall).

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • All pytest tests pass

Linting and Code Formatting

Make sure the code follows the project's linting and formatting standards. This project uses Ruff for linting.

To run Ruff, execute the following command from the root of the repository:

ruff check .
  • Ruff passes

@mehdiataei
Copy link
Contributor

mehdiataei commented Jan 2, 2025

This seems unnecessary, and I’m not even sure it won’t cause issues in some scenarios that you may need post/pre collision values as the center value is updated during collision. Could you please explain what problem this extra data is intended to solve?

At the very least, it introduces a lot of confusion and deviates from the usual definition of using "unknown population" for storage. Regarding the central population, @massimim dicussed the possibility of using a different buffer for central populations in the future, so this approach will not be compatible in that context.

_rho = f_post[_opp_indices[q], index[0], index[1], index[2]]
break
# Since we need only one scalar value, we only need to find one value (stored at the center of f_1)
_rho = f_post[0, index[0], index[1], index[2]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that this BC (zouhe) might need more than 1 aux variable? I am not sure what you mean by "more general".

@hsalehipour
Copy link
Collaborator Author

hsalehipour commented Jan 2, 2025

This seems unnecessary, and I’m not even sure it won’t cause issues in some scenarios that you may need post/pre collision values as the center value is updated during collision. Could you please explain what problem this extra data is intended to solve?

At the very least, it introduces a lot of confusion and deviates from the usual definition of using "unknown population" for storage. Regarding the central population, @massimim dicussed the possibility of using a different buffer for central populations in the future, so this approach will not be compatible in that context.

We always have these additional storage spaces at our disposal say in the pull method: (1) unknown directions of the output buffer (2) known directions of the input buffer and (3) center of the output. The known directions of the input (or f_0 in our notation) is often needed to store post-collision values for the BC and hence we don't use that. Recently we have used unknown directions of output (or f_1) in Extrapolation outflow BC and also for storing single values of normal vel and density in Regularized and ZouHe. However, we had never used the center location of the output. This is all part of our novel complex BC approach (you can confirm with @massimim who actually explained this all to me). Now this PR introduces the use of central location which you can see has generic retrival for 1 and +1 aux data. Only in Regularized and ZouHe which needs one aux data, we use center only. In other BCs (not added yet) we need not only the unknown directions but also the center location for storing aux data. Yes you can store index of missing directions to avoid the for loop and it would work but that's unnecessary here since we do have the center location that we should leverage and it is always at index 0.

I hope this answers your questions about "what problem this extra data is intended to solve".

@mehdiataei
Copy link
Contributor

mehdiataei commented Jan 2, 2025

I don't think this is correct. This will cause information loss and the core idea of using aux data is that you use "UNUSED" populations to store these values.

Say that I come up with a new BC tomorrow that is applied during streaming, and requires pre-collision populations. How can I retrieve the value of pre-collision at location 0 at the beginning of the iteration?

@mehdiataei mehdiataei closed this Jan 2, 2025
@mehdiataei mehdiataei reopened this Jan 2, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Jan 2, 2025
@hsalehipour
Copy link
Collaborator Author

The center location of OUTPUT (or f_1) is UNUSED and no one needs it just like the missing directions of f_1. This has nothing to do with requiring the pre-collision (or post-streaming) populations because they are already available in the thread (those desgingated with _f_post_stream in the code). No one uses centeral position of f_1 just like how the missing direcctions of f_1 are unused. Please think about this more carefully.

@mehdiataei
Copy link
Contributor

This doesn't answer my question. "Say that I come up with a new BC tomorrow that is applied during streaming, and requires pre-collision populations. How can I retrieve the value of pre-collision at location 0 at the beginning of the iteration?" At the beginning of the iteration the registers are empty, so they're not stored there.

@hsalehipour
Copy link
Collaborator Author

hsalehipour commented Jan 2, 2025

haha! it does not answer your question because you changed / editted your question :) But again we store aux data in f_0 at the end of the loop through apply_aux_recovery_bc function. Then f_0 is swapped with f_1 and hence at the beginning of the next iteration all aux data are in f_1 and are read into thread and designated by _f1_thread. So now if there is a hypothetical BC that is applied during streaming and needs pre-streaming or post-collision data, that information is jut available in _f0_thread and has nothing to do with _f1_thread. If the hypothetical BC needs post-streaming (or pre-collision) values they are again available in _f_post_stream.

In any case, please think about it this way: any argument you want to make against center location of f_1 by some hypothetical BC is equally applicable to the missing direction of f_1. I don't see any issues even in case of the two hypothetical scenarios you have mentioned so far as I have explained above.

@mehdiataei
Copy link
Contributor

The difference is that the missing directions are reconstructed, the center is not. If that information is needed (for any reason whatsoever), it is lost as it is overwritten.

@hsalehipour
Copy link
Collaborator Author

NO! It is not lost! Of course we ALWAYS need the data at the center of the lattice but that is not overwritten by this method. We have two fields f_0 and f_1. Generally speaking in the two pop approach (as you know) f_0 is really all we need during an LBM step and f_1 is really needed only for storage. These tricks we are introducing for storing aux data of complex BCs is really about how to use f_1 for storage more wisely.

@mehdiataei
Copy link
Contributor

If you will never need values of f_1 for any physical computations, you should be able to use the full extra field as additional storage, as long as you swap the whole cell in apply_aux_recovery_bc before storing f_1.

Hypothetically if you need them (like the example I gave, which is the pre-collision values at time t, given that the streaming operation does nothing at index=0), then overwriting them in apply_aux_recovery_bc will cause issues.

@mehdiataei
Copy link
Contributor

But we don't do that, because it will cause race condition in the streaming. Again, if pre-collision values at time t is needed for any computations, the overwrite will lose that value. We don't currently use it in our BCs as far as I can tell.

@mehdiataei
Copy link
Contributor

Isn't f_1 the f_post in functional_pressure? In that case, in _get_fsum function, we're using f_1[0] to compute fsum_middle?

@mehdiataei
Copy link
Contributor

image
image

@mehdiataei
Copy link
Contributor

fsum_middle includes f_1[0], which instead of the population it is adding the aux data. Am I wrong here?

@mehdiataei
Copy link
Contributor

mehdiataei commented Jan 2, 2025

I think you use _f_post not f_post...the namings are very confusing. We should fix this later. The buffer values are not the same as the register values and the same name will cause confusion. We should use f_0 and f_1 for the buffers always.

@mehdiataei
Copy link
Contributor

Again this is my final argument for this:

  1. The value of f_1 at index=0, equals to pre-collision value at time t (current timestep when the kernel is running). If this value is NEVER needed for any physical computations, we can use that storage. This is NOT the same as the missing populations, as we always reconstruct them.
  2. If the value is needed, or could be needed, it is better to use the storage of the remaining missing directions and store the index for trivial retrieval in case of +1 aux value.

@hsalehipour hsalehipour force-pushed the improved_bc_encoding branch from 7dbb071 to 23d06d8 Compare January 2, 2025 19:46
@hsalehipour
Copy link
Collaborator Author

I agree with the naming confusion and pushed a change.

@hsalehipour
Copy link
Collaborator Author

  1. The value of f_1 at index=0, equals to pre-collision value at time t (current timestep when the kernel is running). If this value is NEVER needed for any physical computations, we can use that storage. This is NOT the same as the missing populations, as we always reconstruct them.

No the value of f_1 at index=0 at current time step, t, is equal to the values of f_0 at the previous time-step (because of the buffer swapping) which are in turn equal to the pre-streaming (or post-collision) values at t-1 for the same voxel. The most complicated BC we have so far which is closest to your hypothetical scenario is ExrapolationOutflow that needs previous time-step information of the neighbouring cells. If you look at update_bc_auxilary_data in that BC you will see that we read from f_0 which includes index 0 so we never read from f_1 unless a quantity is stored there that is needed. Of course we cannot store information in the known directions of f_1 because of the race conditioning with other threads but this is not true for the central location.

  1. If the value is needed, or could be needed, it is better to use the storage of the remaining missing directions and store the index for trivial retrieval in case of +1 aux value.

I don't think we ever need that space.

@@ -91,8 +91,8 @@ def bc_profile(self):
@wp.func
def bc_profile_warp(index: wp.vec3i):
# Poiseuille flow profile: parabolic velocity distribution
y = self.precision_policy.store_precision.wp_dtype(index[1])
z = self.precision_policy.store_precision.wp_dtype(index[2])
y = wp.float32(index[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this back to compute type instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revering it back will cause lots of issues if you run this example with f64/f64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm pls create an issue if it is a bug.

y = self.precision_policy.store_precision.wp_dtype(index[1])
z = self.precision_policy.store_precision.wp_dtype(index[2])
y = wp.float32(index[1])
z = wp.float32(index[2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revering it back will cause lots of issues if you run this example with f64/f64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm pls create an issue if it is a bug

@hsalehipour hsalehipour force-pushed the improved_bc_encoding branch from 23d06d8 to 77ecdf6 Compare January 2, 2025 21:46
@hsalehipour
Copy link
Collaborator Author

Applied the requested changes and pushed again.

@hsalehipour hsalehipour merged commit 5340e6c into Autodesk:main Jan 2, 2025
10 checks passed
@hsalehipour hsalehipour deleted the improved_bc_encoding branch January 3, 2025 01:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants