Used center of f_1 as an additional storage and also fixed some bugs #99

hsalehipour · 2024-12-22T05:12:14Z

Contributing Guidelines

I have read and understood the CONTRIBUTING.md guidelines

Description

Added feature:

Added the central location of the output population (f_1) as another storage space for BC auxiliary data, since that slot is also available.
Used this newly added space to store the single auxiliary value needed by ZouHe/Regularized, which simplified the code and removed the for-loops over lattice directions for reading the prescribed values.
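
The simplification can be pictured with a minimal plain-Python sketch for a single boundary cell. It is not the XLB kernel: the D2Q9-like ordering, the `OPP` table, and both helper functions are assumptions made for illustration, with index 0 taken to be the center population.

```python
# Toy single-cell model; names and direction ordering are hypothetical, not XLB's API.
Q = 9
# Opposite-direction table for an assumed D2Q9 ordering (index 0 is the center).
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]


def read_aux_via_missing(f_1, missing_mask):
    """Before: loop over directions, read the slot opposite the first missing one."""
    for q in range(1, Q):
        if missing_mask[q]:
            return f_1[OPP[q]]
    raise ValueError("cell has no missing direction")


def read_aux_via_center(f_1):
    """After: the single scalar sits in the center slot, which is always index 0."""
    return f_1[0]


# One cell whose direction 1 is missing; the prescribed density is 1.02.
missing_mask = [False] * Q
missing_mask[1] = True
rho = 1.02

f_1_before = [0.0] * Q
f_1_before[OPP[1]] = rho   # aux value parked opposite the missing direction

f_1_after = [0.0] * Q
f_1_after[0] = rho         # aux value parked at the center

rho_before = read_aux_via_missing(f_1_before, missing_mask)
rho_after = read_aux_via_center(f_1_after)
```

Both reads return the same prescribed value; the second one needs neither the mask nor a loop.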

Fixed the following bugs:

Flow over sphere was not working with prescribed pressure in JAX
Flow over sphere was not working with FP64FP64 in WARP
Regularized and ZouHe were not working with a prescribed velocity equal to the ZERO vector, i.e. a no-slip BC on a straight wall.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

All pytest tests pass

Linting and Code Formatting

Make sure the code follows the project's linting and formatting standards. This project uses Ruff for linting.

To run Ruff, execute the following command from the root of the repository:

ruff check .

Ruff passes

mehdiataei · 2025-01-02T14:40:09Z

This seems unnecessary, and I’m not even sure it won’t cause issues in scenarios where you need post/pre-collision values, since the center value is updated during collision. Could you please explain what problem this extra data is intended to solve?

At the very least, it introduces a lot of confusion and deviates from the usual definition of using the "unknown population" for storage. Regarding the central population, @massimim discussed the possibility of using a different buffer for central populations in the future, so this approach will not be compatible in that context.

xlb/operator/boundary_condition/bc_regularized.py

mehdiataei · 2025-01-02T14:44:23Z

xlb/operator/boundary_condition/bc_zouhe.py

-                    _rho = f_post[_opp_indices[q], index[0], index[1], index[2]]
-                    break
+            # Since we need only one scalar value, we only need to find one value (stored at the center of f_1)
+            _rho = f_post[0, index[0], index[1], index[2]]


Same comment

Are you suggesting that this BC (zouhe) might need more than 1 aux variable? I am not sure what you mean by "more general".

hsalehipour · 2025-01-02T15:53:38Z

This seems unnecessary, and I’m not even sure it won’t cause issues in scenarios where you need post/pre-collision values, since the center value is updated during collision. Could you please explain what problem this extra data is intended to solve?

At the very least, it introduces a lot of confusion and deviates from the usual definition of using the "unknown population" for storage. Regarding the central population, @massimim discussed the possibility of using a different buffer for central populations in the future, so this approach will not be compatible in that context.

We always have these additional storage spaces at our disposal, say in the pull method: (1) the unknown directions of the output buffer, (2) the known directions of the input buffer, and (3) the center of the output. The known directions of the input (f_0 in our notation) are often needed to store post-collision values for the BC, so we don't use them. Recently we have used the unknown directions of the output (f_1) in the Extrapolation outflow BC, and also for storing single values of normal velocity and density in Regularized and ZouHe. However, we had never used the center location of the output. This is all part of our novel complex BC approach (you can confirm with @massimim, who actually explained all of this to me). This PR introduces the use of the central location, which, as you can see, has generic retrieval for 1 and +1 aux data. Only Regularized and ZouHe, which need a single aux value, use the center alone. Other BCs (not added yet) need not only the unknown directions but also the center location for storing aux data. Yes, you could store the index of the missing directions to avoid the for loop and it would work, but that's unnecessary here since we do have the center location, which we should leverage, and it is always at index 0.

I hope this answers your questions about "what problem this extra data is intended to solve".
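
To make the "1 and +1 aux data" layout concrete, here is a hedged plain-Python sketch for one boundary cell. The slot order (center first, then the slots opposite the missing directions), the D2Q9-like ordering, and the names `aux_slots`, `store_aux`, and `load_aux` are assumptions for illustration; the actual layout in XLB may differ.

```python
# Toy single-cell model; everything here is hypothetical and not XLB's API.
Q = 9
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]  # assumed D2Q9 ordering, index 0 is the center


def aux_slots(missing_mask, n_aux):
    """Slots of f_1 used for n_aux values: the center first, then unknown directions."""
    slots = [0]
    for q in range(1, Q):
        if len(slots) == n_aux:
            break
        if missing_mask[q]:
            slots.append(OPP[q])
    if len(slots) < n_aux:
        raise ValueError("not enough free slots for the requested aux data")
    return slots[:n_aux]


def store_aux(f_1, missing_mask, values):
    """Write each aux value into its slot of the output buffer."""
    for slot, value in zip(aux_slots(missing_mask, len(values)), values):
        f_1[slot] = value


def load_aux(f_1, missing_mask, n_aux):
    """Read the aux values back in the same order they were stored."""
    return [f_1[slot] for slot in aux_slots(missing_mask, n_aux)]


# A cell with three missing directions (1, 5, 8) and three aux values to keep.
missing_mask = [q in (1, 5, 8) for q in range(Q)]
f_1 = [0.0] * Q
store_aux(f_1, missing_mask, [1.02, 0.1, 0.0])
recovered = load_aux(f_1, missing_mask, 3)
```

With a single aux value the slot list is just `[0]`, so the retrieval never touches the mask, which is the ZouHe/Regularized case described above.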

mehdiataei · 2025-01-02T16:01:12Z

I don't think this is correct. This will cause information loss and the core idea of using aux data is that you use "UNUSED" populations to store these values.

Say that I come up with a new BC tomorrow that is applied during streaming, and requires pre-collision populations. How can I retrieve the value of pre-collision at location 0 at the beginning of the iteration?

hsalehipour · 2025-01-02T16:18:57Z

The center location of OUTPUT (or f_1) is UNUSED and no one needs it, just like the missing directions of f_1. This has nothing to do with requiring the pre-collision (or post-streaming) populations, because they are already available in the thread (those designated with _f_post_stream in the code). No one uses the central position of f_1, just as the missing directions of f_1 are unused. Please think about this more carefully.

mehdiataei · 2025-01-02T16:24:53Z

This doesn't answer my question. "Say that I come up with a new BC tomorrow that is applied during streaming, and requires pre-collision populations. How can I retrieve the value of pre-collision at location 0 at the beginning of the iteration?" At the beginning of the iteration the registers are empty, so they're not stored there.

hsalehipour · 2025-01-02T16:46:55Z

haha! It does not answer your question because you changed / edited your question :) But again, we store aux data in f_0 at the end of the loop through the apply_aux_recovery_bc function. Then f_0 is swapped with f_1, and hence at the beginning of the next iteration all aux data are in f_1 and are read into the thread and designated by _f1_thread. So if there is a hypothetical BC that is applied during streaming and needs pre-streaming (or post-collision) data, that information is just available in _f0_thread and has nothing to do with _f1_thread. If the hypothetical BC needs post-streaming (or pre-collision) values, they are again available in _f_post_stream.

In any case, please think about it this way: any argument you want to make against center location of f_1 by some hypothetical BC is equally applicable to the missing direction of f_1. I don't see any issues even in case of the two hypothetical scenarios you have mentioned so far as I have explained above.
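
The buffer flow described in this comment can be mimicked with a toy single-cell model. This is only a sketch under strong assumptions: there is no streaming, the "collision" is a fake halving of every population, and `step` is a hypothetical stand-in for the stepper kernel plus the host-side swap rather than XLB code.

```python
def step(f_0, f_1):
    """One toy iteration for a single boundary cell; returns the swapped buffers."""
    aux = f_1[0]                      # aux value arrives in the center slot of f_1
    f_out = [0.5 * v for v in f_0]    # stand-in for stream + BC + collide on f_0
    f_0[0] = aux                      # recovery: park aux in f_0 before the swap
    f_1[:] = f_out                    # kernel output overwrites all of f_1
    return f_1, f_0                   # swap: the output becomes the next input


f_0 = [4.0, 1.0, 1.0]    # physical populations (center first)
f_1 = [1.02, 0.0, 0.0]   # storage buffer carrying the aux value at index 0

history = []
for _ in range(3):
    f_0, f_1 = step(f_0, f_1)
    history.append((f_0[0], f_1[0]))
```

In this toy model the aux value stays in the center slot of `f_1` on every iteration, while the physical center population keeps evolving in `f_0`.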

mehdiataei · 2025-01-02T17:11:39Z

The difference is that the missing directions are reconstructed, the center is not. If that information is needed (for any reason whatsoever), it is lost as it is overwritten.

hsalehipour · 2025-01-02T17:30:46Z

NO! It is not lost! Of course we ALWAYS need the data at the center of the lattice, but that is not overwritten by this method. We have two fields, f_0 and f_1. Generally speaking, in the two-population approach (as you know) f_0 is really all we need during an LBM step and f_1 is really needed only for storage. The tricks we are introducing for storing aux data of complex BCs are really about how to use f_1 for storage more wisely.

mehdiataei · 2025-01-02T18:31:28Z

If you will never need values of f_1 for any physical computations, you should be able to use the full extra field as additional storage, as long as you swap the whole cell in apply_aux_recovery_bc before storing f_1.

Hypothetically if you need them (like the example I gave, which is the pre-collision values at time t, given that the streaming operation does nothing at index=0), then overwriting them in apply_aux_recovery_bc will cause issues.

mehdiataei · 2025-01-02T18:35:07Z

But we don't do that, because it would cause a race condition in the streaming. Again, if the pre-collision values at time t are needed for any computation, the overwrite will lose them. We don't currently use them in our BCs as far as I can tell.

mehdiataei · 2025-01-02T18:40:03Z

Isn't f_1 the f_post in functional_pressure? In that case, in _get_fsum function, we're using f_1[0] to compute fsum_middle?

mehdiataei · 2025-01-02T18:42:32Z

mehdiataei · 2025-01-02T18:43:50Z

fsum_middle includes f_1[0], so instead of the population it is adding the aux data. Am I wrong here?

mehdiataei · 2025-01-02T18:45:34Z

I think you use _f_post, not f_post... the naming is very confusing. We should fix this later. The buffer values are not the same as the register values, and using the same name will cause confusion. We should always use f_0 and f_1 for the buffers.

mehdiataei · 2025-01-02T18:55:07Z

Again this is my final argument for this:

The value of f_1 at index=0 equals the pre-collision value at time t (the current timestep, when the kernel is running). If this value is NEVER needed for any physical computations, we can use that storage. This is NOT the same as the missing populations, as we always reconstruct them.
If the value is needed, or could be needed, it is better to use the storage of the remaining missing directions and store the index for trivial retrieval in case of +1 aux value.
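
The alternative proposed here, keeping the aux value in a missing-direction slot and storing its index, might look like the following plain-Python sketch. The helper `precompute_aux_slot`, the per-cell `aux_slot` integer, and the D2Q9-like ordering are all hypothetical and only illustrate the idea.

```python
# Toy single-cell model; names and ordering are assumptions, not XLB's API.
Q = 9
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]  # assumed D2Q9 ordering, index 0 is the center


def precompute_aux_slot(missing_mask):
    """Run once per boundary cell at setup: slot opposite the first missing direction."""
    for q in range(1, Q):
        if missing_mask[q]:
            return OPP[q]
    return -1  # cell has no missing direction


# Setup: find and remember the slot for a cell missing directions 1, 5, and 8.
missing_mask = [q in (1, 5, 8) for q in range(Q)]
aux_slot = precompute_aux_slot(missing_mask)

# Every step: direct lookup, no loop, and the center slot keeps its own value.
f_1 = [0.44] + [0.0] * (Q - 1)
f_1[aux_slot] = 1.02
rho = f_1[aux_slot]
```

The trade-off in this sketch is one extra stored integer per boundary cell in exchange for leaving index 0 of `f_1` untouched.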

hsalehipour · 2025-01-02T19:48:05Z

I agree with the naming confusion and pushed a change.

hsalehipour · 2025-01-02T20:17:02Z

The value of f_1 at index=0 equals the pre-collision value at time t (the current timestep, when the kernel is running). If this value is NEVER needed for any physical computations, we can use that storage. This is NOT the same as the missing populations, as we always reconstruct them.

No, the value of f_1 at index=0 at the current time step, t, is equal to the value of f_0 at the previous time step (because of the buffer swapping), which is in turn equal to the pre-streaming (or post-collision) value at t-1 for the same voxel. The most complicated BC we have so far, and the closest to your hypothetical scenario, is ExtrapolationOutflow, which needs previous time-step information from the neighbouring cells. If you look at update_bc_auxilary_data in that BC you will see that we read from f_0, which includes index 0, so we never read from f_1 unless a needed quantity is stored there. Of course we cannot store information in the known directions of f_1 because of race conditions with other threads, but this is not true for the central location.

If the value is needed, or could be needed, it is better to use the storage of the remaining missing directions and store the index for trivial retrieval in case of +1 aux value.

I don't think we ever need that space.

xlb/operator/stepper/nse_stepper.py

xlb/operator/boundary_condition/bc_zouhe.py

mehdiataei · 2025-01-02T20:48:25Z

examples/cfd/flow_past_sphere_3d.py

@@ -91,8 +91,8 @@ def bc_profile(self):
        @wp.func
        def bc_profile_warp(index: wp.vec3i):
            # Poiseuille flow profile: parabolic velocity distribution
-            y = self.precision_policy.store_precision.wp_dtype(index[1])
-            z = self.precision_policy.store_precision.wp_dtype(index[2])
+            y = wp.float32(index[1])


Revert this back to compute type instead.

Reverting it back will cause lots of issues if you run this example with f64/f64.

Hmm pls create an issue if it is a bug.

mehdiataei · 2025-01-02T20:48:33Z

examples/cfd/flow_past_sphere_3d.py

-            y = self.precision_policy.store_precision.wp_dtype(index[1])
-            z = self.precision_policy.store_precision.wp_dtype(index[2])
+            y = wp.float32(index[1])
+            z = wp.float32(index[2])


Reverting it back will cause lots of issues if you run this example with f64/f64.

hmm pls create an issue if it is a bug

xlb/operator/boundary_condition/boundary_condition.py

hsalehipour · 2025-01-02T21:47:06Z

Applied the requested changes and pushed again.

hsalehipour requested a review from mehdiataei December 22, 2024 05:12

mehdiataei reviewed Jan 2, 2025

mehdiataei closed this Jan 2, 2025

mehdiataei reopened this Jan 2, 2025

github-actions bot locked and limited conversation to collaborators Jan 2, 2025

hsalehipour force-pushed the improved_bc_encoding branch from 7dbb071 to 23d06d8 Compare January 2, 2025 19:46

mehdiataei requested changes Jan 2, 2025

Used center of f_1 as an additional storage and also fixed some bugs

77ecdf6

hsalehipour force-pushed the improved_bc_encoding branch from 23d06d8 to 77ecdf6 Compare January 2, 2025 21:46

mehdiataei approved these changes Jan 2, 2025

hsalehipour merged commit 5340e6c into Autodesk:main Jan 2, 2025
10 checks passed

hsalehipour deleted the improved_bc_encoding branch January 3, 2025 01:44
