Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update PETSc to 3.20.3, update libmesh #26462

Merged
merged 12 commits into from
Jan 27, 2024
Merged

Conversation

cticenhour
Copy link
Member

Closes #26461

cticenhour added a commit to cticenhour/moose that referenced this pull request Jan 2, 2024
@cticenhour cticenhour changed the title [WIP] Update PETSc to 3.20.3 Update PETSc to 3.20.3 Jan 2, 2024
@cticenhour
Copy link
Member Author

@roystgnr I'd like to get one round of testing in on this before piling more on, but would you be interested in a libMesh update going in alongside this?

petsc Show resolved Hide resolved
@moosebuild
Copy link
Contributor

moosebuild commented Jan 3, 2024

Job Test timings on fbd5908 wanted to post the following:

View timings here

This comment will be updated on new commits.

@moosebuild
Copy link
Contributor

moosebuild commented Jan 3, 2024

Job Documentation on fbd5908 wanted to post the following:

View the site here

This comment will be updated on new commits.

@roystgnr
Copy link
Contributor

roystgnr commented Jan 3, 2024

would you be interested in a libMesh update going in alongside this?

Probably not. I've got a minor fix that would be useful in MOOSE, but not worth the hassle alone, and I've got a major branch that isn't likely to be ready to merge before the end of the week.

@cticenhour
Copy link
Member Author

Alright - will ping again if this ends up lingering through Friday or Monday.

@moosebuild
Copy link
Contributor

moosebuild commented Jan 3, 2024

Job Coverage on fbd5908 wanted to post the following:

Framework coverage

c581c9 #26462 fbd590
Total Total +/- New
Rate 85.10% 85.10% -0.00% -
Hits 98663 98662 -1 0
Misses 17273 17274 +1 0

Diff coverage report

Full coverage report

Modules coverage

Coverage did not change

Full coverage reports

Reports

This comment will be updated on new commits.

@lindsayad
Copy link
Member

I would be interested in a libmesh update

@lindsayad
Copy link
Member

Can't build #26477 without a libmesh update. A couple cool new things in libMesh directly leveraged in that MOOSE PR are L2_HIERARCHIC_VEC and the ability to form a callback post linear-solve, right before the line search.

That being said, I would be willing to wait for @roystgnr's major branch to merge (is it libMesh/libmesh#3759 or something else?)

@roystgnr
Copy link
Contributor

roystgnr commented Jan 4, 2024

Those are pretty good reasons for the update.

I don't want to rush libMesh/libmesh#3759, though that is what I was thinking about. The current status of libMesh::MeshBase is roughly "what's a class invariant", and I'm trying to figure out if I can efficiently change that to "you can know a priori that the mesh is prepared except for a runtime-testable and easily-individually-correctable set of exceptions", but if I stop here then we're just at "what's a class invariant, but more confusing".

So we should probably update the submodule after all, and do it without that. I'd like libMesh/libmesh#3758 to make it to master, and I do have one other PR that I'm hoping to get in today too, but there's actually more than enough to justify an update as-is, now that I look over the git logs. If we wait much longer getting libMesh/libmesh#3736 downstream I'll probably forget entirely about the MOOSE side of it...

@roystgnr
Copy link
Contributor

roystgnr commented Jan 4, 2024

I've pushed newsletter docs to roystgnr@1bfae7d if you want to cherry-pick that to go alongside a libmesh submodule update.

@cticenhour
Copy link
Member Author

Thanks for your input! I will update this PR tomorrow to include a libMesh update as well. The only thing holding this back on testing at the moment is a FBLASLAPACK download issue during PETSc alt container creation....working with Logan on it.

@cticenhour cticenhour requested a review from roystgnr as a code owner January 5, 2024 15:50
cticenhour added a commit to cticenhour/moose that referenced this pull request Jan 5, 2024
libmesh Outdated Show resolved Hide resolved
@roystgnr
Copy link
Contributor

roystgnr commented Jan 8, 2024

If there's more futzing to be done anyway, it wouldn't hurt to get libMesh up to libMesh/libmesh@3fcc7e7 - the additional changelog entries would be:

  • Fixes for VTK .pvtu reading, and preservation of libMesh node, element, and subdomain ids written to VTK files
  • Elem::volume() no longer can throw an exception when called on an element without an invertible mapping; hence print_info() and get_info() are now safe to use when emitting debugging messages about twisted geometric elements.

IMHO neither of those are super high priority if you just want to focus on getting the PETSc issues fixed and getting this in.

@cticenhour
Copy link
Member Author

@roystgnr The GCC min / PETSc-alt failures are all tied to the FBLASLAPACK download issue (which is preventing the build of a new container for those tests), but the new app failures I am seeing are due to the libMesh additions. Do you mind taking a look today? They all seem to be related to the nodeset changes or similar.

@roystgnr
Copy link
Contributor

roystgnr commented Jan 8, 2024

Will do! Looks like we've got a segfault in a Malamute test and 7 (looks like 2 independent) exodiffs in Bison tests?

@roystgnr
Copy link
Contributor

roystgnr commented Jan 9, 2024

The good news is that I understand the MALAMUTE failure now.

The bad news is that the MALAMUTE failure is coming from MOOSE doing something unsupported and, up until now, getting away with it via dumb luck.

That input deck creates a mesh of QUAD8 elements. It then has MOOSE create Edge3 elements along one side of some of them, because formulation = mortar, and MOOSE manually does a edge3->set_interior_parent(quad8) on each. It then has MOOSE convert all the Quad8 elements to Quad9, because second_order = true, and the trouble with this of course is that you can't actually convert one class into another in C++. What libMesh does to "convert" a mesh like this is to delete all the lower-order elements and allocate higher-order elements to replace them. This has apparently (probably?) been working out for MALAMUTE in the past because the allocated elements have been ending up at the same addresses in memory!? And then those interior_parent pointers stay valid. But nothing guarantees that happening, and any kind of modification of the all_second_order code (much less my recent extensive modifications) can stop it from happening. (as could a change of compiler, or of compiler flags, or of day of the week...)

I'm trying to figure out a way to make this robust.

@lindsayad
Copy link
Member

Where is MOOSE doing this? Doesn’t moose rely on libMesh’s api for converting the mesh to second order?

@lindsayad
Copy link
Member

Haven’t you been doing a bunch of work in these mesh order upgrade methods? Is it possible the behavior is changing in this update? It would seem like quite a coincidence for this to have been working for all CI runs for months

@roystgnr
Copy link
Contributor

roystgnr commented Jan 9, 2024

Where is MOOSE doing this? Doesn’t moose rely on libMesh’s api for converting the mesh to second order?

The set_interior_parent() calls are in LowerDBlockFromSidesetGenerator, I believe. The all_second_order() call is in SetupMeshAction. It is using the libMesh API, but the problem is that even the libMesh API can't guarantee that deallocation and reallocation of objects gives you back the same pointers if only you do it in just the right order. Maybe there's a way to do it with fancy C++ tricks, if we destruct the existing Elem without freeing its memory and then you use placement-new to put the new Elem in its place (assuming all Elem subclasses are exactly the same size, which maybe?), but we don't do that. We just delete the existing Elem and new a replacement, and the memory allocator has apparently been returning the same pointer, but it's not at all guaranteed to.

Haven’t you been doing a bunch of work in these mesh order upgrade methods? Is it possible the behavior is changing in this update? It would seem like quite a coincidence for this to have been working for all CI runs for months

It's definitely triggered by this update (something is performing another allocation in between the delete and the new? I'm now doing the new before the delete?), but it's not caused by this update. I like the "nasal demons" joke about Undefined Behavior, where a compiler is allowed to respond to UB by making demons come out your nose, but really the most insidious allowed response to UB is the one we're seeing here: it's allowed to work the way you hoped right up until for no apparent reason it stops working.

On the bright side, the refactoring should at least make it possible to fix every case of this all in one place.

On the not-so-bright side, the fix is going to make some of my new capabilities asymptotically slower. We can't safely upgrade O(N_range) elements until after we first check O(N_elem) elements to see if they have any interior_parent() links that need to be moved as part of the upgrade. I don't think anybody's going to need them to be O(N_range) but it would have been nice to have had the option.

Copy link
Contributor

@roystgnr roystgnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the PETSc part of the change, but everything on the libMesh side should be ready to go.

Copy link
Member

@loganharbour loganharbour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to add in a minor apptainer change and one more libmesh bump.

@loganharbour loganharbour changed the title Update PETSc to 3.20.3 Update PETSc to 3.20.3, update libmesh Jan 23, 2024
@roystgnr
Copy link
Contributor

everything on the libMesh side should be ready to go.

I'd like to thank Nemesis for only striking me down with a new bug report from Alex, and not with the lightning bolt that my hubris above may have deserved.

I'm looking into it now.

libmesh Show resolved Hide resolved
libmesh Show resolved Hide resolved
@moosebuild
Copy link
Contributor

Job Precheck on fbd5908 wanted to post the following:

The following file(s) are changed:

conda/libmesh/meta.yaml

The libmesh conda configuration has changed; ping @idaholab/moose-dev-ops

@cticenhour
Copy link
Member Author

@milljm Could you please give us another set of eyes for these changes? Thanks!

Copy link
Member

@milljm milljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Conda stuff, all looks well!

@roystgnr
Copy link
Contributor

The fix for that last remaining libMesh bug is in libMesh/libmesh#3772

After we get that to master (hopefully tomorrow), we can update the submodule here and we'll have taken care of everything outstanding I know about.

I'm thinking my approval for merging as-is still stands, though, if anyone is chomping at the bit. The updates in this PR right now include fixes for potentially more serious and definitely more wide-ranging bugs, in long-standing use cases where as far as I can tell we've so far been saved from data corruption by kindly compilers and dumb luck, whereas the bug fixed in libMesh/libmesh#3772 only affects Tet14 refinement, a relatively new feature.

@loganharbour loganharbour merged commit 16e6408 into idaholab:next Jan 27, 2024
101 checks passed
@lindsayad
Copy link
Member

so we didn't get Roy's last fix in?

lindsayad added a commit to lindsayad/moose that referenced this pull request Jan 27, 2024
Refs failures from the combination of idaholab#26462 and idaholab#26227

To avoid negative viscosity interpolations I shrank the growth
factor from 2 to 1.5. Once I did this I got a CSVDiff which signaled
to me that the original result was solved to too loose a tolerance.
So I removed the steady state detection, which has potential issues
with it (see idaholab#26312), and extended the `end_time` until we hit
a fairly tight absolute tolerance
milljm pushed a commit to milljm/moose that referenced this pull request Jan 29, 2024
milljm pushed a commit to milljm/moose that referenced this pull request Jan 29, 2024
@roystgnr
Copy link
Contributor

Nope; sorry. Hope you're okay working from a manual submodule update until next month.

@roystgnr
Copy link
Contributor

Looks like we missed the LIBMESH_REV update in docker_ci/Dockerfile here? And yet that didn't cause any problems for users?

@loganharbour
Copy link
Member

Looks like we missed the LIBMESH_REV update in docker_ci/Dockerfile here? And yet that didn't cause any problems for users?

I need to remove the docker_ci folder; it's not used anymore.

schakrabortygithub pushed a commit to schakrabortygithub/moose that referenced this pull request Mar 12, 2024
schakrabortygithub pushed a commit to schakrabortygithub/moose that referenced this pull request Mar 12, 2024
Refs failures from the combination of idaholab#26462 and idaholab#26227

To avoid negative viscosity interpolations I shrank the growth
factor from 2 to 1.5. Once I did this I got a CSVDiff which signaled
to me that the original result was solved to too loose a tolerance.
So I removed the steady state detection, which has potential issues
with it (see idaholab#26312), and extended the `end_time` until we hit
a fairly tight absolute tolerance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update PETSc to 3.20.3
6 participants