Ngen build getting a BMI error #50

Closed
benlee0423 opened this issue Nov 21, 2023 · 5 comments · Fixed by #51
Labels
bug Something isn't working

Comments

@benlee0423

benlee0423 commented Nov 21, 2023

Current behavior

The ngen build fails with an error.

#18 255.9 [ 68%] Building CXX object test/CMakeFiles/test_bmi_multi.dir/realizations/catchments/Bmi_Cpp_Multi_Array_Test.cpp.o
#18 258.0 gmake[2]: *** [CMakeFiles/ngen.dir/build.make:76: CMakeFiles/ngen.dir/src/NGen.cpp.o] Error 1
#18 258.0 gmake[1]: *** [CMakeFiles/Makefile2:498: CMakeFiles/ngen.dir/all] Error 2
#18 258.0 gmake[1]: *** Waiting for unfinished jobs....

Expected behavior

Build without error

Steps to replicate behavior (include URLs)

  1. Disable (comment out) line 156 in the Dockerfile.ngen file:
    156 # && ./build_sub extern/test_bmi_cpp

A recent failed action can be found here:
https://github.com/CIROH-UA/NGIAB-CloudInfra/actions/runs/6949391875

The same error occurs on the GitHub runner and locally.

@benlee0423 benlee0423 added the bug Something isn't working label Nov 21, 2023
@hellkite500
Collaborator

The build failure comes from a recent merge of NOAA-OWP/ngen#679. Still digging into the possible regression.

@hellkite500
Collaborator

Also, it looks like the serial build completes OK:
https://github.com/CIROH-UA/NGIAB-CloudInfra/actions/runs/6949391875/job/18907400883#step:3:3570
but the parallel build encounters the error. I'm trying to reproduce it locally, but haven't been able to yet.

@program--

program-- commented Nov 22, 2023

Some investigation showed that the serial build is actually the one failing, because it picks up build artifacts from the parallel configuration.

After a recent NGen update, we started providing compile-time information at runtime by configuring a header file include/NGenConfig.h with that information. In Dockerfile.ngen, we first configure an out-of-source build for a serial version, which modifies NGenConfig.h, then immediately after configure a parallel version, which again modifies NGenConfig.h.

Then, we first build the parallel version (which succeeds because it is linked to MPI as expected), then build the serial version:

# Build each enabled directory in turn: parallel first, then serial.
for BUILD_DIR in \
  $(if [ "${BUILD_NGEN_PARALLEL}" = "true" ]; then echo "cmake_build_parallel"; fi) \
  $(if [ "${BUILD_NGEN_SERIAL}" = "true" ]; then echo "cmake_build_serial"; fi)
do
    cmake --build "$BUILD_DIR" --target all -j "$(nproc)"
done

This means the serial build gets the parallel header, which is why NGEN_WITH_MPI is set to 1 even though the MPI functions are not defined.
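
One way to catch this kind of mismatch early is to inspect the generated header right before building a given variant. The following is only a sketch: it assumes the header contains a line of the form `#define NGEN_WITH_MPI 1` (the exact format is not confirmed in this thread), and the function names `header_mpi_flag` and `check_header` are hypothetical.

```shell
# Print the value of NGEN_WITH_MPI found in a generated header.
# Assumption: the header has a line like "#define NGEN_WITH_MPI 1".
header_mpi_flag() {
    awk '$1 == "#define" && $2 == "NGEN_WITH_MPI" { print $3; exit }' "$1"
}

# Hypothetical guard: fail when the header disagrees with the
# variant that is about to be built (expected is 0 or 1).
check_header() {
    header="$1"
    expected="$2"
    actual="$(header_mpi_flag "$header")"
    if [ "$actual" != "$expected" ]; then
        echo "stale header: NGEN_WITH_MPI=$actual, expected $expected" >&2
        return 1
    fi
}
```

For example, `check_header include/NGenConfig.h 0` could run just before the serial build and would return non-zero if the header was last written by the parallel configure step.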

Solution

Configure and build each version individually instead of in tandem. That is, modify the Dockerfile to do

serial configure -> serial build

then

parallel configure -> parallel build

instead of

serial configure -> parallel configure

then

parallel build -> serial build
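
The proposed ordering could look roughly like the sketch below, where each variant is configured and then built before the next one is touched, so the shared include/NGenConfig.h always matches the variant being compiled. This is not the actual Dockerfile change: the helper names are hypothetical, and the `-DNGEN_WITH_MPI=ON/OFF` option name is an assumption inferred from the NGEN_WITH_MPI macro mentioned above.

```shell
# CMAKE can be overridden for a dry run (for example, CMAKE=echo).
CMAKE="${CMAKE:-cmake}"

# Hypothetical helper: configure one variant, then build it right away.
configure_and_build() {
    build_dir="$1"   # out-of-source build directory
    with_mpi="$2"    # ON or OFF (assumed option name: NGEN_WITH_MPI)
    "$CMAKE" -S . -B "$build_dir" "-DNGEN_WITH_MPI=$with_mpi" || return 1
    "$CMAKE" --build "$build_dir" --target all -j "$(nproc 2>/dev/null || echo 1)" || return 1
}

# serial configure -> serial build, then parallel configure -> parallel build
build_all() {
    if [ "${BUILD_NGEN_SERIAL}" = "true" ]; then
        configure_and_build cmake_build_serial OFF || return 1
    fi
    if [ "${BUILD_NGEN_PARALLEL}" = "true" ]; then
        configure_and_build cmake_build_parallel ON || return 1
    fi
}
```

Calling `build_all` with both BUILD_NGEN_SERIAL and BUILD_NGEN_PARALLEL set to "true" runs the four steps in the order listed above. Running the serial variant first is a choice made for this sketch; what matters is that no configure step runs between a variant's configure and its build.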

Alternative Solution

ngen can potentially output the header into out-of-source build directories, so that each configured build has its own header with the rest of its files. This is probably the better way to handle this, but may take some time.

@arpita0911patel
Member

@benlee0423 are we good to close this issue, since the mentioned tickets are merged and closed?

@benlee0423
Author

Yeah, we can close this ticket.
