Simple program fails with mixed AMD/Intel machines #74

Open
snps-jfaust opened this issue Jan 10, 2025 · 0 comments

Works with only Intel nodes or only AMD nodes. With a mix of the two, it always fails in the same way: both ranks complete loop 3, then get stuck in the MPI_Barrier for loop 4.

#include <iostream>
#include <mpi.h>

int main()
{
    int argc = 0;
    MPI_Init(&argc, nullptr);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}

command line, from intel_machine:
mpiexec -hosts 2 localhost amd_machine -wdir "\network\path" \path-to-exe

output:

[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4

job aborted:
[ranks] message

[0] terminated

[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.  (errno 10060)
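
Because the output of the two ranks is interleaved, it can help to tally progress per rank when checking that both really finish barrier 3 and stall in barrier 4. The following is only a minimal sketch: `RankProgress` and `summarizeBarrierLog` are hypothetical helper names (not part of any MPI library), and the parser assumes the `[rank] Attempting Barrier N` / `[rank] Completed Barrier N` line format shown in the output above.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper type: per-rank progress recovered from the interleaved log.
struct RankProgress
{
    int lastAttempted = 0; // highest barrier number the rank entered
    int lastCompleted = 0; // highest barrier number the rank left
};

// Hypothetical helper: parses lines of the assumed form
//   "[rank] Attempting Barrier N"  or  "[rank] Completed Barrier N".
// Any other line (e.g. "[0] terminated", "job aborted:") is ignored.
std::map<int, RankProgress> summarizeBarrierLog(const std::string& log)
{
    std::map<int, RankProgress> progress;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line))
    {
        int rank = 0;
        int n = 0;
        char verb[16] = {0};
        // All three fields must match, otherwise the line is not a barrier message.
        if (std::sscanf(line.c_str(), " [%d] %15s Barrier %d", &rank, verb, &n) != 3)
            continue;

        const std::string v(verb);
        if (v == "Attempting")
            progress[rank].lastAttempted = std::max(progress[rank].lastAttempted, n);
        else if (v == "Completed")
            progress[rank].lastCompleted = std::max(progress[rank].lastCompleted, n);
    }
    return progress;
}
```

Passing the captured mpiexec output to `summarizeBarrierLog` as one string gives, for each rank, the last barrier entered and the last barrier left. A rank whose `lastAttempted` is one greater than its `lastCompleted` was inside a barrier when the job aborted.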