using physical threads #641

Open
wants to merge 12 commits into
base: master

Collaborator

@DiamonDinoia DiamonDinoia commented Feb 26, 2025

Attempting to fix #596

Two main reasons:

  • Vector units are per physical core.
  • The manually vectorized code (and also the FFT) is memory bound.

Using HT cores impairs performance due to extra scheduling and puts unnecessary strain on the memory controller.
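
To make the "per physical core" point concrete, here is a minimal Linux-only sketch of counting physical cores rather than hardware threads. It is not the code from this PR: the helper name `count_physical_cores_sketch` and the bound of 1024 CPU ids are assumptions made for illustration. It relies on the sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` being readable and falls back to `std::thread::hardware_concurrency()` otherwise.

```cpp
#include <algorithm>
#include <fstream>
#include <set>
#include <string>
#include <thread>

// Hypothetical helper, not part of the PR: counts physical cores on Linux.
// Hardware threads that share a core report the same thread_siblings_list,
// so the number of distinct lists equals the number of physical cores.
inline unsigned count_physical_cores_sketch() {
  // Fallback value: logical CPU count (0 means "unknown", so clamp to 1).
  const unsigned logical = std::max(1u, std::thread::hardware_concurrency());
#if defined(__linux__)
  std::set<std::string> cores;
  constexpr int max_cpu_ids = 1024;  // assumed upper bound on CPU ids
  for (int cpu = 0; cpu < max_cpu_ids; ++cpu) {
    std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                    "/topology/thread_siblings_list");
    std::string siblings;
    // Missing or offline CPUs simply fail to open and are skipped.
    if (f && std::getline(f, siblings) && !siblings.empty())
      cores.insert(siblings);
  }
  if (!cores.empty()) return static_cast<unsigned>(cores.size());
#endif
  return logical;
}
```

On a machine with hyper-threading enabled this would typically return half of what `std::thread::hardware_concurrency()` reports. It ignores any affinity restrictions on the process.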

if (sysctlbyname("hw.physicalcpu", &cores, &size, nullptr, 0) == 0)
  physicalCoreCount = static_cast<unsigned int>(cores);

#elif defined(__linux__)
Collaborator

@mreineck mreineck Feb 27, 2025

Just a small word of caution: this will always report the full number of physical cores on the system, regardless of any restrictions placed on the executable itself. On computing center nodes, the system may limit a job to part of the available hardware (e.g. via taskset); if this happens, the code here will overestimate the number of usable cores.
On systems with performance and efficiency cores the code will probably (this is just speculation, I haven't tested it) return the sum of both core counts, which may also be larger than the number of cores that should actually be used.

I'm not saying that the code should be changed (I wouldn't know how...), but this could be worth remembering in case there are reports about weird behaviour in the future.
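
The `taskset` concern can be illustrated with a short sketch, assuming Linux with glibc (where `g++` predefines `_GNU_SOURCE`). Both helper names are hypothetical and are not taken from the PR: `allowed_cpu_count_sketch` counts the CPUs in the calling thread's affinity mask, and `pin_to_first_allowed_cpu_sketch` mimics what `taskset` would do by shrinking that mask to a single CPU.

```cpp
#include <algorithm>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs the calling thread is allowed to run on.
inline unsigned allowed_cpu_count_sketch() {
#if defined(__linux__) && defined(_GNU_SOURCE)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  // pid 0 means "the calling thread".
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
    return std::max(1u, static_cast<unsigned>(CPU_COUNT(&mask)));
#endif
  // Fallback when no affinity information is available.
  return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical helper mimicking `taskset`: pins the calling thread to the
// first CPU it is currently allowed on. Returns false if unsupported.
inline bool pin_to_first_allowed_cpu_sketch() {
#if defined(__linux__) && defined(_GNU_SOURCE)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return false;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &mask)) {
      cpu_set_t one;
      CPU_ZERO(&one);
      CPU_SET(cpu, &one);
      return sched_setaffinity(0, sizeof(one), &one) == 0;
    }
  }
#endif
  return false;
}
```

After a successful pin, `allowed_cpu_count_sketch()` returns 1, while a count derived from the machine's topology alone would still report every core. That gap is the overestimate described above.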

Collaborator Author

Yeah, this emerged from some discussions with @blackwer yesterday. I'll see if I can find a way to figure out the active cores.

Member

I was going to suggest looking at hwloc for inspiration, and it looks like you've found a solution very similar to theirs. For posterity:
https://github.com/open-mpi/hwloc/blob/1779eae26f2f55a510e9d73cb648b68e71d9e6d4/hwloc/topology-linux.c#L1036
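
For readers who do not want to dig through hwloc, the general idea of "physical cores that this process may actually use" can be sketched as follows. This is neither hwloc's implementation nor the PR's `getAllowedCoreCount`; the helper name `allowed_physical_cores_sketch` is hypothetical. It assumes Linux with glibc and a readable sysfs topology, and it counts distinct `thread_siblings_list` entries only for CPUs present in the affinity mask.

```cpp
#include <algorithm>
#include <fstream>
#include <set>
#include <string>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: physical cores usable by the calling thread.
inline unsigned allowed_physical_cores_sketch() {
#if defined(__linux__) && defined(_GNU_SOURCE)
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    std::set<std::string> cores;  // one entry per distinct physical core
    unsigned allowed = 0;         // logical CPUs in the affinity mask
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
      if (!CPU_ISSET(cpu, &mask)) continue;
      ++allowed;
      std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                      "/topology/thread_siblings_list");
      std::string siblings;
      // Hyper-thread siblings share the same list, so they collapse to one.
      if (f && std::getline(f, siblings) && !siblings.empty())
        cores.insert(siblings);
    }
    if (!cores.empty()) return static_cast<unsigned>(cores.size());
    // sysfs unreadable: fall back to the logical count within the mask.
    if (allowed > 0) return allowed;
  }
#endif
  return std::max(1u, std::thread::hardware_concurrency());
}
```

The sketch does not distinguish performance from efficiency cores, so the hybrid-CPU caveat raised earlier in this thread still applies.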

Collaborator

I was going to suggest libpthreads ... it seems this functionality is available everywhere these days :-)

Collaborator Author

Where is this functionality in pthread? I didn't go the hwloc route because I read somewhere that it has issues, maybe on FreeBSD. This seemed more portable to me.

Collaborator

@mreineck mreineck Feb 28, 2025

In ducc I'm using

size_t available_hardware_threads()
  {
  static const size_t available_hardware_threads_ = []()
    {
#if __has_include(<pthread.h>) && defined(__linux__) && defined(_GNU_SOURCE)
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
    size_t res=0;
    for (size_t i=0; i<CPU_SETSIZE; ++i)
      if (CPU_ISSET(i, &cpuset)) ++res;
#else
    size_t res = std::max<size_t>(1, std::thread::hardware_concurrency());
#endif
    return res;
    }();
  return available_hardware_threads_;
  }

which is very close to your getAllowedCoreCount function, but it gets the affinity masks from libpthreads instead of the GNU libc. It might be more portable, but I'm not sure.

Collaborator

@ahbarnett ahbarnett left a comment

Seek opportunities to delete code :) Thanks

Development

Successfully merging this pull request may close these issues.

Perfomance loss without export OMP_NUM_THREADS=1
4 participants