using physical threads #641
base: master
Conversation
if (sysctlbyname("hw.physicalcpu", &cores, &size, nullptr, 0) == 0)
    physicalCoreCount = static_cast<unsigned int>(cores);
#elif defined(__linux__)
Just a small word of caution: this will always report the full number of physical cores on the system, independent of possible restrictions on the executable itself. On computing-center nodes, the system may limit a job to a subset of the available hardware (e.g. via taskset); if this happens, the code here will overestimate the number of usable cores.
On systems with power and efficiency cores the code will probably (this is just speculation, I haven't tested it) return the sum of both core counts, which may also be larger than the amount of cores that should actually be used.
I'm not saying that the code should be changed (I wouldn't know how...), but this could be worth remembering in case there are reports about weird behaviour in the future.
Yeah, this emerged from some discussions with @blackwer yesterday. I'll see if I find a way to figure out the active cores.
I was going to suggest looking at hwloc for inspiration, and it looks like you've found a solution very similar to theirs. For posterity:
https://github.com/open-mpi/hwloc/blob/1779eae26f2f55a510e9d73cb648b68e71d9e6d4/hwloc/topology-linux.c#L1036
I was going to suggest libpthreads ... it seems this functionality is available everywhere these days :-)
Where is the functionality in pthread? I didn't go the hwloc route because I read somewhere that it is an issue on FreeBSD, maybe? This seemed more portable to me.
In ducc I'm using
// requires <pthread.h>, <sched.h>, <thread>, <algorithm>
size_t available_hardware_threads()
  {
  static const size_t available_hardware_threads_ = []()
    {
#if __has_include(<pthread.h>) && defined(__linux__) && defined(_GNU_SOURCE)
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
    size_t res=0;
    for (size_t i=0; i<CPU_SETSIZE; ++i)
      if (CPU_ISSET(i, &cpuset)) ++res;
#else
    size_t res = std::max<size_t>(1, std::thread::hardware_concurrency());
#endif
    return res;
    }();
  return available_hardware_threads_;
  }
which is very close to your getAllowedCoreCount function, but it gets the affinity mask from libpthreads instead of the GNU libc. It might be more portable, but I'm not sure.
Seek opportunities to delete code :) Thanks
Attempting to fix #596
Two main reasons:
Using HT cores impairs performance due to extra scheduling and puts unnecessary strain on the memory controller