Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PPF] different execution time on different platforms #6217

Closed
YeHuanjie opened this issue Jan 15, 2025 · 14 comments · Fixed by #6223
Closed

[PPF] different execution time on different platforms #6217

YeHuanjie opened this issue Jan 15, 2025 · 14 comments · Fixed by #6223

Comments

@YeHuanjie
Copy link
Contributor

YeHuanjie commented Jan 15, 2025

same ppf code, it took about 6-7s on windows, but 700s on ubuntu!
pcl on both platforms is "release", and the same version 1.14.0.
usually, code runs faster on ubuntu than windows.
if i make and install "debug" version first and then switch to "release"(make and install too), it would be "release" version running?
what should i check? any feedback or suggestions are greatly appreciated.

@YeHuanjie YeHuanjie added kind: bug Type of issue status: triage Labels incomplete labels Jan 15, 2025
@YeHuanjie
Copy link
Contributor Author

YeHuanjie commented Jan 15, 2025

the following code is pretty slow on ubuntu20.04:

auto start = std::chrono::high_resolution_clock::now();
pcl::PointCloud<pcl::PPFSignature>::Ptr cloud_model_ppf(new pcl::PointCloud<pcl::PPFSignature>());
pcl::PPFEstimation<pcl::PointNormal, pcl::PointNormal, pcl::PPFSignature> ppf_estimator;
ppf_estimator.setInputCloud(ppf_model);
ppf_estimator.setInputNormals(ppf_model);
ppf_estimator.compute(*cloud_model_ppf);

pcl::PPFHashMapSearch::Ptr hashmap_search(new pcl::PPFHashMapSearch(10.0f / 180.0f * float(M_PI), 0.01f));
hashmap_search->setInputFeatureCloud(cloud_model_ppf);

std::cout << "Registering models to scene ..." << std::endl;

pcl::PPFRegistration<pcl::PointNormal, pcl::PointNormal> ppf_registration;
ppf_registration.setSceneReferencePointSamplingRate(5);
ppf_registration.setPositionClusteringThreshold(0.01f);
ppf_registration.setRotationClusteringThreshold(20.0f / 180.0f * float(M_PI));
ppf_registration.setSearchMethod(hashmap_search);
ppf_registration.setInputSource(ppf_model);
ppf_registration.setInputTarget(ppf_scene);

pcl::PointCloud<pcl::PointNormal>::Ptr cloud_output_subsampled(new pcl::PointCloud<pcl::PointNormal>());
ppf_registration.align(*cloud_output_subsampled);

auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = end - start;
ppf_time_ = elapsed.count();
std::cout << "ppf time: " << ppf_time_ << std::endl;

@mvieth
Copy link
Member

mvieth commented Jan 15, 2025

if i make and install "debug" version first and then switch to "release"(make and install too), it would be "release" version running?

Usually yes, but if you want to make sure, I would recommend to completely remove the installed PCL files (the ones in /usr), remove the temporary build files (so the build directory), and finally build and install from scratch.

I assume you set CMAKE_BUILD_TYPE to Release? Do you do this when you compile PCL and when you compile your own code?

@YeHuanjie
Copy link
Contributor Author

YeHuanjie commented Jan 15, 2025

I assume you set CMAKE_BUILD_TYPE to Release? Do you do this when you compile PCL and when you compile your own code?

yes, both pcl and my project set CMAKE_BUILD_TYPE "Release"

CMakeCache.txt

//Choose the type of build, options are: None Debug Release RelWithDebInfo
// MinSizeRel.
CMAKE_BUILD_TYPE:STRING=Release

@mvieth
Copy link
Member

mvieth commented Jan 15, 2025

If you post the point clouds you used in the code above, I can test how long it takes on my computer.

Also, you could try switching to a different compiler on Ubuntu. Either a newer version of the same compiler, or to a completely different compiler (clang if you use gcc now, or gcc if you use clang now).

Either way, I think it is unlikely that this is caused by a bug in PCL.

@DirtyLI
Copy link

DirtyLI commented Jan 16, 2025

it took me 550s to finish ppf, including hash search and registration.

point clouds.zip

@YeHuanjie
Copy link
Contributor Author

it took me 550s to finish ppf, including hash search and registration.

point clouds.zip

by the way, the type of point cloud is pcl::PointNormal

@mvieth
Copy link
Member

mvieth commented Jan 16, 2025

For me it takes roughly 200s. Most of the time is spent in std::unordered_multimap which is used by pcl::PPFHashMapSearch. That could be an explanation for the different running times on Windows and Ubuntu: different C++ standard libraries are used on the different operating systems, which may implement std::unordered_multimap differently. The shape of the point cloud might also play a role. It could also make sense to take a look at the hashing function. I will do that in a few days, when I have more time.

@mvieth mvieth added module: registration and removed status: triage Labels incomplete labels Jan 16, 2025
@YeHuanjie
Copy link
Contributor Author

Hi, I simply changed the hash calculation(in pcl-pcl-1.14.0/registration/include/pcl/registration/ppf_registration.h line 74), and the execution time reduced from 500s to 28s! However, I am not sure whether the results are the same.

    std::size_t
    operator()(const HashKeyStruct& s) const noexcept
    {
      const std::size_t h1 = std::hash<int>{}(s.first);
      const std::size_t h2 = std::hash<int>{}(s.second.first);
      const std::size_t h3 = std::hash<int>{}(s.second.second.first);
      const std::size_t h4 = std::hash<int>{}(s.second.second.second);
      // return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
      return h1 + 17 * h2 + 257 * h3 + 65537 * h4;
    }

@mvieth
Copy link
Member

mvieth commented Jan 17, 2025

Interesting. Why did you choose h1 + 17 * h2 + 257 * h3 + 65537 * h4? Is this formula known to not produce any hash collisions? Or would that depend on the value ranges of h1, h2, h3, h4?

@YeHuanjie
Copy link
Contributor Author

YeHuanjie commented Jan 18, 2025

First of all, it should be noted that I am not an expert in this field.

I remember that the advantage of using prime multiplication to calculate hash is that it is fast to compute. Since the choice of primes is random, it can effectively avoid collisions and has a more uniform distribution.

Another highly recommended approach is to use boost::hash_combine.
The Boost hash class can be referenced at boost-hash.
It might be simplified to the following function:

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
      const std::size_t h1 = std::hash<int>{}(s.first);
      const std::size_t h2 = std::hash<int>{}(s.second.first);
      const std::size_t h3 = std::hash<int>{}(s.second.second.first);
      const std::size_t h4 = std::hash<int>{}(s.second.second.second);
      // return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
      std::size_t seed = 0;
      hash_combine_impl(seed, h1);
      hash_combine_impl(seed, h2);
      hash_combine_impl(seed, h3);
      hash_combine_impl(seed, h4);
      return seed;
}

The rewritten hash_combine_impl function took 13.95 seconds for the same point cloud data. Maybe I need to test it on more data.

@mvieth
Copy link
Member

mvieth commented Jan 18, 2025

It might be simplified to the following function:

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

But that is for the case that std::size_t is less than 32 bits, or am I misinterpreting the boost code?

I wrote a simple program to test for hash collisions (not an exhaustive test, of course):

std::size_t hashfunc(int a, int b, int c, int d) {
  const std::size_t h1 = std::hash<int>{}(a);
  const std::size_t h2 = std::hash<int>{}(b);
  const std::size_t h3 = std::hash<int>{}(c);
  const std::size_t h4 = std::hash<int>{}(d);
  std::size_t seed = 0;
  hash_combine_impl(seed, h1);
  hash_combine_impl(seed, h2);
  hash_combine_impl(seed, h3);
  hash_combine_impl(seed, h4);
  return seed;
}

int main() {
  std::map<std::size_t, std::tuple<int, int, int, int>> contains;
  for(int d=-30; d<50; ++d) {
    std::cout << "d=" << d << std::endl;
    for(int c=-30; c<30; ++c) {
      for(int b=-30; b<30; ++b) {
        for(int a=-30; a<30; ++a) {
          std::size_t hash = hashfunc(a, b, c, d);
          if(contains.count(hash)==1) {
            if(std::get<0>(contains[hash])!=a || std::get<1>(contains[hash])!=b || std::get<2>(contains[hash])!=c || std::get<3>(contains[hash])!=d) {
              std::cout << "Hash collision! " << a << " " << b << " " << c << " " << d << " " << std::get<0>(contains[hash]) << " " << std::get<1>(contains[hash]) << " " << std::get<2>(contains[hash]) << " " << std::get<3>(contains[hash]) << " " << hashfunc(a, b, c, d) << " " << hashfunc(std::get<0>(contains[hash]), std::get<1>(contains[hash]), std::get<2>(contains[hash]), std::get<3>(contains[hash])) << std::endl;
            }
          } else {
            contains[hash]={a, b, c, d};
          }
        }
      }
    }
  }
}

and it is displaying lots of collisions. However, this one seems to work nicely:

std::size_t hashfunc(int a, int b, int c, int d) {
  //RS hash function https://www.partow.net/programming/hashfunctions/index.html
   std::size_t b_    = 378551;
   std::size_t a_    = 63689;
   std::size_t hash = 0;
   hash = hash * a_ + a;
   a_    = a_ * b_;
   hash = hash * a_ + b;
   a_    = a_ * b_;
   hash = hash * a_ + c;
   a_    = a_ * b_;
   hash = hash * a_ + d;
   return hash;

Perhaps you can test how fast your program runs if you use this one.

@YeHuanjie
Copy link
Contributor Author

YeHuanjie commented Jan 20, 2025

I tried different hash functions on the same data, here is the results.
To mitigate the effects of randomness, each program was tested 5 times.
1、RS-hash:12.1925 seconds

inline std::size_t hashfunc(int a, int b, int c, int d)
{
  std::size_t b_    = 378551;
  std::size_t a_    = 63689;
  std::size_t hash = 0;
  hash = hash * a_ + a;
  a_    = a_ * b_;
  hash = hash * a_ + b;
  a_    = a_ * b_;
  hash = hash * a_ + c;
  a_    = a_ * b_;
  hash = hash * a_ + d;
  return hash;
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  std::size_t seed = 0;
  seed = hashfunc(s.first, s.second.first, s.second.second.first, s.second.second.second);
  return seed;
}

2、RS-hash after std::hash:12.3736 seconds

inline std::size_t hashfunc(int a, int b, int c, int d)
{
  std::size_t b_    = 378551;
  std::size_t a_    = 63689;
  std::size_t hash = 0;
  hash = hash * a_ + a;
  a_    = a_ * b_;
  hash = hash * a_ + b;
  a_    = a_ * b_;
  hash = hash * a_ + c;
  a_    = a_ * b_;
  hash = hash * a_ + d;
  return hash;
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);

  std::size_t seed = 0;
  seed = hashfunc(h1, h2, h3, h4);
  return seed;
}

3、boost hash after std::hash——12.2473 seconds

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);

  std::size_t seed = 0;
  hash_combine_impl(seed, h1);
  hash_combine_impl(seed, h2);
  hash_combine_impl(seed, h3);
  hash_combine_impl(seed, h4);
  return seed;
}

4、origin version——513.917 seconds

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);
  return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
}

@mvieth
Copy link
Member

mvieth commented Jan 20, 2025

Great, thanks for testing! So approach 1. RS-hash seems to be best. If you like, feel free to open a pull request to change the PCL code to that one. If you do, maybe don't define a separate hashfunc but instead put the complete code into operator().

@YeHuanjie
Copy link
Contributor Author

pull request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants