[PPF] different execution time on different platforms #6217

YeHuanjie · 2025-01-15T08:21:37Z

same ppf code, it took about 6-7s on windows, but 700s on ubuntu!
pcl on both platforms is "release", and the same version 1.14.0.
usually, code runs faster on ubuntu than windows.
if i make and install "debug" version first and then switch to "release"(make and install too), it would be "release" version running?
what should i check? any feedback or suggestions are greatly appreciated.

YeHuanjie · 2025-01-15T08:53:33Z

the following code is pretty slow on ubuntu20.04:

auto start = std::chrono::high_resolution_clock::now();
pcl::PointCloud<pcl::PPFSignature>::Ptr cloud_model_ppf(new pcl::PointCloud<pcl::PPFSignature>());
pcl::PPFEstimation<pcl::PointNormal, pcl::PointNormal, pcl::PPFSignature> ppf_estimator;
ppf_estimator.setInputCloud(ppf_model);
ppf_estimator.setInputNormals(ppf_model);
ppf_estimator.compute(*cloud_model_ppf);

pcl::PPFHashMapSearch::Ptr hashmap_search(new pcl::PPFHashMapSearch(10.0f / 180.0f * float(M_PI), 0.01f));
hashmap_search->setInputFeatureCloud(cloud_model_ppf);

std::cout << "Registering models to scene ..." << std::endl;

pcl::PPFRegistration<pcl::PointNormal, pcl::PointNormal> ppf_registration;
ppf_registration.setSceneReferencePointSamplingRate(5);
ppf_registration.setPositionClusteringThreshold(0.01f);
ppf_registration.setRotationClusteringThreshold(20.0f / 180.0f * float(M_PI));
ppf_registration.setSearchMethod(hashmap_search);
ppf_registration.setInputSource(ppf_model);
ppf_registration.setInputTarget(ppf_scene);

pcl::PointCloud<pcl::PointNormal>::Ptr cloud_output_subsampled(new pcl::PointCloud<pcl::PointNormal>());
ppf_registration.align(*cloud_output_subsampled);

auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = end - start;
ppf_time_ = elapsed.count();
std::cout << "ppf time: " << ppf_time_ << std::endl;

mvieth · 2025-01-15T09:42:19Z

if i make and install "debug" version first and then switch to "release"(make and install too), it would be "release" version running?

Usually yes, but if you want to make sure, I would recommend to completely remove the installed PCL files (the ones in /usr), remove the temporary build files (so the build directory), and finally build and install from scratch.

I assume you set CMAKE_BUILD_TYPE to Release? Do you do this when you compile PCL and when you compile your own code?

YeHuanjie · 2025-01-15T09:44:48Z

I assume you set CMAKE_BUILD_TYPE to Release? Do you do this when you compile PCL and when you compile your own code?

yes, both pcl and my project set CMAKE_BUILD_TYPE "Release"

CMakeCache.txt

//Choose the type of build, options are: None Debug Release RelWithDebInfo
// MinSizeRel.
CMAKE_BUILD_TYPE:STRING=Release

mvieth · 2025-01-15T10:23:18Z

If you post the point clouds you used in the code above, I can test how long it takes on my computer.

Also, you could try switching to a different compiler on Ubuntu. Either a newer version of the same compiler, or to a completely different compiler (clang if you use gcc now, or gcc if you use clang now).

Either way, I think it is unlikely that this is caused by a bug in PCL.

DirtyLI · 2025-01-16T01:55:21Z

it took me 550s to finish ppf, including hash search and registration.

point clouds.zip

YeHuanjie · 2025-01-16T01:58:18Z

it took me 550s to finish ppf, including hash search and registration.

point clouds.zip

by the way, the type of point cloud is pcl::PointNormal

mvieth · 2025-01-16T20:21:21Z

For me it takes roughly 200s. Most of the time is spent in std::unordered_multimap which is used by pcl::PPFHashMapSearch. That could be an explanation for the different running times on Windows and Ubuntu: different C++ standard libraries are used on the different operating systems, which may implement std::unordered_multimap differently. The shape of the point cloud might also play a role. It could also make sense to take a look at the hashing function. I will do that in a few days, when I have more time.

YeHuanjie · 2025-01-17T09:52:02Z

Hi, I simply changed the hash calculation(in pcl-pcl-1.14.0/registration/include/pcl/registration/ppf_registration.h line 74), and the execution time reduced from 500s to 28s! However, I am not sure whether the results are the same.

    std::size_t
    operator()(const HashKeyStruct& s) const noexcept
    {
      const std::size_t h1 = std::hash<int>{}(s.first);
      const std::size_t h2 = std::hash<int>{}(s.second.first);
      const std::size_t h3 = std::hash<int>{}(s.second.second.first);
      const std::size_t h4 = std::hash<int>{}(s.second.second.second);
      // return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
      return h1 + 17 * h2 + 257 * h3 + 65537 * h4;
    }

mvieth · 2025-01-17T10:37:02Z

Interesting. Why did you choose h1 + 17 * h2 + 257 * h3 + 65537 * h4? Is this formula known to not produce any hash collisions? Or would that depend on the value ranges of h1, h2, h3, h4?

YeHuanjie · 2025-01-18T03:44:25Z

First of all, it should be noted that I am not an expert in this field.

I remember that the advantage of using prime multiplication to calculate hash is that it is fast to compute. Since the choice of primes is random, it can effectively avoid collisions and has a more uniform distribution.

Another highly recommended approach is to use boost::hash_combine.
The Boost hash class can be referenced at boost-hash.
It might be simplified to the following function:

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
      const std::size_t h1 = std::hash<int>{}(s.first);
      const std::size_t h2 = std::hash<int>{}(s.second.first);
      const std::size_t h3 = std::hash<int>{}(s.second.second.first);
      const std::size_t h4 = std::hash<int>{}(s.second.second.second);
      // return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
      std::size_t seed = 0;
      hash_combine_impl(seed, h1);
      hash_combine_impl(seed, h2);
      hash_combine_impl(seed, h3);
      hash_combine_impl(seed, h4);
      return seed;
}

The rewritten hash_combine_impl function took 13.95 seconds for the same point cloud data. Maybe I need to test it on more data.

mvieth · 2025-01-18T16:50:05Z

It might be simplified to the following function:

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

But that is for the case that std::size_t is less than 32 bits, or am I misinterpreting the boost code?

I wrote a simple program to test for hash collisions (not an exhaustive test, of course):

std::size_t hashfunc(int a, int b, int c, int d) {
  const std::size_t h1 = std::hash<int>{}(a);
  const std::size_t h2 = std::hash<int>{}(b);
  const std::size_t h3 = std::hash<int>{}(c);
  const std::size_t h4 = std::hash<int>{}(d);
  std::size_t seed = 0;
  hash_combine_impl(seed, h1);
  hash_combine_impl(seed, h2);
  hash_combine_impl(seed, h3);
  hash_combine_impl(seed, h4);
  return seed;
}

int main() {
  std::map<std::size_t, std::tuple<int, int, int, int>> contains;
  for(int d=-30; d<50; ++d) {
    std::cout << "d=" << d << std::endl;
    for(int c=-30; c<30; ++c) {
      for(int b=-30; b<30; ++b) {
        for(int a=-30; a<30; ++a) {
          std::size_t hash = hashfunc(a, b, c, d);
          if(contains.count(hash)==1) {
            if(std::get<0>(contains[hash])!=a || std::get<1>(contains[hash])!=b || std::get<2>(contains[hash])!=c || std::get<3>(contains[hash])!=d) {
              std::cout << "Hash collision! " << a << " " << b << " " << c << " " << d << " " << std::get<0>(contains[hash]) << " " << std::get<1>(contains[hash]) << " " << std::get<2>(contains[hash]) << " " << std::get<3>(contains[hash]) << " " << hashfunc(a, b, c, d) << " " << hashfunc(std::get<0>(contains[hash]), std::get<1>(contains[hash]), std::get<2>(contains[hash]), std::get<3>(contains[hash])) << std::endl;
            }
          } else {
            contains[hash]={a, b, c, d};
          }
        }
      }
    }
  }
}

and it is displaying lots of collisions. However, this one seems to work nicely:

std::size_t hashfunc(int a, int b, int c, int d) {
  //RS hash function https://www.partow.net/programming/hashfunctions/index.html
   std::size_t b_    = 378551;
   std::size_t a_    = 63689;
   std::size_t hash = 0;
   hash = hash * a_ + a;
   a_    = a_ * b_;
   hash = hash * a_ + b;
   a_    = a_ * b_;
   hash = hash * a_ + c;
   a_    = a_ * b_;
   hash = hash * a_ + d;
   return hash;

Perhaps you can test how fast your program runs if you use this one.

YeHuanjie · 2025-01-20T06:43:23Z

I tried different hash functions on the same data, here is the results.
To mitigate the effects of randomness, each program was tested 5 times.
1、RS-hash：12.1925 seconds

inline std::size_t hashfunc(int a, int b, int c, int d)
{
  std::size_t b_    = 378551;
  std::size_t a_    = 63689;
  std::size_t hash = 0;
  hash = hash * a_ + a;
  a_    = a_ * b_;
  hash = hash * a_ + b;
  a_    = a_ * b_;
  hash = hash * a_ + c;
  a_    = a_ * b_;
  hash = hash * a_ + d;
  return hash;
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  std::size_t seed = 0;
  seed = hashfunc(s.first, s.second.first, s.second.second.first, s.second.second.second);
  return seed;
}

2、RS-hash after std::hash：12.3736 seconds

inline std::size_t hashfunc(int a, int b, int c, int d)
{
  std::size_t b_    = 378551;
  std::size_t a_    = 63689;
  std::size_t hash = 0;
  hash = hash * a_ + a;
  a_    = a_ * b_;
  hash = hash * a_ + b;
  a_    = a_ * b_;
  hash = hash * a_ + c;
  a_    = a_ * b_;
  hash = hash * a_ + d;
  return hash;
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);

  std::size_t seed = 0;
  seed = hashfunc(h1, h2, h3, h4);
  return seed;
}

3、boost hash after std::hash——12.2473 seconds

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
  seed ^= (value + (value >> 3)) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);

  std::size_t seed = 0;
  hash_combine_impl(seed, h1);
  hash_combine_impl(seed, h2);
  hash_combine_impl(seed, h3);
  hash_combine_impl(seed, h4);
  return seed;
}

4、origin version——513.917 seconds

std::size_t
operator()(const HashKeyStruct& s) const noexcept
{
  const std::size_t h1 = std::hash<int>{}(s.first);
  const std::size_t h2 = std::hash<int>{}(s.second.first);
  const std::size_t h3 = std::hash<int>{}(s.second.second.first);
  const std::size_t h4 = std::hash<int>{}(s.second.second.second);
  return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3);
}

mvieth · 2025-01-20T08:36:52Z

Great, thanks for testing! So approach 1. RS-hash seems to be best. If you like, feel free to open a pull request to change the PCL code to that one. If you do, maybe don't define a separate hashfunc but instead put the complete code into operator().

YeHuanjie · 2025-01-21T00:55:23Z

pull request

YeHuanjie added kind: bug Type of issue status: triage Labels incomplete labels Jan 15, 2025

mvieth added module: registration and removed status: triage Labels incomplete labels Jan 16, 2025

mvieth linked a pull request Jan 21, 2025 that will close this issue

Speed up PPFRegistration by using a hash function with fewer collisions #6223

Merged

mvieth closed this as completed in #6223 Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PPF] different execution time on different platforms #6217

[PPF] different execution time on different platforms #6217

YeHuanjie commented Jan 15, 2025 •

edited

Loading

YeHuanjie commented Jan 15, 2025 •

edited

Loading

mvieth commented Jan 15, 2025

YeHuanjie commented Jan 15, 2025 •

edited

Loading

mvieth commented Jan 15, 2025

DirtyLI commented Jan 16, 2025

YeHuanjie commented Jan 16, 2025

mvieth commented Jan 16, 2025

YeHuanjie commented Jan 17, 2025

mvieth commented Jan 17, 2025

YeHuanjie commented Jan 18, 2025 •

edited

Loading

mvieth commented Jan 18, 2025

YeHuanjie commented Jan 20, 2025 •

edited

Loading

mvieth commented Jan 20, 2025

YeHuanjie commented Jan 21, 2025

[PPF] different execution time on different platforms #6217

[PPF] different execution time on different platforms #6217

Comments

YeHuanjie commented Jan 15, 2025 • edited Loading

YeHuanjie commented Jan 15, 2025 • edited Loading

mvieth commented Jan 15, 2025

YeHuanjie commented Jan 15, 2025 • edited Loading

mvieth commented Jan 15, 2025

DirtyLI commented Jan 16, 2025

YeHuanjie commented Jan 16, 2025

mvieth commented Jan 16, 2025

YeHuanjie commented Jan 17, 2025

mvieth commented Jan 17, 2025

YeHuanjie commented Jan 18, 2025 • edited Loading

mvieth commented Jan 18, 2025

YeHuanjie commented Jan 20, 2025 • edited Loading

mvieth commented Jan 20, 2025

YeHuanjie commented Jan 21, 2025

YeHuanjie commented Jan 15, 2025 •

edited

Loading

YeHuanjie commented Jan 15, 2025 •

edited

Loading

YeHuanjie commented Jan 15, 2025 •

edited

Loading

YeHuanjie commented Jan 18, 2025 •

edited

Loading

YeHuanjie commented Jan 20, 2025 •

edited

Loading