What pext does is different from what the pext instruction does #8

jason7708 · 2024-09-17T06:47:16Z

When I tried certain inputs and did not enable -mbmi2, my mask was calculated as 163839. The compilation crashed because the nbits calculation produced an extremely large number. And I found that the issue was caused by an incorrect clz calculaiton.

constexpr auto clz = [](auto x) {
  decltype(x) n{};
  if (not x) return decltype(x)(sizeof(x) * __CHAR_BIT__);
  while (x >>= 1u) n++;
  return n;
};
constexpr auto mbits = __builtin_popcountl(mask);
constexpr auto nbits = size - mbits - clz(mask.value);
constexpr auto coefficient = [&] {
  auto set = false;
  auto dst = nbits;
  T result{};
  for (auto i = 0u; i < size; ++i) {
    const auto curr = ((T(1) << i) & mask) != T();
    if (curr and not set) result = result | (T(1) << (dst - i));
    dst += curr;
    set = curr;
  }
  return result;
}();
return (((a & mask) * coefficient) >> nbits) & ((T(1) << mbits) - T(1));

I fixed the error and submitted a pull request.

In addition, I found that the pext algorithm here is not correct.
The article from the link you provided mentions that when the gaps between the bits we are interested in are too small, multiplying by a coefficient can cause addition that leads to a carry, which makes the result incorrect.

Assuming the mask is 21 and a is also 21, the coefficient will become (4 + 2 + 1) = 7 but the result will be 4 while it should be 7.

    0 0 0 1 0 1 0 1   (value << 0)
    0 0 1 0 1 0 1 0   (value << 1)
    0 1 0 1 0 1 0 0   (value << 2)    +
---------------------------------------

You can easily generate the mask as 21 using the following input and the lookup result will be incorrect.

static constexpr std::array inputs {
    std::pair {0u, 0u},    // 00000
    std::pair {1u, 1u},    // 00001
    std::pair {4u, 2u},    // 00100
    std::pair {5u, 3u},    // 00101
    std::pair {16u, 4u},   // 10000
    std::pair {17u, 5u},   // 10001
    std::pair {20u, 6u},   // 10100
    std::pair {21u, 7u},   // 10101
};

I also checked the Intel repository link you provided, and it has the same issue with pseudo_pext_t. However, their perfect hash lookup does not produce errors because they use pseudo_pext_t to ensure that hash values are unique when selecting masks. In contrast, MPH only checks the key & mask.
I also posted an issue there, but I'm not sure if this was an intentional design choice.

Since I’m unsure how you would like the modification to be made, if the goal is to ensure mask selection like Intel's approach, the behavior of the pext function still appears inconsistent and somewhat strange. Ensuring adequate gaps between mask bits would increase the computation, which doesn’t seem like a very practical solution. Therefore, for now, I have only fixed the clz function.

The text was updated successfully, but these errors were encountered:

kris-jusiak · 2024-09-17T15:08:08Z

thanks @jason7708 , you are absolutely correct and thanks for the clz fix. Either fix would be very appreciated, these computations are only at compile time so as long as it wont blow up compilation times any solution qoukd be great, thanks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What pext does is different from what the pext instruction does #8

What pext does is different from what the pext instruction does #8

jason7708 commented Sep 17, 2024

kris-jusiak commented Sep 17, 2024

What pext does is different from what the pext instruction does #8

What pext does is different from what the pext instruction does #8

Comments

jason7708 commented Sep 17, 2024

kris-jusiak commented Sep 17, 2024