Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What pext does is different from what the pext instruction does #8

Open
jason7708 opened this issue Sep 17, 2024 · 1 comment
Open

Comments

@jason7708
Copy link
Contributor

When I tried certain inputs and did not enable -mbmi2, my mask was calculated as 163839. The compilation crashed because the nbits calculation produced an extremely large number. And I found that the issue was caused by an incorrect clz calculaiton.

constexpr auto clz = [](auto x) {
  decltype(x) n{};
  if (not x) return decltype(x)(sizeof(x) * __CHAR_BIT__);
  while (x >>= 1u) n++;
  return n;
};
constexpr auto mbits = __builtin_popcountl(mask);
constexpr auto nbits = size - mbits - clz(mask.value);
constexpr auto coefficient = [&] {
  auto set = false;
  auto dst = nbits;
  T result{};
  for (auto i = 0u; i < size; ++i) {
    const auto curr = ((T(1) << i) & mask) != T();
    if (curr and not set) result = result | (T(1) << (dst - i));
    dst += curr;
    set = curr;
  }
  return result;
}();
return (((a & mask) * coefficient) >> nbits) & ((T(1) << mbits) - T(1));

I fixed the error and submitted a pull request.

In addition, I found that the pext algorithm here is not correct.
The article from the link you provided mentions that when the gaps between the bits we are interested in are too small, multiplying by a coefficient can cause addition that leads to a carry, which makes the result incorrect.

Assuming the mask is 21 and a is also 21, the coefficient will become (4 + 2 + 1) = 7 but the result will be 4 while it should be 7.

    0 0 0 1 0 1 0 1   (value << 0)
    0 0 1 0 1 0 1 0   (value << 1)
    0 1 0 1 0 1 0 0   (value << 2)    +
---------------------------------------

You can easily generate the mask as 21 using the following input and the lookup result will be incorrect.

static constexpr std::array inputs {
    std::pair {0u, 0u},    // 00000
    std::pair {1u, 1u},    // 00001
    std::pair {4u, 2u},    // 00100
    std::pair {5u, 3u},    // 00101
    std::pair {16u, 4u},   // 10000
    std::pair {17u, 5u},   // 10001
    std::pair {20u, 6u},   // 10100
    std::pair {21u, 7u},   // 10101
};

I also checked the Intel repository link you provided, and it has the same issue with pseudo_pext_t. However, their perfect hash lookup does not produce errors because they use pseudo_pext_t to ensure that hash values are unique when selecting masks. In contrast, MPH only checks the key & mask.
I also posted an issue there, but I'm not sure if this was an intentional design choice.

Since I’m unsure how you would like the modification to be made, if the goal is to ensure mask selection like Intel's approach, the behavior of the pext function still appears inconsistent and somewhat strange. Ensuring adequate gaps between mask bits would increase the computation, which doesn’t seem like a very practical solution. Therefore, for now, I have only fixed the clz function.

@kris-jusiak
Copy link
Contributor

thanks @jason7708 , you are absolutely correct and thanks for the clz fix. Either fix would be very appreciated, these computations are only at compile time so as long as it wont blow up compilation times any solution qoukd be great, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants