Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Дата
Msg-id CAASwCXdc1Yh10=gS1JjtF=Yn-zRsTG01k0Yw3M2xVrs7K5zq6w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Список pgsql-hackers
Many thanks for working on this, amazing work, really nice you made it a separate reusable Perl-module.

The generated hash functions reads one character at a time.
I've seen a performance trick in other hash functions [1]
to instead read multiple bytes in each iteration,
and then handle the remaining bytes after the loop.


I've done some testing and it looks like a ~30% speed-up of the generated
ScanKeywords_hash_func() function would be possible.

If you think this approach is promising, I would be happy to prepare a patch for it,
but I wanted to check with the project this idea has not already been considered and ruled out
for some technical reasons I've failed to see, is there any?

For this to work you would need to use larger constants for $hash_mult1 and $hash_mult2 though.
I've successfully used these values:
$hash_mult1 0x2c1b3c6d
$hash_mult2 (0x297a2d39, 0x85ebca6b, 0xc2b2ae35, 0x7feb352d, 0x846ca68b)

Here is the idea:

Generated C-code:

  for (; keylen >= 4; keylen -= 4, k += 4)
  {
    uint32_t v;
    memcpy(&v, k, 4);
    v |= 0x20202020;
    a = a * 739982445 + v;
    b = b * 2246822507 + v;
  }
  uint32_t v = 0;
  switch (keylen)
  {
  case 3:
    memcpy(&v, k, 3);
    v |= 0x202020;
    break;
  case 2:
    memcpy(&v, k, 2);
    v |= 0x2020;
    break;
  case 1:
    memcpy(&v, k, 1);
    v |= 0x20;
    break;
  }
  a = a * 739982445 + v;
  b = b * 2246822507 + v;
  return h[a % 883] + h[b % 883];

(Reding 8 bytes a time instead would perhaps be a win since some keywords are quite long.)

Perl-code:

sub _calc_hash
{
my ($key, $mult, $seed) = @_;

my $result = $seed;
my $i=0;
my $keylen = length($key);

for (; $keylen>=4; $keylen-=4, $i+=4) {
my $cn = (ord(substr($key,$i+0,1)) << 0)
       | (ord(substr($key,$i+1,1)) << 8)
| (ord(substr($key,$i+2,1)) << 16)
| (ord(substr($key,$i+3,1)) << 24);
$cn |= 0x20202020 if $case_fold;
$result = ($result * $mult + $cn) % 4294967296;
}

my $cn = 0;
if ($keylen == 3) {
$cn = (ord(substr($key,$i+0,1)) << 0)
       | (ord(substr($key,$i+1,1)) << 8)
| (ord(substr($key,$i+2,1)) << 16);
$cn |= 0x202020 if $case_fold;
} elsif ($keylen == 2) {
$cn = (ord(substr($key,$i+0,1)) << 0)
       | (ord(substr($key,$i+1,1)) << 8);
$cn |= 0x2020 if $case_fold;
} elsif ($keylen == 1) {
$cn = (ord(substr($key,$i+0,1)) << 0);
$cn |= 0x20 if $case_fold;
}
$result = ($result * $mult + $cn) % 4294967296;

return $result;
}



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: PostgreSQL pollutes the file system
Следующее
От: legrand legrand
Дата:
Сообщение: Re: [survey] New "Stable" QueryId based on normalized query text