Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

Поиск

Список

Период

Сортировка

От	Joel Jacobson
Тема	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Дата	20 марта 2019 г. 19:55:56
Msg-id	CAASwCXdc1Yh10=gS1JjtF=Yn-zRsTG01k0Yw3M2xVrs7K5zq6w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Список	pgsql-hackers

Дерево обсуждения

Many thanks for working on this, amazing work, really nice you made it a separate reusable Perl-module.

The generated hash functions reads one character at a time.

I've seen a performance trick in other hash functions [1]

to instead read multiple bytes in each iteration,

and then handle the remaining bytes after the loop.

[1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29

I've done some testing and it looks like a ~30% speed-up of the generated

ScanKeywords_hash_func() function would be possible.

If you think this approach is promising, I would be happy to prepare a patch for it,

but I wanted to check with the project this idea has not already been considered and ruled out

for some technical reasons I've failed to see, is there any?

For this to work you would need to use larger constants for $hash_mult1 and $hash_mult2 though.

I've successfully used these values:

$hash_mult1 0x2c1b3c6d

$hash_mult2 (0x297a2d39, 0x85ebca6b, 0xc2b2ae35, 0x7feb352d, 0x846ca68b)

Here is the idea:

Generated C-code:

for (; keylen >= 4; keylen -= 4, k += 4)

{

uint32_t v;

memcpy(&v, k, 4);

v |= 0x20202020;

a = a * 739982445 + v;

b = b * 2246822507 + v;

}

uint32_t v = 0;

switch (keylen)

{

case 3:

memcpy(&v, k, 3);

v |= 0x202020;

break;

case 2:

memcpy(&v, k, 2);

v |= 0x2020;

break;

case 1:

memcpy(&v, k, 1);

v |= 0x20;

break;

}

a = a * 739982445 + v;

b = b * 2246822507 + v;

return h[a % 883] + h[b % 883];

(Reding 8 bytes a time instead would perhaps be a win since some keywords are quite long.)

Perl-code:

sub _calc_hash

{

my ($key, $mult, $seed) = @_;

my $result = $seed;

my $i=0;

my $keylen = length($key);

for (; $keylen>=4; $keylen-=4, $i+=4) {

my $cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8)

| (ord(substr($key,$i+2,1)) << 16)

| (ord(substr($key,$i+3,1)) << 24);

$cn |= 0x20202020 if $case_fold;

$result = ($result * $mult + $cn) % 4294967296;

}

my $cn = 0;

if ($keylen == 3) {

$cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8)

| (ord(substr($key,$i+2,1)) << 16);

$cn |= 0x202020 if $case_fold;

} elsif ($keylen == 2) {

$cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8);

$cn |= 0x2020 if $case_fold;

} elsif ($keylen == 1) {

$cn = (ord(substr($key,$i+0,1)) << 0);

$cn |= 0x20 if $case_fold;

}

$result = ($result * $mult + $cn) % 4294967296;

return $result;

}

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)