Re: WIP: BRIN multi-range indexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: WIP: BRIN multi-range indexes
Дата
Msg-id 1388c876-2d07-c694-4f95-37032d711627@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
Hi,

Attached is an updated version of the patch series, addressing all the 
failures on cfbot (at least I hope so). This turned out to be more fun 
than I expected, as the issues went unnoticed on 64-bits and only failed 
on 32-bits. That's also why I'm not entirely sure this will make cfbot 
happy as that seems to be x86_64, but the issues are real so let's see.

1) I already outlined the issue in the previous message:

     MAXALIGN(a * b) != MAXALIGN(a) * b

and there's an assert that we used exactly the same amount of memory we 
allocated, so this caused a crash. Strange that it'd fail on 32-bits and 
not 64-bits, but perhaps there's some math reason for that, or maybe it 
was just pure luck.


2) The rest of the issues generally boils down to types that are byval 
on 64-bits, but byref on 32-bits. Like int8 or float8 for example. The 
first place causing issues were cross-type operators, i.e. the bloom 
opclasses did things like this in pg_amop.dat:

   { amopfamily => 'brin/integer_bloom_ops', amoplefttype => 'int2',
     amoprighttype => 'int8', amopstrategy => '1',
     amopopr => '=(int2,int8)', amopmethod => 'brin' },

so it was possible to do this:

     WHERE int8column = 1234::int2

in which case we used the int8 opclass, so the consistent function 
thought it's working with int8, and used the hash function defined for 
that opclass in pg_amproc. That's hashint8 of course, but we called that 
on Datum storing int2. Clearly, dereferencing that pointer is guaranteed 
to fail with a segfault.

I think there are two options to fix this. Firstly, we can remove the 
cross-type operators, so that the left/right type is always the same. 
That'll work fine for most cases, and it's pretty simple. It's also what 
the hash_ops opclasses do, so I've done that.

An alternative would be to do something like minmax does for stategies, 
and consider the subtype (i.e. type of the right argument). It's trick a 
bit tricky, though, because it assumes the hash functions for the two 
types are "compatible" and produce the same hash for the same value. 
AFAIK that's correct for the usual cases (int2/int4/int8) and it'd be 
restricted by pg_amop. But hash_ops don't do that for some reason, so I 
wonder what am I missing. (The other thing is where to define these hash 
functions - right now pg_amproc only tracks hash function for the "base" 
data type, and there may be multiple supported subtypes, so where to 
store that? Perhaps we could use the hash function from the default hash 
opclass for each type.)

Anyway, I've decided to keep this simple for now, and I've ripped-out 
the cross-type operators. We can add that back later, if needed.


3) There were a couple byref failures in the distance functions, which 
generally used "double" internally (which I'm not sure is guaranteed to 
be 64-bit types) instead of float8, and used plain "return" instead of 
PG_RETURN_FLOAT8() in a couple places. Silly mistakes.


4) A particulary funny mistake was in actually calculating the hashes 
for bloom filter, which is using hash_uint32_extended (so that we can 
seed it). The trouble is that while hash_uint32() returns uint32, 
hash_uint32_extended returns ... uint64. So we calculated a hash, but 
then used the *pointer* to the uint64 value, not the value. I have to 
say, the "uint32" in the function name is somewhat misleading.


This passes all my tests, including valgrind on the 32-bit rpi4 machine, 
the stress test (testing both the bloom and multi-minmax opclasses etc.)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Custom compression methods
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)