Re: b-tree index search algorithms

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: b-tree index search algorithms
Дата
Msg-id 13532.1342546724@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: b-tree index search algorithms  (Samuel Vogel <s@muel-vogel.de>)
Ответы Re: b-tree index search algorithms  (Samuel Vogel <s@muel-vogel.de>)
Список pgsql-hackers
Samuel Vogel <s@muel-vogel.de> writes:
> Am 17.07.12 05:21, schrieb Tom Lane:
>> Samuel Vogel <s@muel-vogel.de> writes:
>>> I'm currently on a university research project if performance could be
>>> increased by substituting different inter-node search algorithms instead
>>> of the currently used binary search.

>> Hm, what have you got in mind exactly?

> At first I will try a simple interpolation search, but problems start 
> there since I need to have a numerical representation of the index keys 
> (or map them to one) to do the interpolation.

Dunno about that.  btree knows nothing about the datatypes it's working
on except that they have comparison functions.  Converting the values
to some sort of numeric scale that you can interpolate on seems
logically dubious and fraught with practical difficulties.  Now, we do
have some code in selfuncs.c that tries to do that, for some data types,
but it's only for planner estimation purposes, and we don't rely very
heavily on its results even in that context.  Depending on it to be
right for search purposes sounds pretty scary.

>> Not clear what you did wrong from this amount of detail, but integer
>> keys ought to be pretty obvious at the debugger level.

> Okay, to be more specific: Printing 
> 'DatumGetInt32(scankey->sk_argument)' in '_bt_compare' never shows me 50 
> when I execute this query: SELECT * FROM simpletest WHERE id = 50;

Um, what does it show you?  DatumGetInt32 is a macro, and at least in
gdb that won't work at all:

(gdb) p DatumGetInt32(scankey->sk_argument)
No symbol "DatumGetInt32" in current context.

However, just looking at the value produces sane answers for me:

(gdb) p *scankey
$1 = {sk_flags = 196608, sk_attno = 1, sk_strategy = 0, sk_subtype = 23,  sk_collation = 0, sk_func = {fn_addr =
0x486ec0<btint4cmp>, fn_oid = 351,    fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000',    fn_stats = 2 '\002',
fn_extra= 0x0, fn_mcxt = 0x2b82fe8, fn_expr = 0x0},  sk_argument = 50}
 
(gdb) p scankey->sk_argument 
$2 = 50

>> PG's b-trees do not hash anything.  If you're not seeing interpretable
>> key values then you're doing something wrong in your inspection
>> methodology.

> Okay, how are indexes on char/text columns handled then?

The datum values will be pointers to strings.

> ... is my assumption correct, that in 
> that case the Datum is only a link to where the actual data is stored 
> and only 'scankey->sk_func' knows how to make use of it (compare it).
> In that case it would be extremly hard to get to a numeric 
> representation which can be used for the interpolation.

The btree code is (or reasonably can be) aware that such values are
pass-by-reference, and how to get to the bits.  But the comparison
semantics of two different values are not something it knows about
except by asking the comparison function.  This can be quite a
nontrivial matter even for text, since we follow strcoll() comparison
rules.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: CompactCheckpointerRequestQueue versus pad bytes
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: isolation check takes a long time