Re: knngist - 0.8

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: knngist - 0.8
Дата
Msg-id AANLkTinu0hr-=uMOBgxiBqR-JPfp+kBKNC9hdVf-93Ge@mail.gmail.com
обсуждение исходный текст
Ответ на Re: knngist - 0.8  (Teodor Sigaev <teodor@sigaev.ru>)
Ответы Re: knngist - 0.8  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
2010/11/12 Teodor Sigaev <teodor@sigaev.ru>:
>> Slightly-more-fleshed out proof of concept patch attached, with actual
>> syntax, documentation, and pg_dump support added.  This might be
>> thought of as a subset of the builtin_knngist_core patch, without the
>> parts that make it actually do something useful (which is mostly
>> match_pathkey_to_index - which I'm still rather hoping to abstract in
>> some way via the access method interface, though I'm currently unsure
>> what the best way to do that is).
>
> I don't see in your patch how to propagate knowledge of kind of operation
> (AMOP_SEARCH or AMOP_ORDER) to GiST and consistent method. For both of them
> they aren't distinguishable. That's not acceptably for both, because GiST
> needs to choose right traversal algorithm, consistentFn needs role to decide
> return infinity or false (-1) value.

My version of the patch is suffering from a bad case of not being
done.  It sort of started out as a proof-of-concept to show what the
syntax might look like and seems to be gradually growing into
something more real.  I'm not sure what we'll end up with.

> My variants informs GiST by SK_ORDER flags and consistentFn looks at
> strategy number (strategy numbers are different for different purposes).

Yeah.  At ten thousand feet, I think the open design question here is
to what extent it's OK to rely on the fact that the ORDER BY clauses
we wish to optimize happen to look a lot like the WHERE clauses we
already know how to optimize: namely, they're both binary opclauses of
the form <indexed-column> <op> <constant>.  Your patch manages to
reuse a LOT of existing machinery by shoving ordering expressions
through the same code paths that quals take.  Code reuse is generally
a good thing, but here's we're forming RestrictInfo and ScanKey
objects out of things that are neither restrictions nor keys, which
might lead to maintainability problems down the road.  I'd like to get
some input from Tom on how he feels about that, and any alternatives
he sees.

It seems to me that our concept of ScanDirection is really woefully
under-expressive.  For example, given:

CREATE TABLE foo (   id integer NOT NULL,   name character varying NOT NULL,   PRIMARY KEY (id)
);

We use the index for the first of these but not the second:

select * from foo order by id nulls last;
select * from foo order by id nulls first;

In an ideal world, we'd like to handle the second one by finding the
first non-NULL entry in the index, scanning away from the NULLs, and
then, when we run off the end, looping back around to spit out the
NULL entries.  Or, similarly, if we have an index on (id, name), we
can handle the first two of the following with the index, but not the
second two:

select * from foo order by id, name;
select * from foo order by id desc, name desc;
select * from foo order by id, name desc;
select * from foo order by id, name nulls last;
(can't handle this last one even if name is marked NOT NULL!)

It seems like it might even be possible to handle these (albeit maybe
not real efficiently) via a sufficiently complicated scanning order,
but even if we had the code to do the scan, we have no way of telling
the index what scan direction we want: forward, backward, and no
movement don't really cut it.  The idea that forward and backward mean
anything at all is really pretty btree-centric.

The trick, of course, is to come up with something better.  I have a
vague notion of using something like an array or list of objects of
the form <index column, asc/desc, nulls first/last, value (null except
for knngist)>.  That could easily be extended with collation or
whatever as needed.  The trick is - you need to create this object and
then ask the index - hey, is this something you can support?  And the
index needs to then respond by saying whether it can do that (and
maybe doing some sort of translation into what instructions should be
provided at execution time).

Trouble is, that sounds like a lot of work on code I'm not altogether
comfortable with, and I'd really like to get KNNGIST in shape for 9.1
if possible.  I'm not real sure where to draw the line between, on the
one hand, not wanting to commit something that is excessively crufty
and, on the other hand, wanting to finish in finite time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Spread checkpoint sync
Следующее
От: Greg Stark
Дата:
Сообщение: Re: Latches with weak memory ordering (Re: max_wal_senders must die)