Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Дата
Msg-id CAPpHfdvkC23+hTzw+E7LwG23r0dbCo-zT8zeYa47ycbo-XmqCw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff.  This is not good, because you have
no idea what scale the subtype differences will be expressed on.  The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.
Current GiST code only compare penalty values of inserting same tuple. And don't see why it may alters. So, values obtained from subtype_diff and hard-wired values would be never compared each other.
 
2. It's too large/complicated.  You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size.  I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.
I've tested double sorting split algorithm itself pretty much on synthetic datasets. See paper for details. Strategy of separation of different classes of ranges really need more testing. But obtaining large enough real-life datasets is pretty problematic for me.
 
There are a lot of garden-variety coding problems, too, for instance here:

+                 *penalty = Max(DatumGetFloat8(FunctionCall2(
+                     subtype_diff, orig_lower.val, new_lower.val)), 0.0);

which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument.  But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.
Oh, I see. It will be fixed.

I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly".  The redesign of the penalty and picksplit
functions should be discussed separately.
I think the same.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: information schema/aclexplode doesn't know about default privileges
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: Feature proposal: www_fdw