Re: Fix picksplit with nan values

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Fix picksplit with nan values
Дата
Msg-id 22995.1383935908@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Fix picksplit with nan values  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: Fix picksplit with nan values  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I wrote attached patch by following principles:
> 1) NaN coordinates shouldn't crash or hang GiST.
> 2) NaN coordinates should be processed in GiST index scan like in
> sequential scan.
> 3) NaN coordinates shouldn't lead to significant slowdown.

I looked at this patch for awhile.  It seems like there's still an awful
lot of places in gistproc.c that are not worrying about NANs, and it's not
clear to me that they don't need to.  For instance, despite the changes in
adjustBox(), it'll still be possible to have boxes with NAN boundaries if
all the contained values are all-NAN boxes.  It doesn't seem like
gist_box_penalty will behave very sanely for that; it'll return a NAN
penalty which seems unhelpful.  The static box_penalty function doesn't
work sanely for NANs either, and if it can return NAN then you also have
to worry about NAN deltas in common_entry_cmp.  And isn't it still
possible for the Assert in gist_box_picksplit to fire?

> That could be illustrated on following test-case:

> create table test1 as (select point(random(), random()) as p from
> generate_series(1,1000000));
> create index test1_idx on test1 using gist(p);
> create table temp as (select * from test1);
> insert into temp (select point('nan'::float8, 'nan'::float8) from
> generate_series(1,1000));
> insert into temp (select point('nan'::float8, random()) from
> generate_series(1,1000));
> insert into temp (select point(random(), 'nan'::float8) from
> generate_series(1,1000));
> create table test2 as (select * from temp order by random());
> create index test2_idx on test2 using gist(p);
> drop table temp;

I think this test case is unlikely to generate any all-NAN index entry
boxes, because almost certainly the initial entries will be non-NANs, and
you've got it set up to keep incoming NANs from adjusting the boundaries
of an existing box.  You'd get better code coverage if you started by
inserting some NANs into an empty index.

Also, as a stylistic matter, I thought your previous solution of
copying float8_cmp_internal was a better idea than manually inlining it
(sans comments!) in multiple places as this version does.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: backup.sgml-cmd-v003.patch
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Protocol forced to V2 in low-memory conditions?