Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

Поиск
Список
Период
Сортировка
От Nikita Glukhov
Тема Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
Дата
Msg-id 4450e7a6-01e7-0fb2-a01e-98fb5405d217@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote
> in <9ea5b157-478c-8874-bc9b-050076b7d042@postgrespro.ru>:
>> Working on the tests for new SP-GiST opclasses for polygons and
>> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some operators
>> (&<, &>, $<|, |&>) have wrong tests in spg_box_quad_inner_consistent().
>> This obviously leads to incorrect results of a SP-GiST index scan (see tests
>> in the attached patch, their results were taken from a sequential scan).
>
> Your problem is not necessarily evident for others.  Perhaps you
> have to provide an explanation and/or a test case that describes
> what is wrong for you so that others can get a clue for this
> problem. Simpler test is better.

The problem is that there are mixed low/high values for x/y axes.  For example,
overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range &query->right,
and also lower2D() here must use query->high instead of query->low.

The corresponding test is very simple: insert 10000 nearly arbitrary boxes and
see the difference between results of sequential scan and results of index scan.

regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box);

regression=# INSERT INTO quad_box_tbl
    SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
    FROM generate_series(1, 100) x,
         generate_series(1, 100) y;

regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

regression=# SET enable_seqscan = ON;
regression=# SET enable_indexscan = OFF;
regression=# SET enable_bitmapscan = OFF;

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<  box '((100,200),(300,500))';
  count
-------
   2900
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &>  box '((100,200),(300,500))';
  count
-------
   9100
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box '((100,200),(300,500))';
  count
-------
   4900
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box '((100,200),(300,500))';
  count
-------
   8100
(1 row)

regression=# SET enable_seqscan = OFF;
regression=# SET enable_indexscan = ON;
regression=# SET enable_bitmapscan = ON;

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<  box '((100,200),(300,500))';
  count
-------
   2430
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &>  box '((100,200),(300,500))';
  count
-------
   6208
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box '((100,200),(300,500))';
  count
-------
   1048
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box '((100,200),(300,500))';
  count
-------
   5084
(1 row)


> The test,
>
> | +INSERT INTO quad_box_tbl
> | +    SELECT i, box(point(x, y), point(x + w, y + h))
> | +    FROM (SELECT i,
> | +                random() * 1000 as x, random() * 1000 as y,
> | +                random() * 20 as w, random() * 20 as h
>
> is inserting quad_boxes generated using random numbers then,
>
> | +SELECT count(*) FROM quad_box_tbl WHERE b <<  box '((100,200),(300,500))';
> | + count
> | +-------
> | +   891
>
> counting them in this way is unstable. Even though this were
> stable due to a fixed initial, this would be unacceptable, I
> think. This kind of test should use nonrandom numbers.

I have replaced random data in this test with stable uniformly distributed data.
I would also like to use existing box data from rect.data that is loaded to
slow_emp4000 table in copy.sql test, but box.sql test is executed before copy.sql.


> Even though I don't understand this in depth, the following
> change seems somewhat wrong in direction. Changing the second
> argument type seems breaking the basis of the design.
>
> | -lower2D(RangeBox *range_box, Range *query)
> | +lower2D(RangeBox *range_box, double query)

Maybe.  I also thought that these changes are quite doubtful, so I decided to
replace lowerEqual2D()/higherEqual2D() with overlower2D()/overhigher2D() and
not to touch lower2D()/higher2D().

--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Corey Huinker
Дата:
Сообщение: Re: [HACKERS] COPY as a set returning function
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Improvements in psql hooks for variables