Re: [HACKERS] CUBE seems a bit confused about ORDER BY

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: [HACKERS] CUBE seems a bit confused about ORDER BY
Дата
Msg-id CAPpHfds92FTR3E+QUZSUr4WJx5WZU7Pht9ivUgOe2QZ7-GfUuQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] CUBE seems a bit confused about ORDER BY  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: [HACKERS] CUBE seems a bit confused about ORDER BY  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Список pgsql-hackers
On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I've noticed this suspicious behavior of "cube" data type with ORDER BY,
which I believe is a bug in the extension (or the GiST index support).
The following example comes directly from regression tests added by
33bd250f (so CC Teodor and Stas, who are mentioned in the commit).

This query should produce results with ordering "ascending by 2nd
coordinate or upper right corner". To make it clear, I've added the
"c~>4" expression to the query, otherwise it's right from the test.

test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
 c~>4 |             c
------+---------------------------
   50 | (30333, 50),(30273, 6)
   75 | (43301, 75),(43227, 43)
  142 | (19650, 142),(19630, 51)
  160 | (2424, 160),(2424, 81)
  171 | (3449, 171),(3354, 108)
  155 | (18037, 155),(17941, 109)
  208 | (28511, 208),(28479, 114)
  217 | (19946, 217),(19941, 118)
  191 | (16906, 191),(16816, 139)
  187 | (759, 187),(662, 163)
  266 | (22684, 266),(22656, 181)
  255 | (24423, 255),(24360, 213)
  249 | (45989, 249),(45910, 222)
  377 | (11399, 377),(11360, 294)
  389 | (12162, 389),(12103, 309)
(15 rows)

As you can see, it's not actually sorted by the c~>4 coordinate (but by
c~>2, which it the last number).

Moreover, disabling index scans fixes the ordering:

test=# set enable_indexscan = off;
SET
test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
ascending by 2nd coordinate or upper right corner
 ?column? |             c
----------+---------------------------
       50 | (30333, 50),(30273, 6)
       75 | (43301, 75),(43227, 43)
      142 | (19650, 142),(19630, 51)
      155 | (18037, 155),(17941, 109)
      160 | (2424, 160),(2424, 81)
      171 | (3449, 171),(3354, 108)
      187 | (759, 187),(662, 163)
      191 | (16906, 191),(16816, 139)
      208 | (28511, 208),(28479, 114)
      217 | (19946, 217),(19941, 118)
      249 | (45989, 249),(45910, 222)
      255 | (24423, 255),(24360, 213)
      266 | (22684, 266),(22656, 181)
      367 | (31018, 367),(30946, 333)
      377 | (11399, 377),(11360, 294)
(15 rows)


Seems like a bug somewhere in gist_cube_ops, I guess?

+1,
that definitely looks like a bug. Thank you for reporting!
I'll take a look on it in couple days.

I've reviewed code of ~> operator and its KNN-GiST support.  Unfortunately, it appears that it's broken in design...  The explanation is above.

We've following implementation of ~> operator.

if (coord <= DIM(cube))
{
if (IS_POINT(cube))
PG_RETURN_FLOAT8(cube->x[coord - 1]);
else
PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
cube->x[coord - 1 + DIM(cube)]));
}
else
{
if (IS_POINT(cube))
PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
else
PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
cube->x[coord - 1 - DIM(cube)]));
}

Thus, for cube of N dimensions [1; N - 1] are lower bounds and [N; 2*N - 1] are upper bounds.  However, we might have indexed cubes of different dimensions.  For example, for cube of 2 dimensions "cube ~> 3" selects upper bound of 1st dimension.  For cube of 3 dimensions "cube ~> 3" selects lower bound of 3rd dimension.

Therefore, despite ~> operator was specially invented to be supported by KNN-GIST, it can't serve this way.  

Regarding particular case discovered by Tomas, I think the error is in the GiST supporting code.

if (strategy == CubeKNNDistanceCoord)
{
int coord = PG_GETARG_INT32(1);
if (DIM(cube) == 0)
retval = 0.0;
else if (IS_POINT(cube))
retval = cube->x[(coord - 1) % DIM(cube)];
else
retval = Min(cube->x[(coord - 1) % DIM(cube)],
cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
}

g_cube_distance() always returns lower bound of cube.  It should return upper bound for coord > DIM(cube).

It would be also nice to provide descending ordering using KNN-GiST.  It would be especially effective for upper bound.  Since, KNN-GiST doesn't support explicit descending ordering, it might be useful to make ~> operator return negative of coordinate when negative argument is provided.  For instance, '(1,2), (3,4)'::cube ~> -1 return -1.

I'd like to propose following way to resolve design issue.  cube ~> (2*N - 1) should return lower bound of Nth coordinate of the cube while cube ~> 2*N should return upper bound of Nth coordinate.  Then it would be independent on number of coordinates in particular cube.  For sure, it would be user-visible incompatible change.  But I think there is not so many users of this operator yet.  Also, current behavior of ~> seems quite useless.

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Implementing pg_receivewal --no-sync
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] [bug fix] postgres.exe crashes with access violation onWindows while starting up