Re: BUG #15795: ERROR: could not find pathkey item to sort

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: BUG #15795: ERROR: could not find pathkey item to sort
Дата
Msg-id da30ee93-1655-2220-f413-29017c7606a5@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: BUG #15795: ERROR: could not find pathkey item to sort  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #15795: ERROR: could not find pathkey item to sort  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On 2019/05/08 19:33, Tom Lane wrote:
> For the archives, though, this isn't hard to reproduce:
> 
> regression=# create table person(_id int, _actual_type_name text, name text);
> CREATE TABLE
> regression=# SELECT  DISTINCT  A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
>  FROM  ( (   SELECT  DISTINCT _id as _id0, _actual_type_name as
> _actual_type_name0, name as name0  FROM  person  ) union all  ( 
> SELECT  DISTINCT  _id as _id0, _actual_type_name as _actual_type_name0, name
> as name0  FROM  person)) as A WHERE ( A.name0  =  A.name0 );
> ERROR:  could not find pathkey item to sort
> 
> Curiously, this only fails for me in 9.6 and 10, not earlier or later
> branches.

Here are my findings after debugging this on 9.5, 9.6, 10, and 11:

As you also said, the problem only seems to occur in 9.6 and 10.  It does
because the MergeAppend path created for UNION ALL subquery in the outer
query's FROM contains one pathkey item too many in its pathkeys field.
That's because each of its child subpaths has pathkeys (_id,
_actual_type_name, name), whereas the outer query only wants (_id,
_actual_type_name).  I hoped that convert_subquery_pathkeys() called
during child subquery's path creation would've taken the unnecessary
"name" out because the outer query doesn't need it, but it didn't.  It
fails to do so because the subquery's pathkey "name" is being matched
successfully to an outer EC installed due to the outer query's WHERE
(A.name0 = A.name0); that's via the following code in
convert_subquery_pathkeys():

                outer_ec =
                    get_eclass_for_sort_expr(root,
                                             outer_expr,

I checked if and how convert_subquery_pathkeys() determines which of the
subquery's pathkeys are useful to the outer query and there does exist a
scoring system to evaluate the usefulness of subquery's pathkeys to outer
query, but I didn't fully understand it.  In any case, the main problem
seems to be that convert_subquery_pathkeys() can't keep "name" from
appearing in the output pathkeys that it produces.  Based on that premise,
I added the following code to convert_subquery_pathkeys():

+
+        if (retvallen == outer_query_keys)
+             break;

which seems to fix the issue.  Alternatively, maybe we can apply
truncate_useless_pathkeys() to the result of convert_subquery_pathkeys().

The problem doesn't manifest with 9.5 or 11 (even HEAD for that matter),
because a sort-based path is not chosen in their case for unique-fying
(UNION ALL uses Append, not MergeAppend on cost grounds), so there's no
attempt to look for "pathkey item to sort" to begin with.

In 11's (and HEAD's) case, even if MergeAppend had won in terms of
costing, the problem wouldn't occur at least for this query, because
"name" doesn't appear in the outer query's ECs due to the following commit:

  commit 8ec5429e2f422f4d570d4909507db0d4ca83bbac
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Date:   Sun Oct 8 12:23:32 2017 -0400

      Reduce "X = X" to "X IS NOT NULL", if it's easy to do so.

However it's easy to tweak the query such that "name" *will* end up in
outer query's ECs as follows:

SELECT  DISTINCT  A._id0 as _id0, A._actual_type_name0 as _actual_type_name0
 FROM  ( (   SELECT  DISTINCT _id as _id0, _actual_type_name as
_actual_type_name0, name as name0  FROM  person  ) union all  (
SELECT  DISTINCT  _id as _id0, _actual_type_name as _actual_type_name0, name
as name0  FROM  person)) as A INNER JOIN person A1 ON ( A.name0  =  A1.name );

Now, A.name0 and A1.name in the join condition do form a an EC, which does
trick convert_subquery_pathkeys() into accepting the child subqueries'
"name" key.

IOW, if the "fix" I mentioned above is correct, it will have to applied to
all the branches, because this seems to be a fundamental problem with
convert_subquery_pathkeys().

Thoughts?

Thanks,
Amit




В списке pgsql-bugs по дате отправления:

Предыдущее
От: Monica Maria Piza Merchan
Дата:
Сообщение: prerunscript.command.line.error
Следующее
От: Francisco Olarte
Дата:
Сообщение: Re: prerunscript.command.line.error