Re: BUG #14107: Major query planner bug regarding subqueries and indices

Поиск

Список

Период

Сортировка

От	Mathias Kunter
Тема	Re: BUG #14107: Major query planner bug regarding subqueries and indices
Дата	22 апреля 2016 г. 14:02:55
Msg-id	69e376c7-6c2c-c96d-2ad6-3734affb59b7@gmail.com обсуждение
Ответ на	Re: BUG #14107: Major query planner bug regarding subqueries and indices (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: BUG #14107: Major query planner bug regarding subqueries and indices
Список	pgsql-bugs

Дерево обсуждения

> Thanks for the report, but I think your subject line is a little
> exaggerated.

Sorry for that :-)

> I'd not actually thought about expanding this to IN and NOT IN, but
> likely it would be possible, providing NULLs could be handled
> correctly too. Was this the optimisation you think is missing?

No - let's consider a real-world example where the problem becomes more
evident. Assume we have the following two tables:

CREATE TABLE book (id SERIAL NOT NULL, name VARCHAR, authorId SERIAL,
CONSTRAINT book_pkey PRIMARY KEY (id));

CREATE TABLE author (id SERIAL NOT NULL, surname VARCHAR, CONSTRAINT
author_pkey PRIMARY KEY (id));

All relevant columns are indexed. The book table contains 1.3 M rows.

Now let's say we want to find books by either name or by author name. A
simple problem, actually. The following query is used (which could of
course also be rewritten to use a JOIN instead, but that's not the point
here). We find out that it has an unexpected and unacceptably long
execution time:


EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' OR authorId IN
(SELECT id FROM author WHERE surname = 'Rowling');
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
  Seq Scan on book  (cost=13.68..26792.88 rows=576709 width=40)
    Filter: (((name)::text = 'Harry Potter'::text) OR (hashed SubPlan 1))
    SubPlan 1
      ->  Bitmap Heap Scan on author  (cost=4.20..13.67 rows=6 width=4)
            Recheck Cond: ((surname)::text = 'Rowling'::text)
            ->  Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
                  Index Cond: ((surname)::text = 'Rowling'::text)


In contrast to that, consider the query plan for the following logically
equivalent query, which executes in just a few milliseconds (!):

EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' UNION SELECT *
FROM book WHERE authorId IN (SELECT id FROM author WHERE surname =
'Rowling');
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=454.87..455.99 rows=112 width=29)
    Group Key: book.id, book.name, book.authorid
    ->  Append  (cost=0.43..454.03 rows=112 width=29)
          ->  Index Scan using book_name_index on book
(cost=0.43..16.48 rows=3 width=29)
                Index Cond: ((name)::text = 'Harry Potter'::text)
          ->  Nested Loop  (cost=4.63..436.43 rows=109 width=29)
                ->  Bitmap Heap Scan on author  (cost=4.20..13.67 rows=6
width=4)
                      Recheck Cond: ((surname)::text = 'Rowling'::text)
                      ->  Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
                            Index Cond: ((surname)::text = 'Rowling'::text)
                ->  Index Scan using book_authorid_index on book book_1
(cost=0.43..70.28 rows=18 width=29)
                      Index Cond: (authorid = author.id)

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #14107: Major query planner bug regarding subqueries and indices