Just a note about column equivalence disarming the planner

Поиск
Список
Период
Сортировка
От Shaun Thomas
Тема Just a note about column equivalence disarming the planner
Дата
Msg-id 4E17668F.1090603@peak6.com
обсуждение исходный текст
Список pgsql-performance
Hello folks,

This isn't really a problem, so much as an observation of just how much
the internals have changed over the years. We've got an older version
we're hoping to upgrade soon, and a developer asked me to optimize this
today:

SELECT order_id
   FROM order
  WHERE order_id = root_order_id
    AND order_id IN (
          SELECT DISTINCT m.root_order_id
            FROM wacky_orders c
            JOIN order m USING (root_order_id)
           WHERE m.order_type = 'regular'
           GROUP BY m.root_order_id, m.route_id
          HAVING COUNT(1) > 1
        );

 From what I could tell, the query was fine. But this part of the
explain was confusing the hell out of me:

->  Seq Scan on order  (cost=0.00..218943.98 rows=24092 width=20)
      Filter: (order_id = root_order_id)

The thing is, that subquery there only produced 150 rows. So I shrugged
and simplified further by making a temp table, and got this:

SELECT order_id
   FROM order m
   JOIN zany_orders z ON (m.order_id = z.root_order_id)
  WHERE m.order_id = m.root_order_id;

Which produced this:

Merge Join  (cost=220705.29..220826.19 rows=1 width=10)
  Merge Cond: (m.order_id = z.root_order_id)
  ->  Sort  (cost=220697.42..220757.65 rows=24092 width=20)
        Sort Key: m.order_id
        ->  Seq Scan on order m  (cost=0.00..218943.98 rows=24092 width=20)
              Filter: (order_id = root_order_id)
  ->  Sort  (cost=7.87..8.24 rows=149 width=11)
        Sort Key: z.root_order_id
        ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)

Ok, now it's just screwing with me. The order table has about 5M rows,
and this is clearly not a good idea, here. But then I took a closer
look. Why did it decide to filter based on a condition 90% of the table
fits, and then *merge* those results in with the 150-row temp table?

So, for giggles, I cast a column type to tell the planner it shouldn't
consider the columns equivalent:

SELECT master_order_id
   FROM order m
   JOIN zany_orders z ON (m.order_id = z.root_order_id)
  WHERE m.order_id::VARCHAR = m.root_order_id;

And voila:

Nested Loop  (cost=0.00..839.82 rows=1 width=8)
   ->  Seq Scan on zany_orders z  (cost=0.00..2.49 rows=149 width=11)
   ->  Index Scan using order_pkey on order m  (cost=0.00..5.60 rows=1
width=8)
         Index Cond: (m.order_id = z.root_order_id)
         Filter: ((order_id)::varchar = root_order_id)

I tried this with a mere 9.0 install and it wasn't having any of it. It
completely ignored the red-herring WHERE clause except as a post-filter.
I'm pretty sure that if it were possible to manifest itself to slap me
for even trying, it would have done so.

I've noticed lots of little things like this recently, and I have to
say, the planner has made huge improvements regardless of what
perception may reflect sometimes. It still has some holes and room for
improvement, but I just wanted to thank the devs for all their hard work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: "VACUUM FULL ANALYZE" vs. Autovacuum Contention
Следующее
От: Anish Kejariwal
Дата:
Сообщение: issue with query optimizer when joining two partitioned tables