Re: Postgres refusing to use >1 core

Поиск
Список
Период
Сортировка
От Shaun Thomas
Тема Re: Postgres refusing to use >1 core
Дата
Msg-id 4DCC0167.8080903@peak6.com
обсуждение исходный текст
Ответ на Re: Postgres refusing to use >1 core  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Postgres refusing to use >1 core
Список pgsql-performance
On 05/12/2011 09:51 AM, Tom Lane wrote:

> It does.  I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I said "old version" that prevented us from using CTEs?
We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
It's basically calculating the group aggregation wrong, but is that
enough to trigger it to go nuts?

SELECT c.*
  FROM customer c
  JOIN (SELECT session_id, count(1) as counter
          FROM session
         WHERE date_created >= '2011-05-11 05:00'
           AND date_created < '2011-05-11 06:00'
           AND from_interface = 'UNKNOWN'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;

So sayeth the planner:

 Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
   ->  GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
         ->  Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual
time=0.044.
.0.690 rows=230 loops=1)
                     Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created
<'11-MAY-11 06:00:00':: 
timestamp without time zone))
                     Filter: ((from_interface)::text = 'UNKNOWN'::text)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual
time=19.820..798.348rows=1 loops= 
209)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
 Total runtime: 166859.040 ms

That one hour extract is much, much slower than this:

SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) as counter
            FROM session
           WHERE date_created >= '2011-05-08'
           GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE
   AND c.row_out IS NULL;

Which gives this plan:

 Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
   Merge Cond: (a.session_id = c.customer_id)
   ->  GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
         ->  Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8)
(actualtime=0.0 
20..160.339 rows=179267 loops=1)
                     Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
         Sort Key: c.customer_id
         ->  Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual
time=0.037..90.337rows=155067  
loops=1)
               Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
               Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...

 Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
   ->  GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
         ->  Sort  (cost=13755.53..14022.28 rows=106700 width=8)
               Sort Key: session.session_id
               ->  Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
                     Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
   ->  Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
         Index Cond: (c.customer_id = a.session_id)
         Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that to finish. ;) But the
stats are all pretty darn close, so far as I can tell. The only
thing that's off is the group aggregate... by about two orders
of magnitude. So I just chalked it up to 8.2 being relatively
horrible, and punted to just using a temp table to trick the
optimizer into doing it right.

But my greater point was that even doing it all in SQL doesn't
always work, which we all know. Use of EXPLAIN abounds, but that
doesn't necessarily mean a dev will know how to fix a bad plan.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

В списке pgsql-performance по дате отправления:

Предыдущее
От: Willy-Bas Loos
Дата:
Сообщение: since when has pg_stat_user_indexes.idx_scan been counting?
Следующее
От: Eric McKeeth
Дата:
Сообщение: Re: Postgres refusing to use >1 core