Re: Slow HashAggregate/cache access

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Slow HashAggregate/cache access
Дата
Msg-id CAKJS1f8HsLSLKvW6YhCMCTGbAmBAQFBWiyxyv5c6WNo0XB=DAA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Slow HashAggregate/cache access  (Maxim Boguk <maxim.boguk@gmail.com>)
Ответы Re: Slow HashAggregate/cache access
Re: Slow HashAggregate/cache access
Список pgsql-performance
On 6 August 2015 at 06:25, Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Wed, Aug 5, 2015 at 11:41 AM, Alexandre de Arruda Paes <adaldeia@gmail.com> wrote:
Hi,

First, sorry to compare Post with other database system, but I know nothing about Oracle...

This customer have an application made with a framework thats generates the SQL statements (so, We can't make any query optimizations) .

We did the following tests:

1) Postgresql 9.3 and Oracle 10 in a desktop machine(8 GB RAM, 1 SATA disk,Core i5)
2) Postgresql 9.3 in a server + FC storage (128 GB RAM, Xeon 32 cores, SAS disks)


​I think I know where issue is.
The PostgreSQL planner unable pass join conditions into subquery with aggregate functions (it's well known limitation).


I think this statement is quite misleading. Let's look at an example:

create table t1 (a int not null, v int not null);
create table t2 (a int not null);
insert into t1 select s.i,10 from generate_series(1,1000) s(i),generate_series(1,1000);
insert into t2 select generate_series(1,1000);
create index on t1 (a); 


explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s inner join t2 on t2.a = s.a where t2.a = 1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..59.76 rows=1 width=12)
   ->  GroupAggregate  (cost=0.42..42.24 rows=1 width=8)
         Group Key: t1.a
         ->  Index Scan using t1_a_idx on t1  (cost=0.42..37.38 rows=969 width=8)
               Index Cond: (a = 1)
   ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
         Filter: (a = 1)
(7 rows)

As you can see, the predicate is pushes down just fine into a subquery with aggregates.

The likely reason that PostgreSQL Is not behaving the same as SQL Server and Oracle is because the predicate pushdowns are limited to equality operators only as internally these are all represented by a series of "equivalence classes" which in this case say that 1 = t2.a = t1.a, therefore it's possible to apply t1.a = 1 at the lowest level.

These equivalence classes don't currently handle non-equality operators. Here's an example:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s inner join t2 on t2.a = s.a where t2.a <= 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Hash Join  (cost=19442.51..19466.27 rows=1 width=12)
   Hash Cond: (t1.a = t2.a)
   ->  HashAggregate  (cost=19425.00..19435.00 rows=1000 width=8)
         Group Key: t1.a
         ->  Seq Scan on t1  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=17.50..17.50 rows=1 width=4)
         ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
               Filter: (a <= 1)
(8 rows)

Notice the seq scan on t1 instead of the index scan on t1_a_idx.

A way around this is to manually push the predicate down into the subquery:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 where t1.a <= 1 group by a) s inner join t2 on t2.a = s.a where t2.a <= 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..21.98 rows=1 width=12)
   Join Filter: (t1.a = t2.a)
   ->  GroupAggregate  (cost=0.42..4.46 rows=1 width=8)
         Group Key: t1.a
         ->  Index Scan using t1_a_idx on t1  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (a <= 1)
   ->  Seq Scan on t2  (cost=0.00..17.50 rows=1 width=4)
         Filter: (a <= 1)
(8 rows) 


The query in question is likely performing badly because of this:

                     ->  Seq Scan on fr13t1  (cost=0.00..25072.50 rows=311800 width=21) (actual time=0.007..115.766 rows=311800 loops=1)
                           Filter: (fr01codemp = '1'::smallint)
                           Buffers: shared hit=21175

Just how selective is fr01codemp = '1'::smallint ? Is there an index on that column ?

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-performance по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Slow HashAggregate/cache access
Следующее
От: Alexandre de Arruda Paes
Дата:
Сообщение: Re: Slow HashAggregate/cache access