Re: Slow HashAggregate/cache access

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Slow HashAggregate/cache access
Дата	5 августа 2015 г. 22:07:33
Msg-id	CAKJS1f8HsLSLKvW6YhCMCTGbAmBAQFBWiyxyv5c6WNo0XB=DAA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Slow HashAggregate/cache access (Maxim Boguk <maxim.boguk@gmail.com>)
Ответы	Re: Slow HashAggregate/cache access Re: Slow HashAggregate/cache access
Список	pgsql-performance

Дерево обсуждения

On 6 August 2015 at 06:25, Maxim Boguk <maxim.boguk@gmail.com> wrote:

On Wed, Aug 5, 2015 at 11:41 AM, Alexandre de Arruda Paes <adaldeia@gmail.com> wrote:
Hi,

First, sorry to compare Post with other database system, but I know nothing about Oracle...

This customer have an application made with a framework thats generates the SQL statements (so, We can't make any query optimizations) .

We did the following tests:

1) Postgresql 9.3 and Oracle 10 in a desktop machine(8 GB RAM, 1 SATA disk,Core i5)
2) Postgresql 9.3 in a server + FC storage (128 GB RAM, Xeon 32 cores, SAS disks)

I think I know where issue is.
The PostgreSQL planner unable pass join conditions into subquery with aggregate functions (it's well known limitation).

I think this statement is quite misleading. Let's look at an example:

create table t1 (a int not null, v int not null);

create table t2 (a int not null);

insert into t1 select s.i,10 from generate_series(1,1000) s(i),generate_series(1,1000);

insert into t2 select generate_series(1,1000);

create index on t1 (a);

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s inner join t2 on t2.a = s.a where t2.a = 1;

QUERY PLAN

----------------------------------------------------------------------------------

Nested Loop (cost=0.42..59.76 rows=1 width=12)

-> GroupAggregate (cost=0.42..42.24 rows=1 width=8)

Group Key: t1.a

-> Index Scan using t1_a_idx on t1 (cost=0.42..37.38 rows=969 width=8)

Index Cond: (a = 1)

-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)

Filter: (a = 1)

(7 rows)

As you can see, the predicate is pushes down just fine into a subquery with aggregates.

The likely reason that PostgreSQL Is not behaving the same as SQL Server and Oracle is because the predicate pushdowns are limited to equality operators only as internally these are all represented by a series of "equivalence classes" which in this case say that 1 = t2.a = t1.a, therefore it's possible to apply t1.a = 1 at the lowest level.

These equivalence classes don't currently handle non-equality operators. Here's an example:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 group by a) s inner join t2 on t2.a = s.a where t2.a <= 1;

QUERY PLAN

------------------------------------------------------------------------

Hash Join (cost=19442.51..19466.27 rows=1 width=12)

Hash Cond: (t1.a = t2.a)

-> HashAggregate (cost=19425.00..19435.00 rows=1000 width=8)

Group Key: t1.a

-> Seq Scan on t1 (cost=0.00..14425.00 rows=1000000 width=8)

-> Hash (cost=17.50..17.50 rows=1 width=4)

-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)

Filter: (a <= 1)

(8 rows)

Notice the seq scan on t1 instead of the index scan on t1_a_idx.

A way around this is to manually push the predicate down into the subquery:

explain select t2.a,s.sumv from (select a,sum(v) sumv from t1 where t1.a <= 1 group by a) s inner join t2 on t2.a = s.a where t2.a <= 1;

QUERY PLAN

-------------------------------------------------------------------------------

Nested Loop (cost=0.42..21.98 rows=1 width=12)

Join Filter: (t1.a = t2.a)

-> GroupAggregate (cost=0.42..4.46 rows=1 width=8)

Group Key: t1.a

-> Index Scan using t1_a_idx on t1 (cost=0.42..4.44 rows=1 width=8)

Index Cond: (a <= 1)

-> Seq Scan on t2 (cost=0.00..17.50 rows=1 width=4)

Filter: (a <= 1)

(8 rows)

The query in question is likely performing badly because of this:

-> Seq Scan on fr13t1 (cost=0.00..25072.50 rows=311800 width=21) (actual time=0.007..115.766 rows=311800 loops=1)

Filter: (fr01codemp = '1'::smallint)

Buffers: shared hit=21175

Just how selective is fr01codemp = '1'::smallint ? Is there an index on that column ?

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Slow HashAggregate/cache access