Обсуждение: Explains of queries to partitioned tables

Поиск
Список
Период
Сортировка

Explains of queries to partitioned tables

От
Vlad Arkhipov
Дата:
There is a partitioned table with 2 partitions:

drop table if exists p cascade;

create table p (
   id bigint not null primary key,
   ts timestamp);

create table p_actual ( check (ts is null) ) inherits (p);
create table p_historical ( check (ts is not null) ) inherits (p);

-- I skipped the triggers and rules creation

insert into p (id, ts) values (1, '2000-01-01');
insert into p (id, ts) values (2, null);
insert into p (id, ts) values (3, '2001-01-01');
insert into p (id, ts) values (4, '2005-01-01');

analyze p;
analyze p_actual;
analyze p_historical;

Here is the explain output for the query 'select * from p where ts is null'

Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038
rows=1 loops=1)
   ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual
time=0.023..0.029 rows=1 loops=1)
         ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual
time=0.002..0.002 rows=0 loops=1)
               Filter: (ts IS NULL)
         ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16)
(actual time=0.014..0.016 rows=1 loops=1)
               Filter: (ts IS NULL)
Total runtime: 0.080 ms

You can notice that the optimizer expects 10 rows in the table p and as
a result of this assumption the whole query is estimated as 10 rows.
Whether it will cause a performance impact further? pg_stats does not
contain any statistics on the table 'p'. Is this a cause of such behaviour?
The estimation is worse for some other queries, for example 'select *
from p where ts is not null'

Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049
rows=3 loops=1)
   ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual
time=0.016..0.032 rows=3 loops=1)
         ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16)
(actual time=0.003..0.003 rows=0 loops=1)
               Filter: (ts IS NOT NULL)
         ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3
width=16) (actual time=0.008..0.015 rows=3 loops=1)
               Filter: (ts IS NOT NULL)
Total runtime: 0.095 ms


Re: Explains of queries to partitioned tables

От
Robert Haas
Дата:
On Mon, Jul 26, 2010 at 4:47 AM, Vlad Arkhipov <arhipov@dc.baikal.ru> wrote:
> There is a partitioned table with 2 partitions:
>
> drop table if exists p cascade;
>
> create table p (
>  id bigint not null primary key,
>  ts timestamp);
>
> create table p_actual ( check (ts is null) ) inherits (p);
> create table p_historical ( check (ts is not null) ) inherits (p);
>
> -- I skipped the triggers and rules creation
>
> insert into p (id, ts) values (1, '2000-01-01');
> insert into p (id, ts) values (2, null);
> insert into p (id, ts) values (3, '2001-01-01');
> insert into p (id, ts) values (4, '2005-01-01');
>
> analyze p;
> analyze p_actual;
> analyze p_historical;
>
> Here is the explain output for the query 'select * from p where ts is null'
>
> Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038
> rows=1 loops=1)
>  ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual time=0.023..0.029
> rows=1 loops=1)
>        ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual
> time=0.002..0.002 rows=0 loops=1)
>              Filter: (ts IS NULL)
>        ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16) (actual
> time=0.014..0.016 rows=1 loops=1)
>              Filter: (ts IS NULL)
> Total runtime: 0.080 ms
>
> You can notice that the optimizer expects 10 rows in the table p and as a
> result of this assumption the whole query is estimated as 10 rows. Whether
> it will cause a performance impact further? pg_stats does not contain any
> statistics on the table 'p'. Is this a cause of such behaviour?
> The estimation is worse for some other queries, for example 'select * from p
> where ts is not null'
>
> Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049
> rows=3 loops=1)
>  ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual
> time=0.016..0.032 rows=3 loops=1)
>        ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16) (actual
> time=0.003..0.003 rows=0 loops=1)
>              Filter: (ts IS NOT NULL)
>        ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3 width=16)
> (actual time=0.008..0.015 rows=3 loops=1)
>              Filter: (ts IS NOT NULL)
> Total runtime: 0.095 ms

It would be easier to comment on this if you mentioned things like
which version of PG you're using, and what you have
constraint_exclusion set to, but as a general comment analyze doesn't
store statistics for any tables that are empty, because it assumes
that at some point you're going to put data in them.  So in this case
p_historical is probably using fake stats.  But it's not clear that it
really matters: you haven't got any relevant indices, so a sequential
scan is the only possible plan; and even if you did have some, there's
only 4 rows, so a sequential scan is probably the only plan that makes
sense anyway.  And your query ran in a tenth of a millisecond, which
is pretty zippy.  So I'm not really sure what the problem is.  If this
isn't the real data, post an example with the real data and ask for
help about that.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company