Thread: Partitioning versus autovacuum


Partitioning versus autovacuum

From:
Greg Stark
Date:
So we now support `ANALYZE partitioned_table` which will gather statistics for the main table by gathering stats from all the partitions.

However as far as I can tell autovacuum will never actually trigger this analyze. Because we never generate any update records for the parent table in the statistics. Have I missed something?

I didn't find any discussion of this in the threads from when partitioning was committed but there were a lot of discussions and I could easily have missed it.

Is there a story for this? Some way to configure things so that autovacuum will analyze partitioned tables?

Or should we look at doing something? Maybe whenever we analyze a child we should also update the parent -- and if there's no stats yet run analyze on it?

This may be a serious enough problem for users that it may warrant backpatching. Not having any stats is resulting in some pretty weird plans for us.
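For anyone hitting this today, a workaround sketch (assuming a partitioned table named `p`) is to run the parent-level ANALYZE by hand, e.g. from cron, and then confirm the inherited statistics landed in pg_stats:

```sql
-- Manually refresh the parent's "inherited" statistics, since autovacuum
-- never schedules this for a partitioned table on its own.
ANALYZE p;

-- Verify that parent-level statistics now exist: rows with inherited = true
-- in pg_stats describe the whole partition tree.
SELECT tablename, attname, inherited, n_distinct
FROM pg_stats
WHERE tablename = 'p' AND inherited;
```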

Re: Partitioning versus autovacuum

From:
Greg Stark
Date:
Actually I did just find it in the To-do wiki:

Have autoanalyze of parent tables occur when child tables are modified


On Mon., Sep. 30, 2019, 1:48 p.m. Greg Stark, <stark@mit.edu> wrote:
> So we now support `ANALYZE partitioned_table` which will gather statistics for the main table by gathering stats from all the partitions.
>
> However as far as I can tell autovacuum will never actually trigger this analyze. Because we never generate any update records for the parent table in the statistics. Have I missed something?
>
> I didn't find any discussion of this in the threads from when partitioning was committed but there were a lot of discussions and I could easily have missed it.
>
> Is there a story for this? Some way to configure things so that autovacuum will analyze partitioned tables?
>
> Or should we look at doing something? Maybe whenever we analyze a child we should also update the parent -- and if there's no stats yet run analyze on it?
>
> This may be a serious enough problem for users that it may warrant backpatching. Not having any stats is resulting in some pretty weird plans for us.

Re: Partitioning versus autovacuum

From:
Greg Stark
Date:
Actually -- I'm sorry to follow up to myself (twice) -- but that's
wrong. That Todo item predates the modern partitioning code. It came
from when the partitioned statistics were added for inheritance trees.
The resulting comment almost doesn't make sense any more since it
talks about updates to the parent table and treats them as distinct
from updates to the children.

In any case it's actually not true any more as updates to the parent
table aren't even tracked any more -- see below. My modest proposal is
that we should count any updates that arrive through the parent table
as mods for both the parent and child.

A more ambitious proposal would have updates to the children also
count against the parent somehow but I'm not sure exactly how. And I'm
not sure we shouldn't be updating the parent statistics whenever we
run analyze on a child anyways but again I'm not sure how.

postgres=# create table p (i integer primary key, t text)
partition by range (i);
CREATE TABLE
postgres=# create table p0 partition of p for values from (0) to (10);
CREATE TABLE
postgres=# analyze p;
ANALYZE
postgres=# analyze p0;
ANALYZE
postgres=# select pg_stat_get_mod_since_analyze('p'::regclass) as p,
pg_stat_get_mod_since_analyze('p0'::regclass) as p0;
 p | p0
---+----
 0 |  0
(1 row)

postgres=# insert into p values (2);
INSERT 0 1
postgres=# select pg_stat_get_mod_since_analyze('p'::regclass) as p,
pg_stat_get_mod_since_analyze('p0'::regclass) as p0;
 p | p0
---+----
 0 |  1
(1 row)



Re: Partitioning versus autovacuum

From:
Amit Langote
Date:
Hi Greg,

On Tue, Oct 1, 2019 at 4:03 AM Greg Stark <stark@mit.edu> wrote:
>
> Actually -- I'm sorry to followup to myself (twice) -- but that's
> wrong. That Todo item predates the modern partitioning code. It came
> from when the partitioned statistics were added for inheritance trees.
> The resulting comment almost doesn't make sense any more since it
> talks about updates to the parent table and treats them as distinct
> from updates to the children.
>
> In any case it's actually not true any more as updates to the parent
> table aren't even tracked any more -- see below. My modest proposal is
> that we should count any updates that arrive through the parent table
> as mods for both the parent and child.

Yeah, we need to teach autovacuum to consider analyzing partitioned
tables.  That is still a TODO for declarative partitioning.

We do need to weigh the trade-offs here.  In the thread quoted in your
previous email, Tom expresses a concern [1] about ending up doing
excessive work, because partitions would be scanned twice -- first to
collect their own statistics and then to collect the parent's when the
parent table is analyzed.  Maybe if we find a way to calculate
parent's stats from the partitions' stats without scanning the
partitions, that would be great.

Another thing to consider is that users now (as of v11) have the
option of using partitionwise plans.  Consider joining two huge
partitioned tables.  If they are identically partitioned, Postgres
planner considers joining pairs of matching partitions and appending
the outputs of these smaller joins.  In this case, even if the
non-partitionwise join couldn't use hash join, individual smaller
joins could, because partition stats would be up to date.  The
requirements that the tables being joined be identically partitioned
(or be partitioned at all) might be a bit too restrictive though.
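As a rough sketch of what I mean (t1 and t2 here are hypothetical, identically partitioned tables), the behaviour is gated by a GUC that defaults to off:

```sql
-- Partitionwise join is off by default in v11; enabling it lets the planner
-- join matching partitions pairwise when both sides are identically partitioned.
SET enable_partitionwise_join = on;

-- With identically partitioned t1 and t2, EXPLAIN would then show one join
-- per partition pair under an Append node instead of one big join.
EXPLAIN SELECT * FROM t1 JOIN t2 USING (i);
```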

> A more ambitious proposal would have updates to the children also
> count against the parent somehow but I'm not sure exactly how. And I'm
> not sure we shouldn't be updating the parent statistics whenever we
> run analyze on a child anyways but again I'm not sure how.

As I mentioned above, we could try to figure out a way to "merge" the
individual partitions' statistics when they're refreshed into the
parent's stats.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/489.1276114285%40sss.pgh.pa.us



Re: Partitioning versus autovacuum

From:
Greg Stark
Date:
At the risk of forking this thread... I think there's actually a
planner estimation bug here too.

Consider this test case of a simple partitioned table and a simple
join. The cardinality estimates for each partition and the Append node
are all perfectly accurate. But the estimate for the join is way off.
The corresponding test case without partitioning produces a perfect
cardinality estimate for the join.

I've never completely wrapped my head around the planner selectivity
estimations. IIRC join restrictions are treated differently from
single-relation restrictions. Perhaps what's happening here is that
the single-relation restrictions are being correctly estimated based
on the child partitions but the join restriction code hasn't been
taught the same tricks?



stark=# create table p (i integer, j integer) partition by list (i);
CREATE TABLE

stark=# create table p0 partition of p for values in (0);
CREATE TABLE
stark=# create table p1 partition of p for values in (1);
CREATE TABLE

stark=# insert into p select 0,generate_series(1,1000);
INSERT 0 1000
stark=# insert into p select 1,generate_series(1,1000);
INSERT 0 1000

stark=# analyze p0;
ANALYZE
stark=# analyze p1;
ANALYZE

stark=# create table q (i integer);
CREATE TABLE
stark=# insert into q values (0);
INSERT 0 1
stark=# analyze q;
ANALYZE

-- Query partitioned table, get wildly off row estimates for join

stark=# explain analyze select * from q join p using (i) where j
between 1 and 500;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..44.82 rows=5 width=8) (actual time=0.060..1.614 rows=500 loops=1)
   Hash Cond: (p0.i = q.i)
   ->  Append  (cost=0.00..40.00 rows=1000 width=8) (actual time=0.030..1.127 rows=1000 loops=1)
         ->  Seq Scan on p0  (cost=0.00..20.00 rows=500 width=8) (actual time=0.029..0.440 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
         ->  Seq Scan on p1  (cost=0.00..20.00 rows=500 width=8) (actual time=0.018..0.461 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
 Planning time: 0.713 ms
 Execution time: 1.743 ms
(14 rows)


-- Query non-partitioned table get accurate row estimates for join

stark=# create table pp as (Select * from p);
SELECT 2000
stark=# analyze pp;
ANALYZE

stark=# explain analyze select * from q join pp using (i) where j
between 1 and 500;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..48.77 rows=500 width=8) (actual time=0.027..0.412 rows=500 loops=1)
   Hash Cond: (pp.i = q.i)
   ->  Seq Scan on pp  (cost=0.00..39.00 rows=1000 width=8) (actual time=0.014..0.243 rows=1000 loops=1)
         Filter: ((j >= 1) AND (j <= 500))
         Rows Removed by Filter: 1000
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
 Planning time: 0.160 ms
 Execution time: 0.456 ms
(10 rows)
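One way to see the missing input directly (a sketch against the test case above) is to look for inherited statistics rows for the parent; with only the partitions analyzed, there are none:

```sql
-- With only p0 and p1 analyzed, the parent p has no "inherited" statistics
-- rows, so the join selectivity code has nothing parent-level to work with.
SELECT tablename, attname, inherited
FROM pg_stats
WHERE tablename IN ('p', 'p0', 'p1')
ORDER BY tablename, attname;
-- Rows for p itself only appear after running ANALYZE p.
```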

Re: Partitioning versus autovacuum

From:
yuzuko
Date:
Hello Greg,

> At the risk of forking this thread... I think there's actually a
> planner estimation bug here too.
>
I don't think that is a bug.  The estimation error occurred because there
were no statistics for the parent.  We should run ANALYZE on the
*partitioned table* itself.

Here is your test case:
create table p (i integer, j integer) partition by list (i);
create table p0 partition of p for values in (0);
create table p1 partition of p for values in (1);
insert into p select 0,generate_series(1,1000);
insert into p select 1,generate_series(1,1000);
analyze p;

explain analyze select * from q join p using (i) where j between 1 and 500;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..54.77 rows=500 width=8) (actual time=0.180..2.960 rows=500 loops=1)
   Hash Cond: (p0.i = q.i)
   ->  Append  (cost=0.00..45.00 rows=1000 width=8) (actual time=0.033..1.887 rows=1000 loops=1)
         ->  Seq Scan on p0  (cost=0.00..20.00 rows=500 width=8) (actual time=0.025..0.524 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
         ->  Seq Scan on p1  (cost=0.00..20.00 rows=500 width=8) (actual time=0.014..0.499 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.103..0.104 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.072..0.074 rows=1 loops=1)
 Planning Time: 0.835 ms
 Execution Time: 3.310 ms
(14 rows)

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center