Re: extended stats on partitioned tables

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: extended stats on partitioned tables
Дата
Msg-id e2196aa4-4a52-6168-6f13-f50b4d0edf48@enterprisedb.com
обсуждение исходный текст
Ответ на extended stats on partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: extended stats on partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
Re: extended stats on partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
On 9/23/21 11:26 PM, Justin Pryzby wrote:
> extended stats objects are allowed on partitioned tables since v10.
> https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
> 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b
> 
> But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
> This was the consequence of a commit to avoid an error I reported with stats on
> inheritence parents (not partitioned tables).
> 
> preceding 859b3003de, stats on the parent table *did* improve the estimate,
> so this part of the commit message seems to have been wrong?
> |commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
> |    Don't build extended statistics on inheritance trees
> ...
> |    Moreover, the current selectivity estimation code only works with individual
> |    relations, so building statistics on inheritance trees would be pointless
> |    anyway.
> 
> |CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
> |CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
> |TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
> |CREATE STATISTICS pp ON (a),(b) FROM p;
> |VACUUM ANALYZE p;
> |SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;
> 
> |postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
> | HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)
> 
> |postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
> | HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)
> 
> So I think this is a regression, and extended stats should be populated for
> partitioned tables - I had actually done that for some parent tables and hadn't
> noticed that the stats objects no longer do anything.
> 
> That begs the question if the current behavior for inheritence parents is
> correct..
> 
> CREATE TABLE p (i int, a int, b int);
> CREATE TABLE pd () INHERITS (p);
> INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
> CREATE STATISTICS pp ON (a),(b) FROM p;
> VACUUM ANALYZE p;
> explain analyze SELECT a,b FROM p GROUP BY 1,2;
> 
> | HashAggregate  (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)
> 

Agreed, that seems like a regression, but I don't see how to fix that 
without having the extra flag in the catalog. Otherwise we can store 
just one version for each statistics object :-(

> Since child tables can be queried directly, it's a legitimate question whether
> we should collect stats for the table heirarchy or (since the catalog only
> supports one) only the table itself.  I'd think that stats for the table
> hierarchy would be more commonly useful (but we shouldn't change the behavior
> in existing releases again).  Anyway it seems unfortunate that
> statistic_ext_data still has no stxinherited.
> 

Yeah, we probably need the flag - I planned to get it into 14, but then 
I got distracted by something else :-/

Attached is a PoC that I quickly bashed together today. It's pretty raw, 
but it passed "make check" and I think it does most of the things right. 
Can you try if this fixes the estimates with partitioned tables?

Extended statistics use two catalogs, pg_statistic_ext for definition, 
while pg_statistic_ext_data stores the built statistics objects - the 
flag needs to be in the "data" catalog, and managing the records is a 
bit challenging - the current PoC code mostly works, but I had to relax 
some error checks and I'm sure there are cases when we fail to remove a 
row, or something like that.

> Note that for partitioned tables if I enable enable_partitionwise_aggregate,
> then stats objects on the child tables can be helpful (but that's also
> confusing to the question at hand).
> 

Yeah. I think it'd be helpful to assemble a script with various test 
cases demonstrating how we estimate various cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: OpenSSL 3.0.0 compatibility
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: logical replication restrictions