Обсуждение: [PERFORM] CREATE STATISTICS and join selectivity

Поиск
Список
Период
Сортировка

[PERFORM] CREATE STATISTICS and join selectivity

От
Justin Pryzby
Дата:
I just noticed that PG10 CREATE STATISTICS (dependencies) doesn't seem to work
for joins on multiple columns; is that right?

With statistics on table for 20171111 but not 20171110:

ts=# CREATE STATISTICS x ON site_id,sect_id FROM eric_enodeb_cell_20171111;
ts=# ANALYZE VERBOSE eric_enodeb_cell_20171111;

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171110 a JOIN eric_enodeb_cell_20171110 b USING(start_time,
sect_id)WHERE a.site_id=318 AND sect_id=1489;
 
Nested Loop  (cost=0.83..4565.09 rows=1 width=0) (actual time=23.595..69.541 rows=96 loops=1)
=> bad estimate on redundant WHERE WITHOUT multivar statistics

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171111 a JOIN eric_enodeb_cell_20171111 b USING(start_time,
sect_id)WHERE a.site_id=318 AND sect_id=1489;
 
Nested Loop  (cost=0.83..4862.41 rows=96 width=0) (actual time=0.034..3.882 rows=96 loops=1)
=> good estimate on redundant WHERE WITH multivar statistics

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171110 a JOIN eric_enodeb_cell_20171110 b USING(start_time,
sect_id);
Merge Join  (cost=18249.85..19624.18 rows=54858 width=0) (actual time=157.252..236.945 rows=55050 loops=1)
=> good estimate on JOIN on SECT_id without stats

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171110 a JOIN eric_enodeb_cell_20171110 b USING(start_time,
site_id);
Merge Join  (cost=0.83..14431.81 rows=261499 width=0) (actual time=0.031..259.382 rows=262638 loops=1)
=> good estimate on JOIN on SITE_id without stats

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171111 a JOIN eric_enodeb_cell_20171111 b USING(start_time,
site_id);
Merge Join  (cost=0.83..14706.29 rows=268057 width=0) (actual time=37.360..331.276 rows=268092 loops=1)
=> good estimate on JOIN on SITE_id with stats

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171111 a JOIN eric_enodeb_cell_20171111 b USING(start_time,
sect_id);
Merge Join  (cost=18560.89..19959.67 rows=55944 width=0) (actual time=130.865..198.439 rows=55956 loops=1)
=> good estimate on JOIN on SECT_id with stats

ts=# explain ANALYZE SELECT FROM eric_enodeb_cell_20171111 a JOIN eric_enodeb_cell_20171111 b USING(start_time,
sect_id,site_id);
 
Gather  (cost=1000.83..12222.06 rows=460 width=0) (actual time=1.686..149.707 rows=55956 loops=1)
=> poor estimate on redundant JOIN WITH stats (??)

I've already fixed our reports to avoid this kind of thing and support our PG95
customers, but I tentatively would've expected PG10 MV stats to "know" that
USING(site_id, sect_id) is no more selective than USING(sect_id), same as it
knows that's true for WHERE site... AND sect....

Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] CREATE STATISTICS and join selectivity

От
David Rowley
Дата:
On 16 November 2017 at 09:19, Justin Pryzby <pryzby@telsasoft.com> wrote:
> I just noticed that PG10 CREATE STATISTICS (dependencies) doesn't seem to work
> for joins on multiple columns; is that right?

Unfortunately, for now, they're not used for join selectivity
estimates, only for the base rel selectivities. That's all there was
time for with PG10. This is highly likely to be improved sometime in
the future.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance