Re: Why is plan (and performance) different on partitioned table?

Поиск
Список
Период
Сортировка
От Mark Liberman
Тема Re: Why is plan (and performance) different on partitioned table?
Дата
Msg-id 9D938282F8C6EE43B748B910386DE93E0138B43D@srvgpimail1.GPI.local
обсуждение исходный текст
Ответ на Why is plan (and performance) different on partitioned table?  ("Mark Liberman" <mliberman@mixedsignals.com>)
Ответы Re: Why is plan (and performance) different on partitioned table?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance

>If you don't have anything in the parent table br_1min, then deleting
>the (presumably obsolete) pg_statistic rows for it should fix your
>immediate problem.  Otherwise, consider applying the attached.

Tom, thanks alot for your reply.  A few follow-up questions, and one potential "bug"?

I've been experimenting with deleting the rows from pg_statistics.  FYI, there were statistics for all master tables prior to us partioning the data.  We then manually inserted the rows into each inherited partition and, when done - did a truncate of the master table.

So, here's what I'm finding. 

1) When I delete the rows from pg_statistics, the new plan is, indeed, a hash join.

explain analyze
SELECT *
FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id
WHERE ((end_time >= '2006-05-01 17:12:18-07' AND end_time < '2006-05-01 17:13:18-07'))
  AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713,153725,153739,153722) ;

Hash Join  (cost=763.35..807.35 rows=1 width=109) (actual time=3.631..36.181 rows=45 loops=1)
   Hash Cond: ("outer".modules_id = "inner".id)
   ->  Append  (cost=1.04..40.64 rows=877 width=32) (actual time=0.198..34.872 rows=910 loops=1)
         ->  Bitmap Heap Scan on br_1min bfs1  (cost=1.04..8.70 rows=6 width=32) (actual time=0.060..0.060 rows=0 loops=1)
               Recheck Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
               ->  Bitmap Index Scan on br_1min_end_idx  (cost=0.00..1.04 rows=6 width=0) (actual time=0.054..0.054 rows=0 loops=1)
                     Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
         ->  Index Scan using br_1min_20557_end_idx on br_1min_20557 bfs1  (cost=0.00..25.91 rows=869 width=32) (actual time=0.136..1.858 rows=910 loops=1)
               Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
         ->  Index Scan using br_1min_20570_end_idx on br_1min_20570 bfs1  (cost=0.00..3.02 rows=1 width=32) (actual time=0.092..0.092 rows=0 loops=1)
               Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
         ->  Index Scan using br_1min_20583_end_idx on br_1min_20583 bfs1  (cost=0.00..3.02 rows=1 width=32) (actual time=32.034..32.034 rows=0 loops=1)
               Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
   ->  Hash  (cost=761.61..761.61 rows=281 width=77) (actual time=0.487..0.487 rows=45 loops=1)
         ->  Bitmap Heap Scan on br_mods mod  (cost=20.98..761.61 rows=281 width=77) (actual time=0.264..0.435 rows=45 loops=1)
               Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722))
               ->  BitmapOr  (cost=20.98..20.98 rows=281 width=0) (actual time=0.223..0.223 rows=0 loops=1)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.091..0.091 rows=14 loops=1)
                           Index Cond: (downloads_id = 153226)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.037..0.037 rows=2 loops=1)
                           Index Cond: (downloads_id = 153714)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.010..0.010 rows=2 loops=1)
                           Index Cond: (downloads_id = 153730)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
                           Index Cond: (downloads_id = 153728)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
                           Index Cond: (downloads_id = 153727)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
                           Index Cond: (downloads_id = 153724)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
                           Index Cond: (downloads_id = 153713)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
                           Index Cond: (downloads_id = 153725)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=16 loops=1)
                           Index Cond: (downloads_id = 153739)
                     ->  Bitmap Index Scan on br_mods_downloads_id_idx  (cost=0.00..2.10 rows=28 width=0) (actual time=0.009..0.009 rows=1 loops=1)
                           Index Cond: (downloads_id = 153722)
 Total runtime: 36.605 ms
(38 rows)


Note:  there are 2 new partitions that our cron jobs automatically created yesterday that are being scanned, but they do not return any rows.

2) When I re-analyze the br_1min table, new rows do not appear in pg_statistics for that table.

Now, my questions:

1) If there are no statistics for the master table, does postgres use the statistics for any of the partitions, or does it create a plan without any statistics related to the partitioned tables (e.g. some default plan.)?

2) I'm curious where it got an estimate of 6 rows for br_1min in "Bitmap Heap Scan on br_1min bfs1  (cost=1.04..8.70 rows=6 width=32)"  Any insight?

3) Basically, I'm wondering if this strategy of deleting the rows in pg_statistics for the master tables will work in all conditions, or if it runs the risk of again using faulty statistics and choosing a bad plan.  Would I be better off setting enable_mergejoin = f in the session right before I  issue this query and then resetting it after?  What are the risks of that approach?


Now, the potentital bug:

It appears that after you truncate a table, the statistics for that table still remain in pg_statistics.  And, as long as there are no rows added back to that table, the same statistics remain for that table, after an ANALYZE, - and are used by queries.  Once, you re-insert any rows in the table, however, new statistics will be computed.  So, the bug appears to be that after a truncate, if there are no rows in a table, the old, out-dated statistics do not get overwritten.  To follow are some simple tests I did to illustrate that.  Maybe this is by design, or, should I post this on pg-hackers?  It might be that in my case, it's better that new statitics ARE NOT inserted into pg_statistics for empty tables, but maybe the fix could be to delete the old statistics for analyzes to an empty table.

Thanks again Tom for your feedback,

- Mark


prdb=# create table mark_temp (col1 int, col2 int);
CREATE TABLE
prdb=# create index mark_temp_idx on mark_temp(col1);
CREATE INDEX

... I then inserted several thousand rows ....

prdb=# analyze mark_temp;
ANALYZE
prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
 staattnum | stadistinct
-----------+-------------
         1 |        9671
         2 |           1
(2 rows)

prdb=# explain analyze select * from mark_temp where col1 = 1045;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using mark_temp_idx on mark_temp  (cost=0.00..51.35 rows=27 width=8) (actual time=0.013..0.015 rows=1 loops=1)
   Index Cond: (col1 = 1045)
 Total runtime: 0.048 ms
(3 rows)

prdb=# truncate table mark_temp;
TRUNCATE TABLE
prdb=# analyze mark_temp;
ANALYZE

NOTE:  STATISTICS ARE THE SAME AND IT'S STILL DOING AN INDEX SCAN INSTEAD OF A SEQ SCAN

prdb=# explain analyze select * from mark_temp where col1 = 1045;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using mark_temp_idx on mark_temp  (cost=0.00..3.14 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
   Index Cond: (col1 = 1045)
 Total runtime: 0.031 ms
(3 rows)

prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
 staattnum | stadistinct
-----------+-------------
         1 |        9671
         2 |           1
(2 rows)

prdb=# insert into mark_temp (col1,col2) values (1,100);
INSERT 0 1
prdb=# analyze mark_temp;

NOTE: AFTER INSERT, THERE ARE NEW STATISTICS AND IT'S DOING A SEQ SCAN NOW

ANALYZE
prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
 staattnum | stadistinct
-----------+-------------
         1 |          -1
         2 |          -1
(2 rows)

prdb=# explain analyze select * from mark_temp where col1 = 1045;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on mark_temp  (cost=0.00..1.01 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
   Filter: (col1 = 1045)
 Total runtime: 0.029 ms
(3 rows)







В списке pgsql-performance по дате отправления:

Предыдущее
От: Vivek Khera
Дата:
Сообщение: Re: postgresql transaction id monitoring with nagios
Следующее
От: Will Reese
Дата:
Сообщение: Re: Slow restoration question