Обсуждение: Join the master table with other table is very slow (partitioning)

Поиск
Список
Период
Сортировка

Join the master table with other table is very slow (partitioning)

От
Ao Jianwang
Дата:
Hi Experts,

I found if we join the master table with other small table, then the running time is slow. While, if we join each child table with the small table, then it's very fast. Any comments and suggestions are greatly appreciated.

For example, par_list table is small(about 50k rows), while par_est is very large, for each day it's about 400MB. Therefore, we partition it by day. However, the query plan for joining the master table with par_list is bad, so the running time is slow. The good plan should be join each partition table with par_list separately, then aggregate the result together. 

1. Join the master table with a small table. It's slow.
dailyest=# explain (analyze on, buffers on)        
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date BETWEEN '2012-07-08' and '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-#             ORDER BY e.date;

-----------------------
 GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
   Buffers: shared hit=3 read=175869
   ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
         Sort Key: e.date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=175869
         ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)
               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190
                     Buffers: shared hit=3 read=269
               ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
                     Buffers: shared read=175600
                     ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878
                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866
                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989
                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)


2. Join each partition table with small table (par_list) and union the result. This runs very fast. However, it's not reasonable if we union 180 SELECT statements (for example, the date is from 2012-07-01 to 2012-12-31. Any better suggestions.

dailyest=# explain (analyze on, buffers on)
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_08 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-08' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_09 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-09' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_10 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# ;

                                                                                             QUERY PLAN                                       
                                                      
----------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------
 Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
   Buffers: shared hit=27 read=28
   ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
         Buffers: shared hit=27 read=28
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
               Buffers: shared hit=9 read=12
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4
                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)

Re: Join the master table with other table is very slow (partitioning)

От
Ao Jianwang
Дата:
Hi Rumman,

Thanks for your response. I follow the guide to build the partition. The settings should be good. See the following result. Any insight? thanks.

dailyest=# select version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)

dailyest=# show constraint_exclusion;
 constraint_exclusion 
----------------------
 on
(1 row)

On Fri, Mar 15, 2013 at 11:04 PM, AI Rumman <rummandba@gmail.com> wrote:
Which version of Postgresql are you using?
Have you set constraint_exclusion to parition?


On Fri, Mar 15, 2013 at 11:02 AM, Ao Jianwang <aojw2008@gmail.com> wrote:
Hi Experts,

I found if we join the master table with other small table, then the running time is slow. While, if we join each child table with the small table, then it's very fast. Any comments and suggestions are greatly appreciated.

For example, par_list table is small(about 50k rows), while par_est is very large, for each day it's about 400MB. Therefore, we partition it by day. However, the query plan for joining the master table with par_list is bad, so the running time is slow. The good plan should be join each partition table with par_list separately, then aggregate the result together. 

1. Join the master table with a small table. It's slow.
dailyest=# explain (analyze on, buffers on)        
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date BETWEEN '2012-07-08' and '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-#             ORDER BY e.date;

-----------------------
 GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
   Buffers: shared hit=3 read=175869
   ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
         Sort Key: e.date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=175869
         ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)
               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190
                     Buffers: shared hit=3 read=269
               ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
                     Buffers: shared read=175600
                     ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878
                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866
                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989
                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)


2. Join each partition table with small table (par_list) and union the result. This runs very fast. However, it's not reasonable if we union 180 SELECT statements (for example, the date is from 2012-07-01 to 2012-12-31. Any better suggestions.

dailyest=# explain (analyze on, buffers on)
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_08 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-08' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_09 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-09' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_10 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# ;

                                                                                             QUERY PLAN                                       
                                                      
----------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------
 Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
   Buffers: shared hit=27 read=28
   ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
         Buffers: shared hit=27 read=28
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
               Buffers: shared hit=9 read=12
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4
                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)



Re: Join the master table with other table is very slow (partitioning)

От
AI Rumman
Дата:


On Fri, Mar 15, 2013 at 11:09 AM, Ao Jianwang <aojw2008@gmail.com> wrote:
Hi Rumman,

Thanks for your response. I follow the guide to build the partition. The settings should be good. See the following result. Any insight? thanks.

dailyest=# select version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)

dailyest=# show constraint_exclusion;
 constraint_exclusion 
----------------------
 on
(1 row)

On Fri, Mar 15, 2013 at 11:04 PM, AI Rumman <rummandba@gmail.com> wrote:
Which version of Postgresql are you using?
Have you set constraint_exclusion to parition?


On Fri, Mar 15, 2013 at 11:02 AM, Ao Jianwang <aojw2008@gmail.com> wrote:
Hi Experts,

I found if we join the master table with other small table, then the running time is slow. While, if we join each child table with the small table, then it's very fast. Any comments and suggestions are greatly appreciated.

For example, par_list table is small(about 50k rows), while par_est is very large, for each day it's about 400MB. Therefore, we partition it by day. However, the query plan for joining the master table with par_list is bad, so the running time is slow. The good plan should be join each partition table with par_list separately, then aggregate the result together. 

1. Join the master table with a small table. It's slow.
dailyest=# explain (analyze on, buffers on)        
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date BETWEEN '2012-07-08' and '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-#             ORDER BY e.date;

-----------------------
 GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
   Buffers: shared hit=3 read=175869
   ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
         Sort Key: e.date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=175869
         ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)
               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190
                     Buffers: shared hit=3 read=269
               ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
                     Buffers: shared read=175600
                     ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878
                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866
                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989
                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)


2. Join each partition table with small table (par_list) and union the result. This runs very fast. However, it's not reasonable if we union 180 SELECT statements (for example, the date is from 2012-07-01 to 2012-12-31. Any better suggestions.

dailyest=# explain (analyze on, buffers on)
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_08 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-08' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_09 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-09' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_10 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# ;

                                                                                             QUERY PLAN                                       
                                                      
----------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------
 Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
   Buffers: shared hit=27 read=28
   ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
         Buffers: shared hit=27 read=28
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
               Buffers: shared hit=9 read=12
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4
                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)



At first. you may try the following out and find out if the partition constraint exclusion is working or not::

explain
select * 
FROM  
par_est e
WHERE 
e.date BETWEEN '2012-07-08' and '2012-07-10' 

Re: Join the master table with other table is very slow (partitioning)

От
Ao Jianwang
Дата:
Hi Rumman,

I think it works. Please see the following result. Thanks.

dailyest=# explain select * from par_est e where e.date BETWEEN '2012-07-08' and '2012-07-10' 
;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..662886.68 rows=32485781 width=16)
   ->  Append  (cost=0.00..662886.68 rows=32485781 width=16)
         ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
         ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
         ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..220695.53 rows=10815502 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
         ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..220942.20 rows=10827613 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
         ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..221248.96 rows=10842664 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
(12 rows)




On Fri, Mar 15, 2013 at 11:12 PM, AI Rumman <rummandba@gmail.com> wrote:


On Fri, Mar 15, 2013 at 11:09 AM, Ao Jianwang <aojw2008@gmail.com> wrote:
Hi Rumman,

Thanks for your response. I follow the guide to build the partition. The settings should be good. See the following result. Any insight? thanks.

dailyest=# select version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)

dailyest=# show constraint_exclusion;
 constraint_exclusion 
----------------------
 on
(1 row)

On Fri, Mar 15, 2013 at 11:04 PM, AI Rumman <rummandba@gmail.com> wrote:
Which version of Postgresql are you using?
Have you set constraint_exclusion to parition?


On Fri, Mar 15, 2013 at 11:02 AM, Ao Jianwang <aojw2008@gmail.com> wrote:
Hi Experts,

I found if we join the master table with other small table, then the running time is slow. While, if we join each child table with the small table, then it's very fast. Any comments and suggestions are greatly appreciated.

For example, par_list table is small(about 50k rows), while par_est is very large, for each day it's about 400MB. Therefore, we partition it by day. However, the query plan for joining the master table with par_list is bad, so the running time is slow. The good plan should be join each partition table with par_list separately, then aggregate the result together. 

1. Join the master table with a small table. It's slow.
dailyest=# explain (analyze on, buffers on)        
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date BETWEEN '2012-07-08' and '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-#             ORDER BY e.date;

-----------------------
 GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
   Buffers: shared hit=3 read=175869
   ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
         Sort Key: e.date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=175869
         ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)
               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190
                     Buffers: shared hit=3 read=269
               ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
                     Buffers: shared read=175600
                     ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878
                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866
                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989
                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)


2. Join each partition table with small table (par_list) and union the result. This runs very fast. However, it's not reasonable if we union 180 SELECT statements (for example, the date is from 2012-07-01 to 2012-12-31. Any better suggestions.

dailyest=# explain (analyze on, buffers on)
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_08 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-08' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_09 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-09' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_10 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-10' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# ;

                                                                                             QUERY PLAN                                       
                                                      
----------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------
 Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
   Buffers: shared hit=27 read=28
   ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
         Buffers: shared hit=27 read=28
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
               Buffers: shared hit=9 read=12
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4
                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7
                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)



At first. you may try the following out and find out if the partition constraint exclusion is working or not::

explain
select * 
FROM  
par_est e
WHERE 
e.date BETWEEN '2012-07-08' and '2012-07-10' 

Re: Join the master table with other table is very slow (partitioning)

От
Ao Jianwang
Дата:
Yes, the index name is par_est_2012_07_09_aid_index on the aid column. The plan is as follows. It seems looks better than the old one, since it choose the index scan. However, I don't think it's efficient, since it still append the result from child tables together, then join the small table (par_list). I expect each child table will join with the small table, then aggregate them together as the "UNION ALL" did. Any comments. Thanks.

explain
select * 
FROM  
par_est e
WHERE 
e.date BETWEEN '2012-07-12' and '2012-07-14'  
and e.aid = 310723177
and exists
 (
 select true
 from par_daily_list l
 where l.id = e.list_id and
  l.fid = 1 and
l.sid = 143441 and
  l.cid in (36, 39, 6000)
 )


                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..168.09 rows=1 width=16)
   ->  Index Scan using par_daily_list_sid_fid_cid_key on par_daily_list l  (cost=0.00..18.56 rows=2 width=4)
         Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
   ->  Append  (cost=0.00..74.71 rows=5 width=16)
         ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 310723177) AND (l.id = list_id))
         ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=16)
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 310723177) AND (l.id = list_id))
         ->  Bitmap Heap Scan on par_est_2012_07_08 e  (cost=20.86..24.88 rows=1 width=16)
               Recheck Cond: ((aid = 310723177) AND (list_id = l.id))
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
               ->  BitmapAnd  (cost=20.86..20.86 rows=1 width=0)
                     ->  Bitmap Index Scan on par_est_2012_07_08_aid_index  (cost=0.00..6.47 rows=138 width=0)
                           Index Cond: (aid = 310723177)
                     ->  Bitmap Index Scan on par_est_2012_07_08_le_index  (cost=0.00..14.11 rows=623 width=0)
                           Index Cond: (list_id = l.id)
         ->  Bitmap Heap Scan on par_est_2012_07_09 e  (cost=20.94..24.96 rows=1 width=16)
               Recheck Cond: ((aid = 310723177) AND (list_id = l.id))
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
               ->  BitmapAnd  (cost=20.94..20.94 rows=1 width=0)
                     ->  Bitmap Index Scan on par_est_2012_07_09_aid_index  (cost=0.00..6.44 rows=134 width=0)
                           Index Cond: (aid = 310723177)
                     ->  Bitmap Index Scan on par_est_2012_07_09_le_index  (cost=0.00..14.22 rows=637 width=0)
                           Index Cond: (list_id = l.id)
         ->  Bitmap Heap Scan on par_est_2012_07_10 e  (cost=20.85..24.87 rows=1 width=16)
               Recheck Cond: ((aid = 310723177) AND (list_id = l.id))
               Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date))
               ->  BitmapAnd  (cost=20.85..20.85 rows=1 width=0)
                     ->  Bitmap Index Scan on par_est_2012_07_10_aid_index  (cost=0.00..6.45 rows=135 width=0)
                           Index Cond: (aid = 310723177)
                     ->  Bitmap Index Scan on par_est_2012_07_10_le_index  (cost=0.00..14.11 rows=623 width=0)
                           Index Cond: (list_id = l.id)
(32 rows)

Re: Join the master table with other table is very slow (partitioning)

От
Tom Lane
Дата:
Ao Jianwang <aojw2008@gmail.com> writes:
> I found if we join the master table with other small table, then the
> running time is slow. While, if we join each child table with the small
> table, then it's very fast. Any comments and suggestions are greatly
> appreciated.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

You haven't shown us table schemas, particularly the index definitions.
It looks to me like the partition child tables probably don't have
indexes that are well adapted to this query.  Equality constraints
should be on leading columns of the index, but the only index I see
evidence of in your plans has the date column first.  Probably the
planner is considering an inner-indexscan plan and rejecting it as
being more expensive than this one, because it would have to scan too
much of the index.

            regards, tom lane


Re: Join the master table with other table is very slow (partitioning)

От
Ao Jianwang
Дата:
Hi Tom, Rumman

Here I use two levels of partition. That's, par_est is first partitioned by monthly (such as par_est_2012_07, ...), then for each monthly child table, we create the daily partition table (such as par_est_2012_07_01).  And, actually, I did some test on that. The result is as follows.
1) If postgres can join each child table (such as par_est_2012_07_08) with the small table (par_list), then use par_est_2012_07_08_pkey can let the postgres use index only scan (in UNION ALL), which is faster. However, postgres doesn't do like that.

dailyest=# \d par_est_2012_07_08
Table "public.par_est_2012_07_08"
  Column  |  Type   | Modifiers 
----------+---------+-----------
 list_id  | integer | not null
 aid      | integer | not null
 estimate | integer | not null
 date     | date    | not null
Indexes:
    "par_est_2012_07_08_pkey" PRIMARY KEY, btree (date, list_id, aid, estimate) CLUSTER
Check constraints:
    "par_est_2012_07_08_date_check" CHECK (date = '2012-07-12'::date)
    "par_est_2012_07_date_check" CHECK (date >= '2012-07-01'::date AND date <= '2012-07-31'::date)
Foreign-key constraints:
    "par_est_2012_07_08_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
Inherits: par_est_2012_07

dailyest=# \d par_list

Referenced by:
    TABLE "par_est_2012_07_01" CONSTRAINT "par_est_2012_07_01_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_02" CONSTRAINT "par_est_2012_07_02_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_03" CONSTRAINT "par_est_2012_07_03_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_04" CONSTRAINT "par_est_2012_07_04_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_05" CONSTRAINT "par_est_2012_07_05_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_06" CONSTRAINT "par_est_2012_07_06_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_07" CONSTRAINT "par_est_2012_07_07_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_08" CONSTRAINT "par_est_2012_07_08_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_09" CONSTRAINT "par_est_2012_07_09_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
    TABLE "par_est_2012_07_10" CONSTRAINT "par_est_2012_07_10_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)


2) As postgres just append the result from child tables and lastly join with the small table. I change the index of the child table to the following. So that the index can be used. However, it's still slower than the "UNION ALL" solution. Any comments, thanks.
dailyest=# \d par_est_2012_07_08
Table "public.par_est_2012_07_08"
  Column  |  Type   | Modifiers 
----------+---------+-----------
 list_id  | integer | not null
 aid      | integer | not null
 estimate | integer | not null
 date     | date    | not null
Indexes:
    "par_est_2012_07_08_aid_index" btree (aid)
    "par_est_2012_07_08_le_index" btree (list_id, estimate) CLUSTER
Check constraints:
    "par_est_2012_07_08_date_check" CHECK (date = '2012-07-08'::date)
    "par_est_2012_07_date_check" CHECK (date >= '2012-07-01'::date AND date <= '2012-07-31'::date)
Foreign-key constraints:
    "par_est_2012_07_08_list_id_fk" FOREIGN KEY (list_id) REFERENCES par_list(id)
Inherits: par_est_2012_07 

On Fri, Mar 15, 2013 at 11:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ao Jianwang <aojw2008@gmail.com> writes:
> I found if we join the master table with other small table, then the
> running time is slow. While, if we join each child table with the small
> table, then it's very fast. Any comments and suggestions are greatly
> appreciated.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

You haven't shown us table schemas, particularly the index definitions.
It looks to me like the partition child tables probably don't have
indexes that are well adapted to this query.  Equality constraints
should be on leading columns of the index, but the only index I see
evidence of in your plans has the date column first.  Probably the
planner is considering an inner-indexscan plan and rejecting it as
being more expensive than this one, because it would have to scan too
much of the index.

                        regards, tom lane