Constraint partition index usage

Поиск
Список
Период
Сортировка
От Michael Glaesemann
Тема Constraint partition index usage
Дата
Msg-id 6514E4AB-5D33-4B8A-ADF3-BCAB99FAA3A0@myyearbook.com
обсуждение исходный текст
Список pgsql-hackers
We've been looking at breaking out some of our larger logging-type  
tables (INSERT only, other than DELETEs for regularly removing old  
rows for maintenance) using constraint partitioning to avoid the need  
for heavy vacuuming. For some cases it's working well. In others,  
we're running into performance issues due to the planner no longer  
using indexes for some queries. I've included a couple of examples  
where we should have enough information to take advantage of the  
available indexes.

Here's the example schema.

=# \d foo.bars                Table "foo.bars"    Column |           Type           |       Modifiers
-----------+--------------------------+------------------------ user_id   | integer                  | not null bar
 | ip4                      | not null logged_at | timestamp with time zone | not null default now()
 
Triggers:    partition_bars BEFORE INSERT ON foo.bars FOR EACH ROW EXECUTE  
PROCEDURE foo.partition_bars()

=# \d foo.bars_20081013 -- example of one of 11 tables inheriting from  
foo.bars, each holding one week's-worth of data
            Table "foo.bars_20081013"    Column |           Type           |       Modifiers
-----------+--------------------------+------------------------ user_id   | integer                  | not null bar
 | ip4                      | not null logged_at | timestamp with time zone | not null default now()
 
Indexes:    "bars_20081013_bar_idx" btree (bar)    "bars_20081013_bar_logged_at_idx" btree (bar, logged_at)
"bars_20081013_logged_at_idx"btree (logged_at)
 
Check constraints:    "bars_20081013_logged_at_check" CHECK (logged_at >= '2008-10-13  
00:00:00-04'::timestamp with time zone AND logged_at < '2008-10-20  
00:00:00-04'::timestamp with time zone)
Inherits: foo.bars


Case 1: ORDER BY indexed_column LIMIT some_limit

Depending on the value of some_limit, we should see a win by returning  
the top some_limit rows from each inherited table as candidate rows,  
and then determine the top some_limit from among the candidates. (Note  
in the explain that not all of the inheriting tables have data.)


production=# explain select * from foo.bars order by logged_at desc  
limit 1;                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1023367.34..1023367.34 rows=1 width=16)   ->  Sort  (cost=1023367.34..1134058.31 rows=44276389 width=16)
    Sort Key: foo.bars.logged_at         ->  Result  (cost=0.00..801985.39 rows=44276389 width=16)               ->
Append (cost=0.00..801985.39 rows=44276389  
 
width=16)                     ->  Seq Scan on bars  (cost=0.00..32.70  
rows=1770 width=16)                     ->  Seq Scan on bars_20080915 bars   
(cost=0.00..101199.40 rows=5586490 width=16)                     ->  Seq Scan on bars_20080922 bars   
(cost=0.00..215666.84 rows=11907734 width=16)                     ->  Seq Scan on bars_20080908 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080901 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080825 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080818 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080811 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080804 bars   
(cost=0.00..32.70 rows=1770 width=16)                     ->  Seq Scan on bars_20080929 bars   
(cost=0.00..215029.52 rows=11872652 width=16)                     ->  Seq Scan on bars_20081006 bars   
(cost=0.00..223559.96 rows=12343346 width=16)                     ->  Seq Scan on bars_20081013 bars   
(cost=0.00..46300.77 rows=2553777 width=16)
(17 rows)

production=# explain select * from foo.bars_20080915 order by  
logged_at desc limit 1;                                                                        QUERY  PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.03 rows=1 width=16)   ->  Index Scan Backward using bars_20080915_logged_at_idx on  
 
bars_20080915  (cost=0.00..167466.14 rows=5586490 width=16)
(2 rows)


Case 2: Return [min|max](indexed_column)

Again, by finding candidate rows from each inheriting table using the  
available index and then determining the min/max from among the  
candidates, we should have much better performance than performing seq  
scans on each inherting table.

production=# explain select min(logged_at) from foo.bars_20080915;
                QUERY  PLAN
 

---------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.03..0.04 rows=1 width=0)   InitPlan     ->  Limit  (cost=0.00..0.03 rows=1 width=8)           ->  Index
Scanusing bars_20080915_logged_at_idx on  
 
bars_20080915  (cost=0.00..167466.14 rows=5586490 width=8)                 Filter: (logged_at IS NOT NULL)
(5 rows)

production=# explain select min(logged_at) from foo.bars;                                                      QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=913072.22..913072.23 rows=1 width=8)   ->  Append  (cost=0.00..802333.27 rows=44295577 width=8)
-> Seq Scan on bars  (cost=0.00..32.70 rows=1770 width=8)         ->  Seq Scan on bars_20080915 bars
(cost=0.00..101199.40 
 
rows=5586490 width=8)         ->  Seq Scan on bars_20080922 bars  (cost=0.00..215666.84  
rows=11907734 width=8)         ->  Seq Scan on bars_20080908 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080901 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080825 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080818 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080811 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080804 bars  (cost=0.00..32.70  
rows=1770 width=8)         ->  Seq Scan on bars_20080929 bars  (cost=0.00..215029.52  
rows=11872652 width=8)         ->  Seq Scan on bars_20081006 bars  (cost=0.00..223559.96  
rows=12343346 width=8)         ->  Seq Scan on bars_20081013 bars  (cost=0.00..46648.65  
rows=2572965 width=8)
(14 rows)

I thought Greg Stark had looked at some performance tweaks wrt  
constraint partitioning, but I can't recall the details. Was that  
related to this at all?

Michael Glaesemann
michael.glaesemann@myyearbook.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Deriving Recovery Snapshots
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Updates of SE-PostgreSQL 8.4devel patches