Обсуждение: Am I wasting my time with partitions?

Поиск
Список
Период
Сортировка

Am I wasting my time with partitions?

От
Stuart Brooks
Дата:
It seems to me that postgresql doesn't use indexes when being asked for 
an ordered result sets from a partitioned table. I have an application 
where this is critical, but I was hoping to use partitions because of 
the ease of rotating out old rows.

Simply put, I have a table called LineItems which I need to be able to 
page from and so I need to be able to ask for N rows ordered on a 
certain index (with possible constraints).

eg. SELECT * FROM T ORDER BY col1,col2 LIMIT 10;

This works fine and is quick on a single table:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
metadb=> \d lineitems                 Table "test2.lineitems"   Column    |              Type              | Modifiers
--------------+--------------------------------+-----------lineitem_key | bigint                         | not nulltime
       | timestamp(6) without time zone | not nulldescription  | text                           | not nullbarcode
|text                           | not nullamount       | bigint                         | not null
 
Indexes:   "lineitems_amount_index" btree (amount, lineitem_key)

metadb=> explain select * from lineitems order by amount,lineitem_key 
limit 10;                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------Limit
(cost=0.00..0.74rows=10 width=49)  ->  Index Scan using lineitems_amount_index on lineitems  
 
(cost=0.00..39791.76 rows=535500 width=49)
(2 rows)>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>



If I partition the table by creating a top level table L, and inherited 
tables L1, L2 and issue the same request it does sequential scans on all 
the tables and takes orders of magnitude longer (see below).

In the example below I would have hoped that it would have used an index 
scan on each of the tables returning 10 rows each and then done a merge 
on them. Am I asking too much? Should I just use a single table and take 
the hits on deletes and vacuums?

RegardsStuart


>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
metadb=> \d L                     Table "test2.l"   Column    |              Type              | Modifiers
--------------+--------------------------------+-----------lineitem_key | bigint                         | not nulltime
       | timestamp(6) without time zone | not nulldescription  | text                           | not nullbarcode
|text                           | not nullamount       | bigint                         | not null
 
Indexes:   "l_amount_index" btree (amount, lineitem_key)

metadb=> \d L1                    Table "test2.l1"   Column    |              Type              | Modifiers
--------------+--------------------------------+-----------lineitem_key | bigint                         | not nulltime
       | timestamp(6) without time zone | not nulldescription  | text                           | not nullbarcode
|text                           | not nullamount       | bigint                         | not null
 
Indexes:   "l1_amount_index" btree (amount, lineitem_key)
Inherits: l

metadb=> \d L2                    Table "test2.l2"   Column    |              Type              | Modifiers
--------------+--------------------------------+-----------lineitem_key | bigint                         | not nulltime
       | timestamp(6) without time zone | not nulldescription  | text                           | not nullbarcode
|text                           | not nullamount       | bigint                         | not null
 
Indexes:   "l2_amount_index" btree (amount, lineitem_key)
Inherits: l

metadb=> explain select * from l order by amount,lineitem_key limit 10;                                    QUERY PLAN
-------------------------------------------------------------------------------------Limit  (cost=22207.70..22207.72
rows=10width=88)  ->  Sort  (cost=22207.70..23548.09 rows=536156 width=88)        Sort Key: test2.l.amount,
test2.l.lineitem_key       ->  Result  (cost=0.00..10621.56 rows=536156 width=88)              ->  Append
(cost=0.00..10621.56rows=536156 width=88)                    ->  Seq Scan on l  (cost=0.00..16.90 rows=690 width=88)
               ->  Seq Scan on l1 l  (cost=0.00..4951.00 
 
rows=250000 width=49)                    ->  Seq Scan on l2 l  (cost=0.00..5653.66 
rows=285466 width=49)
(8 rows)


NB. Just addressing one of the inherited tables works fine.

metadb=> explain select * from l1 order by amount,lineitem_key limit 10;                                       QUERY
PLAN
------------------------------------------------------------------------------------------Limit  (cost=0.00..0.74
rows=10width=49)  ->  Index Scan using l1_amount_index on l1  (cost=0.00..18554.20 
 
rows=250000 width=49)
(2 rows)>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


Re: Am I wasting my time with partitions?

От
Richard Huxton
Дата:
Stuart Brooks wrote:
> It seems to me that postgresql doesn't use indexes when being asked for 
> an ordered result sets from a partitioned table. I have an application 
> where this is critical, but I was hoping to use partitions because of 
> the ease of rotating out old rows.

> metadb=> explain select * from l order by amount,lineitem_key limit 10;
>                                     QUERY PLAN
>                     ->  Seq Scan on l  (cost=0.00..16.90 rows=690 width=88)
>                     ->  Seq Scan on l1 l  (cost=0.00..4951.00 
> rows=250000 width=49)
>                     ->  Seq Scan on l2 l  (cost=0.00..5653.66 
> rows=285466 width=49)

> NB. Just addressing one of the inherited tables works fine.
> 
> metadb=> explain select * from l1 order by amount,lineitem_key limit 10;

Well, you don't have an index it can use to find the smallest 
(amount,lineitem) across all of lX. If PG was smart enough to figure out 
that it only needed to check l1, then you do. Unfortunately it isn't.

If you add the constraint you use to partition by, does that help you?

--   Richard Huxton  Archonet Ltd


Re: Am I wasting my time with partitions?

От
Stuart Brooks
Дата:
> It seems to me that postgresql doesn't use indexes when being asked 
> for an ordered result sets from a partitioned table. I have an 
> application where this is critical, but I was hoping to use partitions 
> because of the ease of rotating out old rows.
>
>> metadb=> explain select * from l order by amount,lineitem_key limit 10;
>>                                     QUERY PLAN
>>                     ->  Seq Scan on l  (cost=0.00..16.90 rows=690 
>> width=88)
>>                     ->  Seq Scan on l1 l  (cost=0.00..4951.00 
>> rows=250000 width=49)
>>                     ->  Seq Scan on l2 l  (cost=0.00..5653.66 
>> rows=285466 width=49)
>
>> NB. Just addressing one of the inherited tables works fine.
>>
>> metadb=> explain select * from l1 order by amount,lineitem_key limit 10;
>
> Well, you don't have an index it can use to find the smallest 
> (amount,lineitem) across all of lX. If PG was smart enough to figure 
> out that it only needed to check l1, then you do. Unfortunately it isn't.
>
You're right, it can't determine which of the partitions will have the 
smallest value, but what it could do is pull the smallest value from 
each and compare. In the absence of the LIMIT there wouldn't be much 
which could be done, but the the LIMIT means it only actually needs to 
pull 10 rows from each partition. An alternative way of doing this would be:

(SELECT * FROM L1 ORDER BY amount,lineitem_key LIMIT 10)
UNION
(SELECT * FROM L2 ORDER BY amount,lineitem_key LIMIT 10)
ORDER BY amount,lineitem_key LIMIT 10;

Unfortunately this means one can't just address the parent table, but it 
does essentially what I'd hoped postgres would do for me :) It would be 
quite a long query if there were 100 partitions!
> If you add the constraint you use to partition by, does that help you?
>
I tried to strip the example down to its bare essentials but in this 
case I would be partitioning by lineitem_key and would obviously index 
and add a CONSTRAINT on that as well. I don't think it would help 
though, the query needs to merge from all tables.

Thanks for the response,Stuart