Обсуждение: LIMIT on partitioned-table!?

Поиск
Список
Период
Сортировка

LIMIT on partitioned-table!?

От
"Kim A. Brandt"
Дата:
Hello list,

does `postgres (PostgreSQL) 8.4.5' use the LIMIT of a query when it is run on a partitioned-table or am I doing
somethingwrong? It looks as if postgres queries all partitions and then LIMITing the records afterwards!? This results
ina long (>3 minutes) running query. What can I do to optimise this? 

The query could look like this:

     EXPLAIN ANALYSE
     SELECT *
     FROM flexserver.unitstat
     WHERE nodeid = 'abcd'
     AND ts > '2010-01-01 00:00:00'
     AND ts < '2011-02-15 15:00:00'
     ORDER BY nodeid, ts
     LIMIT 1000;

This is the `EXPLAIN ANALYSE'-output:

     Limit  (cost=232195.49..232197.99 rows=1000 width=194) (actual time=205846.722..205852.218 rows=1000 loops=1)
       ->  Sort  (cost=232195.49..232498.26 rows=121108 width=194) (actual time=205846.717..205848.684 rows=1000
loops=1)
             Sort Key: flexserver.unitstat.ts
             Sort Method:  top-N heapsort  Memory: 314kB
             ->  Result  (cost=0.00..225555.27 rows=121108 width=194) (actual time=444.969..205136.182 rows=203492
loops=1)
                   ->  Append  (cost=0.00..225555.27 rows=121108 width=194) (actual time=444.963..204236.800
rows=203492loops=1) 
                         ->  Seq Scan on unitstat  (cost=0.00..14.90 rows=1 width=258) (actual time=0.007..0.007 rows=0
loops=1)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone) AND ((nodeid)::text = 'abcd'::text)) 
                         ->  Bitmap Heap Scan on unitstat_y2011m01 unitstat  (cost=116.47..8097.17 rows=4189 width=194)
(actualtime=444.949..9900.002 rows=5377 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2011m01_nodeid_gps_ts  (cost=0.00..115.42
rows=4190width=0) (actual time=426.599..426.599 rows=5377 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Bitmap Heap Scan on unitstat_y2011m02 unitstat  (cost=52.67..3689.16 rows=1906 width=194)
(actualtime=73.512..3211.698 rows=796 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2011m02_nodeid_gps_ts  (cost=0.00..52.20
rows=1906width=0) (actual time=55.458..55.458 rows=796 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Index Scan using fki_unitstat_y2010m02_nodeid_ts_fkey on unitstat_y2010m02 unitstat
(cost=0.00..10179.11rows=5257 width=193) (actual time=39.531..11660.741 rows=6524 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Index Scan using fki_unitstat_y2010m01_nodeid_ts_fkey on unitstat_y2010m01 unitstat
(cost=0.00..10324.31rows=5358 width=193) (actual time=38.255..9808.237 rows=7128 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Bitmap Heap Scan on unitstat_y2010m11 unitstat  (cost=586.92..39314.99 rows=21965
width=195)(actual time=1417.528..26090.404 rows=24464 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2010m11_nodeid_gps_ts  (cost=0.00..581.43
rows=21970width=0) (actual time=1400.898..1400.898 rows=24464 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Bitmap Heap Scan on unitstat_y2010m12 unitstat  (cost=128.72..9050.29 rows=4683 width=194)
(actualtime=238.679..7472.936 rows=2014 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2010m12_nodeid_gps_ts  (cost=0.00..127.55
rows=4684width=0) (actual time=225.009..225.009 rows=2014 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Bitmap Heap Scan on unitstat_y2010m10 unitstat  (cost=101.74..9686.81 rows=4987 width=194)
(actualtime=488.130..35826.742 rows=25279 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2010m10_nodeid_gps_ts  (cost=0.00..100.49
rows=4988width=0) (actual time=472.796..472.796 rows=25279 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Bitmap Heap Scan on unitstat_y2010m09 unitstat  (cost=489.56..49567.74 rows=27466
width=194)(actual time=185.198..12753.315 rows=31099 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2010m09_nodeid_gps_ts  (cost=0.00..482.69
rows=27472width=0) (actual time=158.072..158.072 rows=31099 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Index Scan using fki_unitstat_y2010m08_nodeid_ts_fkey on unitstat_y2010m08 unitstat
(cost=0.00..9353.76rows=4824 width=194) (actual time=31.351..10259.090 rows=17606 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Index Scan using fki_unitstat_y2010m07_nodeid_ts_fkey on unitstat_y2010m07 unitstat
(cost=0.00..8686.72rows=4492 width=194) (actual time=41.572..9636.335 rows=9511 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Bitmap Heap Scan on unitstat_y2010m06 unitstat  (cost=311.50..32142.18 rows=17406
width=194)(actual time=113.857..12136.570 rows=17041 loops=1) 
                               Recheck Cond: ((nodeid)::text = 'abcd'::text)
                               Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                               ->  Bitmap Index Scan on idx_unitstat_y2010m06_nodeid_gps_ts  (cost=0.00..307.15
rows=17410width=0) (actual time=91.638..91.638 rows=17041 loops=1) 
                                     Index Cond: ((nodeid)::text = 'abcd'::text)
                         ->  Index Scan using fki_unitstat_y2010m05_nodeid_ts_fkey on unitstat_y2010m05 unitstat
(cost=0.00..11942.82rows=6279 width=193) (actual time=62.264..19887.675 rows=19246 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Index Scan using fki_unitstat_y2010m04_nodeid_ts_fkey on unitstat_y2010m04 unitstat
(cost=0.00..11840.93rows=6194 width=193) (actual time=52.735..17302.361 rows=21936 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                         ->  Index Scan using fki_unitstat_y2010m03_nodeid_ts_fkey on unitstat_y2010m03 unitstat
(cost=0.00..11664.36rows=6101 width=194) (actual time=66.613..17541.374 rows=15471 loops=1) 
                               Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
     Total runtime: 205855.569 ms


Regards,

Kim

Re: LIMIT on partitioned-table!?

От
Shaun Thomas
Дата:
On 02/15/2011 08:23 AM, Kim A. Brandt wrote:

> does `postgres (PostgreSQL) 8.4.5' use the LIMIT of a query when it
> is run on a partitioned-table or am I doing something wrong? It looks
> as if postgres queries all partitions and then LIMITing the records
> afterwards!? This results in a long (>3 minutes) running query. What
> can I do to optimise this?

Make sure you have constraint_exclusion set to 'on' in your config.
Also, what are your checks for your partitions? You've got a pretty wide
range in your 'ts' checks, so if you're using them as your partition
definition, you're not helping yourself.

The main issue might just be that you've used an order clause. LIMIT
1000 or not, even if it can restrict the result set based on your CHECK
criteria, it'll still need to select every matching row from every
matched partition, order the results, and chop off the first 1000.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: LIMIT on partitioned-table!?

От
"Kim A. Brandt"
Дата:
Thank you Shaun,

removing the ORDER BY worked. But I am afraid to ask this. How can I order by partition? It seams that the planner has
pickeda random(!?) order of partition to select from. The returned records, from the selected partition, are correctly
sortedbythe index though. 

On 2011-02-15 15:49, Shaun Thomas wrote:
> On 02/15/2011 08:23 AM, Kim A. Brandt wrote:
>
>> does `postgres (PostgreSQL) 8.4.5' use the LIMIT of a query when it
>> is run on a partitioned-table or am I doing something wrong? It looks
>> as if postgres queries all partitions and then LIMITing the records
>> afterwards!? This results in a long (>3 minutes) running query. What
>> can I do to optimise this?
>
> Make sure you have constraint_exclusion set to 'on' in your config. Also, what are your checks for your partitions?
You'vegot a pretty wide range in your 'ts' checks, so if you're using them as your partition definition, you're not
helpingyourself. 

The parameter `constraint_exclusion' is set to `partition'. Postgres is on FreeBSD.

My checks (if I understand you right) are as follows:

     CREATE TABLE flexserver.unitstat_y2011m02
     (
       ts timestamp without time zone NOT NULL,
       nodeid character varying(10) NOT NULL,
       gps_ts timestamp without time zone NOT NULL,
       ...
       CONSTRAINT unitstat_y2011m02_ts_check CHECK (ts >= '2011-02-01 00:00:00'::timestamp without time zone AND ts <
'2011-03-0100:00:00'::timestamp without time zone) 
     )
     INHERITS (flexserver.unitstat);

Each partition is constrained to one month.

About the wide range, I am aware of that. This probably has to change anyway!? So the current (and probably final
solution)is to use a narrower search range. Thank you for the hint. 

> The main issue might just be that you've used an order clause. LIMIT 1000 or not, even if it can restrict the result
setbased on your CHECK criteria, it'll still need to select every matching row from every matched partition, order the
results,and chop off the first 1000. 

That was it. Just how can one order by partition if one would do a wide range search over multiple partitions?

The new query and EXPLAIN ANALYSE-output is:

     SELECT *
     FROM flexserver.unitstat
     WHERE nodeid = 'abcd'
     AND ts > '2010-01-01 00:00:00'
     AND ts < '2011-02-15 15:00:00'
     --ORDER BY nodeid, ts
     LIMIT 1000;


     Limit  (cost=0.00..1862.46 rows=1000 width=194) (actual time=2.569..18.948 rows=1000 loops=1)
       ->  Result  (cost=0.00..225611.08 rows=121136 width=194) (actual time=2.566..15.412 rows=1000 loops=1)
             ->  Append  (cost=0.00..225611.08 rows=121136 width=194) (actual time=2.558..11.243 rows=1000 loops=1)
                   ->  Seq Scan on unitstat  (cost=0.00..14.90 rows=1 width=258) (actual time=0.003..0.003 rows=0
loops=1)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone) AND ((nodeid)::text = 'abcd'::text)) 
                   ->  Bitmap Heap Scan on unitstat_y2011m01 unitstat  (cost=116.47..8097.17 rows=4189 width=194)
(actualtime=2.550..7.701 rows=1000 loops=1) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2011m01_nodeid_gps_ts  (cost=0.00..115.42 rows=4190
width=0)(actual time=1.706..1.706 rows=5377 loops=1) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Bitmap Heap Scan on unitstat_y2011m02 unitstat  (cost=52.92..3744.97 rows=1934 width=194) (never
executed)
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2011m02_nodeid_gps_ts  (cost=0.00..52.44 rows=1935
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Index Scan using fki_unitstat_y2010m02_nodeid_ts_fkey on unitstat_y2010m02 unitstat
(cost=0.00..10179.11rows=5257 width=193) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Index Scan using fki_unitstat_y2010m01_nodeid_ts_fkey on unitstat_y2010m01 unitstat
(cost=0.00..10324.31rows=5358 width=193) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Bitmap Heap Scan on unitstat_y2010m11 unitstat  (cost=586.92..39314.99 rows=21965 width=195)
(neverexecuted) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2010m11_nodeid_gps_ts  (cost=0.00..581.43 rows=21970
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Bitmap Heap Scan on unitstat_y2010m12 unitstat  (cost=128.72..9050.29 rows=4683 width=194)
(neverexecuted) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2010m12_nodeid_gps_ts  (cost=0.00..127.55 rows=4684
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Bitmap Heap Scan on unitstat_y2010m10 unitstat  (cost=101.74..9686.81 rows=4987 width=194)
(neverexecuted) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2010m10_nodeid_gps_ts  (cost=0.00..100.49 rows=4988
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Bitmap Heap Scan on unitstat_y2010m09 unitstat  (cost=489.56..49567.74 rows=27466 width=194)
(neverexecuted) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2010m09_nodeid_gps_ts  (cost=0.00..482.69 rows=27472
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Index Scan using fki_unitstat_y2010m08_nodeid_ts_fkey on unitstat_y2010m08 unitstat
(cost=0.00..9353.76rows=4824 width=194) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Index Scan using fki_unitstat_y2010m07_nodeid_ts_fkey on unitstat_y2010m07 unitstat
(cost=0.00..8686.72rows=4492 width=194) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Bitmap Heap Scan on unitstat_y2010m06 unitstat  (cost=311.50..32142.18 rows=17406 width=194)
(neverexecuted) 
                         Recheck Cond: ((nodeid)::text = 'abcd'::text)
                         Filter: ((ts > '2010-01-01 00:00:00'::timestamp without time zone) AND (ts < '2011-02-15
15:00:00'::timestampwithout time zone)) 
                         ->  Bitmap Index Scan on idx_unitstat_y2010m06_nodeid_gps_ts  (cost=0.00..307.15 rows=17410
width=0)(never executed) 
                               Index Cond: ((nodeid)::text = 'abcd'::text)
                   ->  Index Scan using fki_unitstat_y2010m05_nodeid_ts_fkey on unitstat_y2010m05 unitstat
(cost=0.00..11942.82rows=6279 width=193) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Index Scan using fki_unitstat_y2010m04_nodeid_ts_fkey on unitstat_y2010m04 unitstat
(cost=0.00..11840.93rows=6194 width=193) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
                   ->  Index Scan using fki_unitstat_y2010m03_nodeid_ts_fkey on unitstat_y2010m03 unitstat
(cost=0.00..11664.36rows=6101 width=194) (never executed) 
                         Index Cond: (((nodeid)::text = 'abcd'::text) AND (ts > '2010-01-01 00:00:00'::timestamp
withouttime zone) AND (ts < '2011-02-15 15:00:00'::timestamp without time zone)) 
     Total runtime: 21.219 ms


Now most partitions are not looked at (never executed). But how can one affect the order of partition (e.g. begin with
theoldest)? 

Sorry for asking the same thing thrice. I just need to understand this one.  :)


Kind regards,

Kim

Re: LIMIT on partitioned-table!?

От
Marti Raudsepp
Дата:
On Tue, Feb 15, 2011 at 21:33, Kim A. Brandt <kimabrandt@gmx.de> wrote:
> removing the ORDER BY worked. But I am afraid to ask this. How can I order
> by partition? It seams that the planner has picked a random(!?) order of
> partition to select from. The returned records, from the selected partition,
> are correctly sorted bythe index though.

If a single query accesses more than one partition, PostgreSQL
currently cannot read the values in index-sorted order. Hence with
ORDER BY and LIMIT, PostgreSQL cannot return *any* results before it
has read all matching rows and then sorted them. Adding a LIMIT
doesn't help much. Your only bet is to reduce the number of matched
rows, or make sure that you only access a single partition.

Increasing work_mem may speed up the sort step if you're hitting the
disk (EXPLAIN ANALYZE VERBOSE will tell you whether that's the case).

This will change in PostgreSQL 9.1 which has a new Merge Append plan node.

Regards,
Marti

Re: LIMIT on partitioned-table!?

От
"Kim A. Brandt"
Дата:
Thank you Marti,

I will go with the ``reduced number of matched rows'' and naturally be waiting for postgres 9.1 expectantly.


Kind regards,

Kim



On 2011-02-15 22:13, Marti Raudsepp wrote:
> On Tue, Feb 15, 2011 at 21:33, Kim A. Brandt<kimabrandt@gmx.de>  wrote:
>> removing the ORDER BY worked. But I am afraid to ask this. How can I order
>> by partition? It seams that the planner has picked a random(!?) order of
>> partition to select from. The returned records, from the selected partition,
>> are correctly sorted bythe index though.
>
> If a single query accesses more than one partition, PostgreSQL
> currently cannot read the values in index-sorted order. Hence with
> ORDER BY and LIMIT, PostgreSQL cannot return *any* results before it
> has read all matching rows and then sorted them. Adding a LIMIT
> doesn't help much. Your only bet is to reduce the number of matched
> rows, or make sure that you only access a single partition.
>
> Increasing work_mem may speed up the sort step if you're hitting the
> disk (EXPLAIN ANALYZE VERBOSE will tell you whether that's the case).
>
> This will change in PostgreSQL 9.1 which has a new Merge Append plan node.
>
> Regards,
> Marti