Re: strange query plan with LIMIT
От | Pavel Stehule |
---|---|
Тема | Re: strange query plan with LIMIT |
Дата | |
Msg-id | BANLkTikqL1yKaaLSc20uoSS2zO-GPG56vg@mail.gmail.com обсуждение исходный текст |
Ответ на | strange query plan with LIMIT (anthony.shipman@symstream.com) |
Список | pgsql-performance |
Hello did you run a ANALYZE statement on table tdiag? A statistics are absolutelly out. Regards Pavel Stehule 2011/6/7 <anthony.shipman@symstream.com>: > Version: PostgreSQL 8.3.5 (mammoth replicator) > > Schema: > > CREATE TABLE tdiag ( > diag_id integer DEFAULT nextval('diag_id_seq'::text), > create_time timestamp with time zone default now(), /* time this record > was created */ > diag_time timestamp with time zone not null, > device_id integer, /* optional */ > fleet_id integer, /* optional */ > customer_id integer, /* optional */ > module character varying, > node_kind smallint, > diag_level smallint, > tag character varying not null default '', > message character varying not null default '', > options text, > > PRIMARY KEY (diag_id) > ); > > create index tdiag_create_time ON tdiag(create_time); > > The number of rows is around 33 million with time stamps over the past two > weeks. > A VACUUM ANALYZE has been done recently on the table. > > The create_time order is almost identical to the id order. What I want > to find is the first or last entry by id in a given time range. The > query I am having a problem with is: > > symstream2=> explain analyze select * from tdiag where (create_time >>= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by > diag_id limit 1; > > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358 > rows=1 loops=1) > -> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76 > rows=1141019 width=114) > (actual time=69425.352..69425.352 rows=1 loops=1) > Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with > time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > Total runtime: 69425.400 ms > > PG seems to decide it must scan the diag_id column and filter each row by the > create_time. > > > > If I leave out the limit I get > > symstream2=> explain analyze select * from tdiag where (create_time >>= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by > diag_id; > > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Sort (cost=957632.43..960484.98 rows=1141019 width=114) (actual > time=552.795..656.319 rows=86530 > loops=1) > Sort Key: diag_id > Sort Method: external merge Disk: 9872kB > -> Bitmap Heap Scan on tdiag (cost=25763.48..638085.13 rows=1141019 > width=114) (actual > time=43.232..322.441 rows=86530 loops=1) > Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp > with time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > -> Bitmap Index Scan on tdiag_create_time (cost=0.00..25478.23 > rows=1141019 width=0) > (actual time=42.574..42.574 rows=86530 loops=1) > Index Cond: ((create_time >= '2011-06-03 > 19:49:04+10'::timestamp with time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > Total runtime: 736.440 ms > (8 rows) > > > > > I can be explicit about the query order: > > select * into tt from tdiag where (create_time >= '2011-06-03 > 09:49:04.000000+0' and create_time < > '2011-06-06 09:59:04.000000+0'); > > symstream2=> explain analyze select * from tt order by diag_id limit 1; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > Limit (cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166 > rows=1 loops=1) > -> Sort (cost=2731.95..2948.28 rows=86530 width=101) (actual > time=440.161..440.161 rows=1 > loops=1) > Sort Key: diag_id > Sort Method: top-N heapsort Memory: 17kB > -> Seq Scan on tt (cost=0.00..2299.30 rows=86530 width=101) (actual > time=19.602..330.873 > rows=86530 loops=1) > Total runtime: 440.209 ms > (6 rows) > > > > But if I try using a subquery I get > > symstream2=> explain analyze select * from (select * from tdiag where > (create_time >= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub > order by diag_id limit > 1; > > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385 > rows=1 loops=1) > -> Index Scan using tdiag_pkey on tdiag (cost=0.00..19114765.76 > rows=1141019 width=114) > (actual time=90344.380..90344.380 rows=1 loops=1) > Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with > time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > Total runtime: 90344.431 ms > > > How do I make this both fast and simple? > -- > Anthony Shipman | flailover systems: When one goes down it > Anthony.Shipman@symstream.com | flails about until the other goes down too. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
В списке pgsql-performance по дате отправления:
Предыдущее
От: Craig RingerДата:
Сообщение: Re: i want to ask monitory peformance memory postgresql with automatically