Hi all.
I'm seeing something fishy when trying to self-join two large tables and then
order by one column.
I have the following schema:
CREATE TABLE origo_person(
id SERIAL PRIMARY KEY,
firstname varchar,
lastname varchar,
created timestamp not null default now(),
created_by integer REFERENCES onp_user(id),
onp_user_id integer references onp_user(id)
);
create index origo_person_created_idx on origo_person(created);
create index origo_person_createdby_idx on origo_person(created_by);
create index origo_person_onp_user_id_idx on origo_person(onp_user_id);
create index origo_person_firstname_idx on origo_person(firstname);
insert into onp_user(id) values (1);
copy origo_person (firstname, lastname, created)
from '/home/andreak/simpleperson.sql';
update origo_person set created_by = 1;
update origo_person set onp_user_id = 1 where id = 1;
simpleperson.sql has more than 200K entries in COPY-format:
$ tail -5 /home/andreak/simpleperson.sql
INGOLF KALLEBERG 2007-08-21 22:23:43.571421
SIGRUNN BRUVIK 2007-08-21 22:23:43.571421
ELFRID FROGNER 2007-08-21 22:23:43.571421
GUNNAR KRISTOFFER DOVLAND 2007-08-21 22:23:43.571421
JAN ARNE HAARR 2007-08-21 22:23:43.571421
Now, the two first queries are *fast*, but the 3rd query is slow:
1. Fast:
EXPLAIN ANALYZE SELECT p.firstname, p.lastname FROM origo_person p order by
p.firstname ASC limit 5; QUERY
PLAN
-----------------------------------------------------------------------------Limit (cost=0.00..0.55 rows=5 width=17)
(actualtime=0.031..0.070 rows=5
loops=1) -> Index Scan using origo_person_firstname_idx on origo_person p
(cost=0.00..22277.13 rows=200827 width=17) (actual time=0.025..0.046 rows=5
loops=1)Total runtime: 0.128 ms
(3 rows)
2. Fast:
EXPLAIN ANALYZE SELECT p.firstname, p.lastname FROM origo_person p,
origo_person pcb WHERE pcb.onp_user_id = p.created_by limit 5;
QUERY PLAN
-----------------------------------------------------------------------------Limit (cost=0.00..31526.55 rows=1
width=17)(actual time=0.096..0.170 rows=5
loops=1) -> Merge Join (cost=0.00..31526.55 rows=1 width=17) (actual
time=0.091..0.145 rows=5 loops=1) Merge Cond: (p.created_by = pcb.onp_user_id) -> Index Scan using
origo_person_createdby_idxon origo_person p
(cost=0.00..10697.70 rows=200827 width=21) (actual time=0.045..0.057 rows=5
loops=1) -> Index Scan using origo_person_onp_user_id_idx on origo_person pcb
(cost=0.00..19824.70 rows=200827 width=4) (actual time=0.032..0.044 rows=5
loops=1)Total runtime: 0.264 ms
(6 rows)
3. Slow:
EXPLAIN ANALYZE SELECT p.firstname, p.lastname FROM origo_person p,
origo_person pcb WHERE pcb.onp_user_id = p.created_by order by p.firstname
ASC limit 5;
QUERY PLAN
-----------------------------------------------------------------------------Limit (cost=31526.56..31526.56 rows=1
width=17)(actual
time=2573.993..2574.015 rows=5 loops=1) -> Sort (cost=31526.56..31526.56 rows=1 width=17) (actual
time=2573.987..2573.994 rows=5 loops=1) Sort Key: p.firstname Sort Method: top-N heapsort Memory: 17kB
-> Merge Join (cost=0.00..31526.55 rows=1 width=17) (actual
time=0.098..2047.726 rows=200827 loops=1) Merge Cond: (p.created_by = pcb.onp_user_id) ->
IndexScan using origo_person_createdby_idx on origo_person
p (cost=0.00..10697.70 rows=200827 width=21) (actual time=0.052..428.445
rows=200827 loops=1) -> Index Scan using origo_person_onp_user_id_idx on
origo_person pcb (cost=0.00..19824.70 rows=200827 width=4) (actual
time=0.031..424.250 rows=200828 loops=1)Total runtime: 2574.113 ms
(9 rows)
Can anybody point out to me why PG doesn't perform better on the last query?
--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Karenslyst Allé 11 | know how to do a thing and to watch |
PO. Box 529 Skøyen | somebody else doing it wrong, without |
0214 Oslo | comment. |
NORWAY | |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+