Обсуждение: somewhat slow query with subselect
Hi list, I have a table, not that many rows (400k) but with about 55 columns. There are some maintenance selects I want to do (every now and then) that I don't want to add indexes for. select ... from quite_big_table qbt where qbt.owner = 123 AND exists ( select null from quite_big_table qbt2 where qbt2.owner = qbt.owner AND qbt2.id <> qbt.id AND qbt2.filelength = qbt.filelength ) In a case with some 5000 rows belonging to owner 123, this select really takes a long time. Way longer than without the subselect and order by filelength. I agree that with the subselect it would take some extra juice, but in my mind it would do some hash in memory which wouldn't be too slow to lookup in. shared_buffers = 16GB temp_buffers = 1GB work_mem = 32MB maintenance_work_mem = 1GB #server has 64GB (64bit) I guess there is some penalty because of the size (# columns) of the table since it has to go thru more blocks on the disk. Would it have been beneficial if filelength was in a joined child table instead? How would you do this? Create a temporary table with owner+filelenght? Thanks! Marcus
Marcus Engene <mengpg2@engene.se> writes: > ... In a case with some 5000 rows belonging to owner 123, this select really > takes a long time. Way longer than without the subselect and order by > filelength. I agree that with the subselect it would take some extra > juice, but in my mind it would do some hash in memory which wouldn't be > too slow to lookup in. 8.4 can turn EXISTS subqueries into hash joins, but previous releases won't... regards, tom lane
Tom Lane wrote: > Marcus Engene <mengpg2@engene.se> writes: > >> ... In a case with some 5000 rows belonging to owner 123, this select really >> takes a long time. Way longer than without the subselect and order by >> filelength. I agree that with the subselect it would take some extra >> juice, but in my mind it would do some hash in memory which wouldn't be >> too slow to lookup in. >> > > 8.4 can turn EXISTS subqueries into hash joins, but previous releases > won't... > > regards, tom lane > > Thank you very much for your answer, Tom. I tried to join the table instead and it was way faster. Sort (cost=46769.87..46770.51 rows=258 width=48) Sort Key: pic.filesize -> Nested Loop (cost=34.30..46759.54 rows=258 width=48) Join Filter: ((picsame.objectid <> pic.objectid) AND (pic.filesize = picsame.filesize)) -> Nested Loop (cost=8.27..3099.28 rows=16 width=56) -> HashAggregate (cost=8.27..8.28 rows=1 width=4) -> Index Scan using user_c2 on user pu2 (cost=0.00..8.27 rows=1 width=4) Index Cond: ((username_locase)::text = 'prolificarts'::text) -> Index Scan using item_common_x1 on item_common pic (cost=0.00..3081.41 rows=767 width=52) Index Cond: (pic.user = pu2.objectid) -> Bitmap Heap Scan on item_common picsame (cost=26.03..2715.34 rows=767 width=16) Recheck Cond: (picsame.user = pic.user) -> Bitmap Index Scan on item_common_x1 (cost=0.00..25.84 rows=767 width=0) Index Cond: (picsame.user = pic.user) Best regards, Marcus