Re: Slow query with backwards index scan

From: Tilmann Singer
Subject: Re: Slow query with backwards index scan
Date:
Msg-id: 20070730191025.GB4284@tils.net
In reply to: Re: Slow query with backwards index scan  (Nis Jørgensen)
Responses: Re: Slow query with backwards index scan  (Nis Jørgensen)
List: pgsql-performance


* Nis Jørgensen <> [20070730 18:33]:
> It seems to me the subselect plan would benefit quite a bit from not
> returning all rows, but only the 10 latest for each user. I believe the
> problem is similar to what is discussed for UNIONs here:
>
>
http://groups.google.dk/group/pgsql.general/browse_thread/thread/4f74d7faa8a5a608/367f5052b1bbf1c8?lnk=st&q=postgresql+limit++union&rnum=1&hl=en#367f5052b1bbf1c8
>
> which got implemented here:
>
>
http://groups.google.dk/group/pgsql.committers/browse_thread/thread/b1ac3c3026db096c/9b3e5bd2d612f565?lnk=st&q=postgresql+limit++union&rnum=26&hl=en#9b3e5bd2d612f565
>
> It seems to me the planner in this case would actually need to push the
> limit into the nested loop, since we want the plan to take advantage of
> the index (using a backwards index scan). I am ready to be corrected though.

If I understand that correctly, this means it would benefit the
planning of something like

SELECT * FROM (q1 UNION ALL q2) ORDER BY ... LIMIT ...

if either q1 or q2 could satisfy the rows requested by the limit
early, instead of planning q1 and q2 without letting the outer
query's limit influence them.
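
Until the planner does this on its own, the limit can be pushed into
each branch by hand. A minimal sketch, assuming both branches return
rows from large_table ordered by created_at (the second branch is
hypothetical, standing in for whatever q2 actually is); each branch
can then stop after 10 rows via its own backward index scan, and the
outer sort only has to merge at most 20 rows:

```sql
SELECT * FROM (
    -- q1: posts by the user's contacts, limited inside the branch
    (SELECT * FROM large_table
      WHERE user_id IN (SELECT contact_id FROM relationships
                         WHERE user_id = 55555)
      ORDER BY created_at DESC LIMIT 10)
    UNION ALL
    -- q2: hypothetical second branch, also limited inside
    (SELECT * FROM large_table
      WHERE user_id = 55555
      ORDER BY created_at DESC LIMIT 10)
) AS q
ORDER BY created_at DESC
LIMIT 10;
```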

Unfortunately I'm having problems making my q2 reasonably efficient in
the first place, even before UNIONing it.

> You could try this (quite hackish) attempt at forcing the query planner
> to use this plan:
>
> SELECT * FROM large_table lt
>  WHERE user_id IN (SELECT contact_id FROM relationships WHERE
>  user_id=55555) AND created_at in (SELECT created_at FROM large_table
> lt2 WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
>  ORDER BY created_at DESC LIMIT 10;

No, it doesn't help for the user with many matches at the beginning:

testdb=# EXPLAIN ANALYZE
SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE
 user_id=55555) AND created_at IN (SELECT created_at FROM large_table lt2
   WHERE lt2.user_id = lt.user_id ORDER BY created_at DESC limit 10)
 ORDER BY created_at DESC LIMIT 10;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=45555.94..45555.97 rows=10 width=621) (actual time=70550.549..70550.616 rows=10 loops=1)
   ->  Sort  (cost=45555.94..45557.46 rows=605 width=621) (actual time=70550.542..70550.569 rows=10 loops=1)
         Sort Key: lt.created_at
         ->  Nested Loop  (cost=39.52..45527.99 rows=605 width=621) (actual time=2131.501..70548.313 rows=321 loops=1)
               ->  HashAggregate  (cost=39.52..39.53 rows=1 width=4) (actual time=0.406..0.615 rows=40 loops=1)
                     ->  Bitmap Heap Scan on relationships  (cost=4.33..39.49 rows=10 width=4) (actual time=0.075..0.248 rows=40 loops=1)
                           Recheck Cond: (user_id = 55555)
                           ->  Bitmap Index Scan on relationships_user_id_contact_id_index  (cost=0.00..4.33 rows=10 width=0) (actual time=0.052..0.052 rows=40 loops=1)
                                 Index Cond: (user_id = 55555)
               ->  Index Scan using large_user_id_started_at_index on large_table lt  (cost=0.00..45474.29 rows=1134 width=621) (actual time=1762.067..1763.637 rows=8 loops=40)
                     Index Cond: (lt.user_id = relationships.contact_id)
                     Filter: (subplan)
                     SubPlan
                       ->  Limit  (cost=0.00..34.04 rows=10 width=8) (actual time=0.048..0.147 rows=10 loops=69018)
                             ->  Index Scan Backward using large_user_id_created_at_index on large_table lt2  (cost=0.00..7721.24 rows=2268 width=8) (actual time=0.040..0.087 rows=10 loops=69018)
                                   Index Cond: (user_id = $0)
 Total runtime: 70550.847 ms


The same plan is generated for the user with few matches and executes
very fast.


> If that doesn't work, you might have reached the point where you need to
> use some kind of record-keeping system to keep track of which records to
> look at.

Yes, I'm considering that unfortunately.

Seeing, however, that there are two different queries, each producing
a very efficient plan for one of the two cases but not the other,
makes me hope there is a way to tune the planner into always coming
up with the right plan. I'm not sure I explained the problem well
enough, so I will see if I can come up with a reproduction case with
generated data.
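
A rough sketch of such a reproduction case, assuming the table and
index names from the plans above (the row counts and the random
distribution are guesses, not measurements from the real data):

```sql
-- Hypothetical schema matching the names in the EXPLAIN output.
CREATE TABLE relationships (user_id int, contact_id int);
CREATE TABLE large_table (user_id int, created_at timestamp, payload text);

-- ~1M rows spread over ~10k users and one year of timestamps.
INSERT INTO large_table
SELECT (random() * 10000)::int,
       now() - (random() * interval '365 days'),
       'x'
FROM generate_series(1, 1000000);

-- One user with ~40 contacts, as in the bitmap scan above.
INSERT INTO relationships
SELECT 55555, (random() * 10000)::int
FROM generate_series(1, 40);

CREATE INDEX large_user_id_created_at_index
    ON large_table (user_id, created_at);
ANALYZE;
```

Skewing created_at per user (e.g. clustering one contact's rows at
the start of the range) should then reproduce the fast/slow split
between the two kinds of users.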


Til

