Re: [PERFORM] Query performance changes significantly depending onlimit value

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: [PERFORM] Query performance changes significantly depending onlimit value
Дата
Msg-id CAFj8pRDC=rDJEhoFSkWu36FHbLVC8MYy=HAXFNZ2=qKsdzx+DQ@mail.gmail.com
обсуждение исходный текст
Ответ на [PERFORM] Query performance changes significantly depending on limit value  (Rowan Seymour <rowanseymour@gmail.com>)
Ответы Re: [PERFORM] Query performance changes significantly depending onlimit value  (Rowan Seymour <rowanseymour@gmail.com>)
Список pgsql-performance


2017-02-23 14:11 GMT+01:00 Rowan Seymour <rowanseymour@gmail.com>:
Hi guys

I'm a bit stuck on a query that performs fantastically up to a certain limit value, after which the planner goes off in a completely different direction and performance gets dramatically worse. Am using Postgresql 9.3

You can see all the relevant schemas at http://pastebin.com/PNEqw2id and in the test database there are 1,000,000 records in contacts_contact, and about half of those will match the subquery on values_value.

The query in question is:

SELECT "contacts_contact".* FROM "contacts_contact"
INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
       AND "contacts_contact"."id" IN (
         SELECT U0."contact_id" FROM "values_value" U0 WHERE (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
       )
) ORDER BY "contacts_contact"."id" DESC LIMIT 222;

With that limit of 222, it performs like:

Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358 rows=222 loops=1)
  Buffers: shared hit=708 read=63
  ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual time=0.120..3.304 rows=222 loops=1)
        Buffers: shared hit=708 read=63
        ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92) (actual time=0.103..1.968 rows=227 loops=1)
              Merge Cond: (contacts_contact.id = u0.contact_id)
              Buffers: shared hit=24 read=63
              ->  Index Scan Backward using contacts_contact_pkey on contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual time=0.008..0.502 rows=1117 loops=1)
                    Buffers: shared hit=22 read=2
              ->  Index Scan using values_value_field_string_value_contact on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual time=0.086..0.857 rows=227 loops=1)
                    Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                    Buffers: shared hit=2 read=61
        ->  Index Only Scan using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=227)
              Index Cond: ((contactgroup_id = 1) AND (contact_id = contacts_contact.id))
              Heap Fetches: 0
              Buffers: shared hit=684
Total runtime: 3.488 ms


But if increase the limit to 223 then it performs like:

Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual time=2685.830..2686.534 rows=223 loops=1)
  Buffers: shared hit=767648 read=86530
  ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual time=2685.828..2686.461 rows=223 loops=1)
        Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
        Buffers: shared hit=767648 read=86530
        ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual time=2685.742..2685.804 rows=228 loops=1)
              Sort Key: contacts_contact.id
              Sort Method: quicksort  Memory: 34327kB
              Buffers: shared hit=767648 read=86524
              ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92) (actual time=646.573..2417.291 rows=200412 loops=1)

There is pretty bad estimation probably due dependency between contact_field_id = 1 and upper(string_value) = 'F'::text

The most simple solution is disable nested loop - set enable_nestloop to off

Regards

Pavel 
 
                    Buffers: shared hit=767648 read=86524
                    ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=646.532..766.200 rows=200412 loops=1)
                          Buffers: shared read=51417
                          ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709 rows=200412 loops=1)
                                Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                Buffers: shared read=51417
                                ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=70.647..70.647 rows=200412 loops=1)
                                      Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                      Buffers: shared read=770
                    ->  Index Scan using contacts_contact_pkey on contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual time=0.007..0.007 rows=1 loops=200412)
                          Index Cond: (id = u0.contact_id)
                          Buffers: shared hit=767648 read=35107
        ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.073..0.273 rows=550 loops=1)
              Index Cond: (contactgroup_id = 1)
              Heap Fetches: 0
              Buffers: shared read=6
Total runtime: 2695.301 ms


I've tried running ANALYZE but that actually reduced the limit at which things get worse. Any insight into the reasoning of the query planner would be much appreciated.

Thanks

--
Rowan Seymour | +260 964153686 | @rowanseymour

В списке pgsql-performance по дате отправления:

Предыдущее
От: Rowan Seymour
Дата:
Сообщение: [PERFORM] Query performance changes significantly depending on limit value
Следующее
От: Rowan Seymour
Дата:
Сообщение: Re: [PERFORM] Query performance changes significantly depending onlimit value