Re: [PERFORM] Query performance changes significantly depending on limit value

From Pavel Stehule
Subject Re: [PERFORM] Query performance changes significantly depending on limit value
Date
Msg-id CAFj8pRCZPknj_3rweVXyYx-3vnwbSC2iog8KZ3LXiuYZ-U6H-w@mail.gmail.com
In reply to Re: [PERFORM] Query performance changes significantly depending on limit value  (Rowan Seymour <rowanseymour@gmail.com>)
Responses Re: [PERFORM] Query performance changes significantly depending on limit value  (Rowan Seymour <rowanseymour@gmail.com>)
List pgsql-performance


2017-02-23 15:02 GMT+01:00 Rowan Seymour <rowanseymour@gmail.com>:
Hi Pavel. That suggestion gets me as far as LIMIT 694 with the fast plan, then things get slow again. This is now what happens at LIMIT 695:

Limit  (cost=35945.78..50034.52 rows=695 width=88) (actual time=12852.580..12854.382 rows=695 loops=1)
  Buffers: shared hit=6 read=66689
  ->  Merge Join  (cost=35945.78..56176.80 rows=998 width=88) (actual time=12852.577..12854.271 rows=695 loops=1)
        Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
        Buffers: shared hit=6 read=66689
        ->  Sort  (cost=35944.53..35949.54 rows=2004 width=92) (actual time=12852.486..12852.577 rows=710 loops=1)
              Sort Key: contacts_contact.id
              Sort Method: quicksort  Memory: 34327kB
              Buffers: shared hit=6 read=66677
              ->  Hash Join  (cost=6816.19..35834.63 rows=2004 width=92) (actual time=721.293..12591.204 rows=200412 loops=1)
                    Hash Cond: (contacts_contact.id = u0.contact_id)
                    Buffers: shared hit=6 read=66677
                    ->  Seq Scan on contacts_contact  (cost=0.00..25266.00 rows=1000000 width=88) (actual time=0.003..267.258 rows=1000000 loops=1)
                          Buffers: shared hit=1 read=15265
                    ->  Hash  (cost=6813.14..6813.14 rows=244 width=4) (actual time=714.373..714.373 rows=200412 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 7046kB
                          Buffers: shared hit=5 read=51412
                          ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
                                Buffers: shared hit=5 read=51412
                                ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976 rows=200412 loops=1)
                                      Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                      Buffers: shared hit=5 read=51412
                                      ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=57.642..57.642 rows=200412 loops=1)
                                            Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                            Buffers: shared hit=5 read=765
        ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.080..0.651 rows=1707 loops=1)
              Index Cond: (contactgroup_id = 1)
              Heap Fetches: 0
              Buffers: shared read=12
Total runtime: 12863.938 ms


Can you explain a bit more about what you mean by "dependency between contact_field_id = 1 and upper(string_value) = 'F'::text"?

Look at the related node in the plan:

 
                    ->  Hash  (cost=6813.14..6813.14 rows=244 width=4) (actual time=714.373..714.373 rows=200412 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 7046kB
                          Buffers: shared hit=5 read=51412
                          ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
                                Buffers: shared hit=5 read=51412
                                ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976 rows=200412 loops=1)
                                      Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                      Buffers: shared hit=5 read=51412

There is a significant difference between the estimate (2004 rows) and reality (200412 rows) - two orders of magnitude - so the plan must be suboptimal.

Looking at your schema - it is a variant of an EAV table - this is an antipattern, and for anything more than a small number of returned rows it will be slow.
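For reference, newer PostgreSQL releases can describe such a cross-column dependency with extended statistics - a minimal sketch, assuming PostgreSQL 10 or later (so not applicable to the 9.3 server in this thread), with a hypothetical statistics name:

-- Record the functional dependency between the two columns so the
-- planner stops multiplying their selectivities as if independent.
CREATE STATISTICS values_value_field_string_dep (dependencies)
  ON contact_field_id, string_value FROM values_value;
ANALYZE values_value;  -- extended statistics are only populated by ANALYZE

Note that before PostgreSQL 14, extended statistics only match plain column clauses, so the upper(string_value) expression used here would still not benefit.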

Regards

Pavel



Btw I created the index values_value_field_string_value_contact as

CREATE INDEX values_value_field_string_value_contact
ON values_value(contact_field_id, UPPER(string_value), contact_id DESC)
WHERE contact_field_id IS NOT NULL;

I'm not sure why it needs the contact_id column, but without it the planner picks a slow approach for even smaller LIMIT values.


On 23 February 2017 at 15:32, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2017-02-23 14:11 GMT+01:00 Rowan Seymour <rowanseymour@gmail.com>:
Hi guys

I'm a bit stuck on a query that performs fantastically up to a certain limit value, after which the planner goes off in a completely different direction and performance gets dramatically worse. I'm using PostgreSQL 9.3.

You can see all the relevant schemas at http://pastebin.com/PNEqw2id and in the test database there are 1,000,000 records in contacts_contact, and about half of those will match the subquery on values_value.

The query in question is:

SELECT "contacts_contact".* FROM "contacts_contact"
INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
       AND "contacts_contact"."id" IN (
         SELECT U0."contact_id" FROM "values_value" U0 WHERE (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
       )
) ORDER BY "contacts_contact"."id" DESC LIMIT 222;
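
The plans below report buffer usage, so presumably they were captured with the BUFFERS option (available since PostgreSQL 9.0), i.e.:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "contacts_contact".* FROM "contacts_contact"
INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" = "contacts_contactgroup_contacts"."contact_id")
WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
       AND "contacts_contact"."id" IN (
         SELECT U0."contact_id" FROM "values_value" U0 WHERE (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
       )
) ORDER BY "contacts_contact"."id" DESC LIMIT 222;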

With that limit of 222, it performs like:

Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358 rows=222 loops=1)
  Buffers: shared hit=708 read=63
  ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual time=0.120..3.304 rows=222 loops=1)
        Buffers: shared hit=708 read=63
        ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92) (actual time=0.103..1.968 rows=227 loops=1)
              Merge Cond: (contacts_contact.id = u0.contact_id)
              Buffers: shared hit=24 read=63
              ->  Index Scan Backward using contacts_contact_pkey on contacts_contact  (cost=0.42..41249.43 rows=1000000 width=88) (actual time=0.008..0.502 rows=1117 loops=1)
                    Buffers: shared hit=22 read=2
              ->  Index Scan using values_value_field_string_value_contact on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual time=0.086..0.857 rows=227 loops=1)
                    Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                    Buffers: shared hit=2 read=61
        ->  Index Only Scan using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=227)
              Index Cond: ((contactgroup_id = 1) AND (contact_id = contacts_contact.id))
              Heap Fetches: 0
              Buffers: shared hit=684
Total runtime: 3.488 ms


But if I increase the limit to 223, then it performs like:

Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual time=2685.830..2686.534 rows=223 loops=1)
  Buffers: shared hit=767648 read=86530
  ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual time=2685.828..2686.461 rows=223 loops=1)
        Merge Cond: (contacts_contact.id = contacts_contactgroup_contacts.contact_id)
        Buffers: shared hit=767648 read=86530
        ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual time=2685.742..2685.804 rows=228 loops=1)
              Sort Key: contacts_contact.id
              Sort Method: quicksort  Memory: 34327kB
              Buffers: shared hit=767648 read=86524
              ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92) (actual time=646.573..2417.291 rows=200412 loops=1)

There is a pretty bad estimate, probably due to the dependency between contact_field_id = 1 and upper(string_value) = 'F'::text.

The simplest solution is to disable nested loops - set enable_nestloop to off.
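
For example, scoped to a single transaction so the rest of the session keeps the default planner settings (a minimal sketch; put the problematic query where indicated):

BEGIN;
SET LOCAL enable_nestloop = off;  -- planner will avoid nested loop joins
-- run the SELECT ... ORDER BY ... LIMIT query here
COMMIT;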

Regards

Pavel 
 
                    Buffers: shared hit=767648 read=86524
                    ->  HashAggregate  (cost=6810.70..6813.14 rows=244 width=4) (actual time=646.532..766.200 rows=200412 loops=1)
                          Buffers: shared read=51417
                          ->  Bitmap Heap Scan on values_value u0  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709 rows=200412 loops=1)
                                Recheck Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                Buffers: shared read=51417
                                ->  Bitmap Index Scan on values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004 width=0) (actual time=70.647..70.647 rows=200412 loops=1)
                                      Index Cond: ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
                                      Buffers: shared read=770
                    ->  Index Scan using contacts_contact_pkey on contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual time=0.007..0.007 rows=1 loops=200412)
                          Index Cond: (id = u0.contact_id)
                          Buffers: shared hit=767648 read=35107
        ->  Index Only Scan Backward using contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.073..0.273 rows=550 loops=1)
              Index Cond: (contactgroup_id = 1)
              Heap Fetches: 0
              Buffers: shared read=6
Total runtime: 2695.301 ms


I've tried running ANALYZE, but that actually reduced the limit at which things get worse. Any insight into the reasoning of the query planner would be much appreciated.

Thanks

--
Rowan Seymour | +260 964153686 | @rowanseymour




--
Rowan Seymour | +260 964153686 | @rowanseymour
