Re: Slow "not in array" operation

Поиск

Список

Период

Сортировка

От	Marco Colli
Тема	Re: Slow "not in array" operation
Дата	13 ноября 2019 г. 14:56:22
Msg-id	CAFvCgN6cqaF28M3RWFKQ__pBn-igMFmZ6-gSwjnNGGeCJHWLeA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Slow "not in array" operation (Jeff Janes <jeff.janes@gmail.com>)
Ответы	Re: Slow "not in array" operation
Список	pgsql-performance

Дерево обсуждения

> the answer is that is because it is a GIN index. Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table.

Currently I have this GIN index:

"index_subscriptions_on_project_id_and_tags" gin (project_id, tags) WHERE trashed_at IS NULL

It uses the btree_gin extension and works perfectly for tag search, except for the "NOT" operator. I don't understand why it doesn't use the GIN index also for the "NOT" operator.

The problem is that I cannot create the same index with BTree, because PG doesn't support BTree on array :(

On Wed, Nov 13, 2019 at 12:30 PM Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Nov 13, 2019 at 4:20 AM Marco Colli <collimarco91@gmail.com> wrote:
Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the tag 'en' are 4.8M

What was the plan for the one that took 500ms?

This is the query / plan without the filter on tags:

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=291342.67..291342.68 rows=1 width=8) (actual time=354.556..354.556 rows=1 loops=1)
-> Gather (cost=291342.05..291342.66 rows=6 width=8) (actual time=354.495..374.305 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
-> Partial Aggregate (cost=290342.05..290342.06 rows=1 width=8) (actual time=349.799..349.799 rows=1 loops=7)
-> Parallel Index Only Scan using index_subscriptions_on_project_id_and_uid on subscriptions (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018 rows=1030593 loops=7)
Index Cond: (project_id = 123)
Heap Fetches: 280849
Planning Time: 0.753 ms
Execution Time: 374.483 ms
(10 rows)

My previous comment about the bitmap index scan taking half the time was a slip of the eye, I was comparing *cost* of the bitmap index scan to the *time* of the overall plan. But then the question is, why isn't it doing an index-only scan on "index_subscriptions_on_project_id_and_tags"? And the answer is that is because it is a GIN index. Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее

От: Jeff Janes
Дата: 13 ноября 2019 г., 14:30:10
Сообщение: Re: Slow "not in array" operation

Следующее

От: Jeff Janes
Дата: 13 ноября 2019 г., 15:18:21
Сообщение: Re: Slow "not in array" operation

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Slow "not in array" operation

Предыдущее

Следующее