Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison

Поиск

Список

Период

Сортировка

От	David G. Johnston
Тема	Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison
Дата	31 августа 2015 г. 19:35:13
Msg-id	CAKFQuwYsBZwfJrSVFQ7TOWxC1V=1PqZpV3Gn+O7z34UUQg3xvA@mail.gmail.com обсуждение исходный текст
Ответ на	Query > 1000× slowdown after adding datetime comparison (twoflower <standa.kurik@gmail.com>)
Список	pgsql-performance

Дерево обсуждения

On Mon, Aug 31, 2015 at 12:09 PM, twoflower <standa.kurik@gmail.com> wrote:

I have the following three tables:

DOCUMENT
id (index)
documenttype
date_last_updated: timestamp(6) (indexed)

EXTERNAL_TRANSLATION_UNIT
id (indexed)
fk_id_document (indexed)

EXTERNAL_TRANSLATION
id (indexed)
fk_id_translation_unit (indexed)

Table sizes:
DOCUMENT: 381 000
EXTERNAL_TRANSLATION_UNIT: 76 000 000
EXTERNAL_TRANSLATION: 76 000 000

Now the following query takes about 36 minutes to finish:

SELECT u.id AS id_external_translation_unit,
r.id AS id_external_translation,
u.fk_id_language AS fk_id_source_language,
r.fk_id_language AS fk_id_target_language,
doc.fk_id_job
FROM "EXTERNAL_TRANSLATION_UNIT" u
JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
ORDER BY r.id LIMIT 1000

This is the query plan:

<http://postgresql.nabble.com/file/n5864045/qp1.png>

If I remove the WHERE condition, it returns immediately.

So does "SELECT 1;" - but since that doesn't give the same answer it is not very relevant.

Am I doing something obviously wrong?

Not obviously...

Thank you for any ideas.

Consider updating the translation tables at the same time the document table is updated. That way you can apply the WHERE and ORDER BY clauses against the same table.

I presume you've run ANALYZE on the data.

I would probably try something like:

WITH docs AS ( SELECT ... WHERE date > ...)

SELECT ... FROM (translations join translation_unit) t

WHERE EXISTS (SELECT 1 FROM docs WHERE t.doc_id = docs.doc_id)

ORDER BY t.id LIMIT 1000

You are trying to avoid the NESTED LOOP and the above has a decent chance of materializing docs and then building either a bit or hash map for both docs and translations thus performing a single sequential scan over both instead of performing 70+ million index lookups.

Take this with a grain of salt as my fluency in this area is limited - I tend to work with trial-and-error but without data that is difficult.

I'm not sure if the planner could be smarter because you are asking a question it is not particularly suited to estimating - namely cross-table correlations. Rethinking the model is likely to give you a better outcome long-term though it does seem like there should be room for improvement within the stated query and model.

As Tomas said you likely will benefit from increased working memory in order to make materializing and hashing/bitmapping favorable compared to a nested loop.

David J.

В списке pgsql-performance по дате отправления:

Предыдущее

От: Tomas Vondra
Дата: 31 августа 2015 г., 19:19:43
Сообщение: Re: Query > 1000× slowdown after adding datetime comparison

Следующее

От: twoflower
Дата: 31 августа 2015 г., 22:03:56
Сообщение: Re: Query > 1000× slowdown after adding datetime comparison

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison

Предыдущее

Следующее