Discussion: Query > 1000× slowdown after adding datetime comparison


Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
I have the following three tables:

DOCUMENT
  id (index)
  documenttype
  date_last_updated: timestamp(6) (indexed)

EXTERNAL_TRANSLATION_UNIT
  id (indexed)
  fk_id_document (indexed)

EXTERNAL_TRANSLATION
  id (indexed)
  fk_id_translation_unit (indexed)

Table sizes:
 DOCUMENT: 381 000
 EXTERNAL_TRANSLATION_UNIT: 76 000 000
 EXTERNAL_TRANSLATION: 76 000 000

Now the following query takes about 36 minutes to finish:

 SELECT u.id AS id_external_translation_unit,
    r.id AS id_external_translation,
    u.fk_id_language AS fk_id_source_language,
    r.fk_id_language AS fk_id_target_language,
    doc.fk_id_job
 FROM "EXTERNAL_TRANSLATION_UNIT" u
     JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
     JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
 WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
 ORDER BY r.id LIMIT 1000

This is the query plan:

<http://postgresql.nabble.com/file/n5864045/qp1.png>

If I remove the WHERE condition, it returns immediately.

Am I doing something obviously wrong?

Thank you for any ideas.





Re: Query > 1000× slowdown after adding datetime comparison

From: Tomas Vondra
Date:

On 08/31/2015 06:09 PM, twoflower wrote:
> I have the following three tables:
...
> This is the query plan:
>
> <http://postgresql.nabble.com/file/n5864045/qp1.png>
>
> If I remove the WHERE condition, it returns immediately.
>
> Am I doing something obviously wrong?

Please share explain plans for both the slow and the fast query. That
makes it easier to spot the difference, and possibly identify the cause.

Also, what PostgreSQL version is this, and what are "basic" config
parameters (shared buffers, work mem)?
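
For example, run the query under EXPLAIN (ANALYZE, BUFFERS) and post the
text output rather than a screenshot:

 EXPLAIN (ANALYZE, BUFFERS)
 SELECT u.id AS id_external_translation_unit,
    r.id AS id_external_translation,
    u.fk_id_language AS fk_id_source_language,
    r.fk_id_language AS fk_id_target_language,
    doc.fk_id_job
 FROM "EXTERNAL_TRANSLATION_UNIT" u
     JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
     JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
 WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
 ORDER BY r.id LIMIT 1000;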

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison

From: "David G. Johnston"
Date:
On Mon, Aug 31, 2015 at 12:09 PM, twoflower <standa.kurik@gmail.com> wrote:
> I have the following three tables:
>
> DOCUMENT
>   id (index)
>   documenttype
>   date_last_updated: timestamp(6) (indexed)
>
> EXTERNAL_TRANSLATION_UNIT
>   id (indexed)
>   fk_id_document (indexed)
>
> EXTERNAL_TRANSLATION
>   id (indexed)
>   fk_id_translation_unit (indexed)
>
> Table sizes:
>  DOCUMENT: 381 000
>  EXTERNAL_TRANSLATION_UNIT: 76 000 000
>  EXTERNAL_TRANSLATION: 76 000 000
>
> Now the following query takes about 36 minutes to finish:
>
>  SELECT u.id AS id_external_translation_unit,
>     r.id AS id_external_translation,
>     u.fk_id_language AS fk_id_source_language,
>     r.fk_id_language AS fk_id_target_language,
>     doc.fk_id_job
>  FROM "EXTERNAL_TRANSLATION_UNIT" u
>      JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
>      JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
>  WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
>  ORDER BY r.id LIMIT 1000
>
> This is the query plan:
>
> <http://postgresql.nabble.com/file/n5864045/qp1.png>
>
> If I remove the WHERE condition, it returns immediately.


So does "SELECT 1;" - but since that doesn't give the same answer it is not very relevant.

> Am I doing something obviously wrong?

Not obviously...


> Thank you for any ideas.

Consider updating the translation tables at the same time the document table is updated.  That way you can apply the WHERE and ORDER BY clauses against the same table.
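
A rough sketch of what I mean - copying the document timestamp down to the
translation table and keeping it current with a trigger. All the names here
are made up, and the initial backfill across 76 million rows would be
expensive:

 ALTER TABLE "EXTERNAL_TRANSLATION"
   ADD COLUMN doc_date_last_updated timestamp;

 -- hypothetical trigger: push the document timestamp down to its
 -- translations whenever a document is updated
 CREATE OR REPLACE FUNCTION sync_doc_date() RETURNS trigger AS $$
 BEGIN
   UPDATE "EXTERNAL_TRANSLATION" r
      SET doc_date_last_updated = NEW.date_last_updated
     FROM "EXTERNAL_TRANSLATION_UNIT" u
    WHERE r.fk_id_translation_unit = u.id
      AND u.fk_id_document = NEW.id;
   RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;

 CREATE TRIGGER document_date_sync
   AFTER UPDATE OF date_last_updated ON "DOCUMENT"
   FOR EACH ROW EXECUTE PROCEDURE sync_doc_date();

With that in place both the WHERE and the ORDER BY touch only
"EXTERNAL_TRANSLATION".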

I presume you've run ANALYZE on the data.

I would probably try something like:

WITH docs AS ( SELECT ... WHERE date > ...)
SELECT ... FROM (translations join translation_unit) t
WHERE EXISTS (SELECT 1 FROM docs WHERE t.doc_id = docs.doc_id)
ORDER BY t.id LIMIT 1000
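
Fleshed out with your table and column names that would be roughly the
following (untested; doc.fk_id_job would have to be carried through the CTE
if you still need it in the output):

 WITH docs AS (
     SELECT id
     FROM "DOCUMENT"
     WHERE date_last_updated >= date(now() - '171:00:00'::interval)
 )
 SELECT u.id AS id_external_translation_unit,
        r.id AS id_external_translation,
        u.fk_id_language AS fk_id_source_language,
        r.fk_id_language AS fk_id_target_language
 FROM "EXTERNAL_TRANSLATION_UNIT" u
     JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
 WHERE EXISTS (SELECT 1 FROM docs WHERE docs.id = u.fk_id_document)
 ORDER BY r.id LIMIT 1000;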

You are trying to avoid the NESTED LOOP, and the above has a decent chance of materializing docs and then building either a bitmap or a hash for both docs and translations, thus performing a single sequential scan over each instead of 70+ million index lookups.

Take this with a grain of salt, as my fluency in this area is limited - I tend to work by trial and error, and without the data that is difficult.

I'm not sure if the planner could be smarter here, because you are asking a question it is not particularly suited to estimating - namely, cross-table correlations.  Rethinking the model is likely to give you a better outcome long-term, though it does seem like there should be room for improvement within the stated query and model.

As Tomas said you likely will benefit from increased working memory in order to make materializing and hashing/bitmapping favorable compared to a nested loop.
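
For a quick experiment you can raise it for a single session, e.g.:

 SET work_mem = '256MB';  -- session-local; pick a value your RAM can spare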

David J.

Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
Tomas Vondra-4 wrote
> Please share explain plans for both the slow and the fast query. That
> makes it easier to spot the difference, and possibly identify the cause.
>
> Also, what PostgreSQL version is this, and what are "basic" config
> parameters (shared buffers, work mem)?

I am running 9.4.4, here are the basic config parameters:

work_mem = 32 MB
shared_buffers = 8196 MB
temp_buffers = 8 MB
effective_cache_size = 4 GB

I have run ANALYZE on all tables prior to running the queries. The query
plan for the fast version (without the WHERE clause) follows:

<http://postgresql.nabble.com/file/n5864075/qp2.png>

What I don't understand is the difference in the inner NESTED LOOP
between the slow and the fast query plans. In the fast one, both index scans
have 1000 as the actual row count. I would expect that, given the LIMIT
clause. The slow query plan, however, shows ~ 75 000 000 as the actual row
count. Is the extra WHERE condition the only and *plausible* explanation for
this difference?


David G. Johnston wrote
> I would probably try something like:
>
> WITH docs AS ( SELECT ... WHERE date > ...)
> SELECT ... FROM (translations join translation_unit) t
> WHERE EXISTS (SELECT 1 FROM docs WHERE t.doc_id = docs.doc_id)
> ORDER BY t.id LIMIT 1000

David, I tried this and it is probably as slow as the original query. It did
not finish in 5 minutes anyway.





Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
Another thing that surprises me a little: if I replace the *date_last_updated*
condition with a different one, say *doc.documenttype = 4*, the query finishes
immediately. *documenttype* is an unindexed integer column.

Here's the query plan:

<http://postgresql.nabble.com/file/n5864080/qp3.png>

What's so special about the *date_last_updated* condition that makes it so
slow to use? Is it that the *date()* function call makes it difficult for the
planner to estimate the data distribution in the DOCUMENT table?





Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: "David G. Johnston"
Date:
On Mon, Aug 31, 2015 at 3:03 PM, twoflower <standa.kurik@gmail.com> wrote:
> Tomas Vondra-4 wrote
> > Please share explain plans for both the slow and the fast query. That
> > makes it easier to spot the difference, and possibly identify the cause.
> >
> > Also, what PostgreSQL version is this, and what are "basic" config
> > parameters (shared buffers, work mem)?
>
> I am running 9.4.4, here are the basic config parameters:
>
> work_mem = 32 MB
> shared_buffers = 8196 MB
> temp_buffers = 8 MB
> effective_cache_size = 4 GB
>
> I have run ANALYZE on all tables prior to running the queries. The query
> plan for the fast version (without the WHERE clause) follows:
>
> <http://postgresql.nabble.com/file/n5864075/qp2.png>
>
> What I don't understand is the difference in the inner NESTED LOOP
> between the slow and the fast query plans. In the fast one, both index scans
> have 1000 as the actual row count. I would expect that, given the LIMIT
> clause. The slow query plan, however, shows ~ 75 000 000 as the actual row
> count. Is the extra WHERE condition the only and *plausible* explanation for
> this difference?


In the slow query the executor has to evaluate every single document to determine which of the 75 million translations can be discarded; only after that are the first 1000, sorted by translation id, returned.

In the first query the executor simply scans the translation index in ascending order and stops after retrieving the first 1,000.

What you are expecting, I think, is for that same process to continue beyond 1,000 should any of the first 1,000 be discarded due to the corresponding document not being updated recently enough, until 1,000 translations are identified.  I'm not sure why the nested loop executor is not intelligent enough to do this...

The important number in these plans is "loops", not "rows".

David J.

Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: "David G. Johnston"
Date:
On Mon, Aug 31, 2015 at 3:19 PM, twoflower <standa.kurik@gmail.com> wrote:
> Another thing that surprises me a little: if I replace the *date_last_updated*
> condition with a different one, say *doc.documenttype = 4*, the query finishes
> immediately. *documenttype* is an unindexed integer column.


The only index that matters here is the pkey on document.  The problem is the failure to exit the nested loop once 1,000 translations have been gathered.  Translation is related to document via key - hence the nested loop.  A hashing-based plan would make use of the secondary indexes but likely would not be particularly useful in this query (contrary to my earlier speculation).

> Here's the query plan:
>
> <http://postgresql.nabble.com/file/n5864080/qp3.png>
>
> What's so special about the *date_last_updated* condition that makes it so
> slow to use? Is it that the *date()* function call makes it difficult for the
> planner to estimate the data distribution in the DOCUMENT table?

What happens if you pre-compute the date condition and hard code it?
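
That is, something like the following, substituting whatever the expression
evaluates to right now (the date below is only an example):

 SELECT date(now() - '171:00:00'::interval);  -- suppose this returns 2015-08-24

 -- then, in the query:
 WHERE doc.date_last_updated >= '2015-08-24'::date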

David J.

Re: Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
David G Johnston wrote
> What happens if you pre-compute the date condition and hard code it?

I created a new boolean column and filled it for every row in DOCUMENT with
*(doc.date_last_updated >= date(now() - '171:00:00'::interval))*, reanalyzed
the table and modified the query to just compare this column to TRUE. I
expected this to be very fast, considering that a (to me, anyway) similar
query also containing a constant value comparison finishes immediately.
However, the query has now been running for 4 minutes already. That's really
interesting.





Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: Stefan Keller
Date:
2015-08-31 21:46 GMT+02:00 twoflower <standa.kurik@gmail.com> wrote:
> I created a new boolean column and filled it for every row in DOCUMENT with
> *(doc.date_last_updated >= date(now() - '171:00:00'::interval))*, reanalyzed
> ...

... and you've put an index on that new boolean column (say "updated")?
CREATE INDEX index_name ON some_table (boolean_field);
or tried a conditional index like
CREATE INDEX index_name ON some_table (some_field) WHERE boolean_field;
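
With the column named "updated" as above, that would be, for instance (index
names are just examples):

 CREATE INDEX document_updated_idx ON "DOCUMENT" (updated);
 -- or, since presumably only a small fraction of rows qualify:
 CREATE INDEX document_updated_partial_idx ON "DOCUMENT" (id) WHERE updated;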

-S.




Re: Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
I did not. I wanted to compare this query to the one I tried before, which had
*documenttype = 4* as its sole condition. That one was very fast, and
*documenttype* was not indexed either.

But this query, using the new temporary column, is still running after 48
minutes...





Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: Stefan Keller
Date:
So, if I'm understanding you correctly, we're talking solely about the
following clause in the query you gave initially:

WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
which initially was
WHERE documenttype = 4
and now is being replaced by a temporary (I'd say derived) column
WHERE updated
?

In any case - I have to go - but run your queries through
http://explain.depesz.com/ and post weblinks to the explain plans.

-S.




Re: Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
Stefan Keller wrote
> So, if I'm understanding you correctly, we're talking solely about the
> following clause in the query you gave initially:
>
> WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
> which initially was
> WHERE documenttype = 4
> and now is being replaced by a temporary (I'd say derived) column
> WHERE updated
> ?
>
> In any case - I have to go - but run your queries through
> http://explain.depesz.com/ and post weblinks to the explain plans.

Here's the query plan for the original query I struggled with: WHERE
doc.date_last_updated >= date(now() - '171:00:00'::interval)

The original slow query with date()
<http://explain.depesz.com/d/WEy/SFfcd5esMuuI9wrLmQrZU33ma371uW2Nlh1PmeI9ZDRiAJ4wjv>

Then, just out of curiosity, I replaced the WHERE condition with another
one: WHERE doc.documenttype = 4. documenttype is an unindexed integer
column. It's fast. Here's the query plan:

date() condition replaced with documenttype = 4
<http://explain.depesz.com/d/kHG/u1dvsDyJPBK92xnKwNT3YkQsNW6WfG9aHmWAzVSOgmXikL3hiW>

Then, as David suggested, I tried to precompute the *doc.date_last_updated >=
date(now() - '171:00:00'::interval)* condition. I created a new boolean temp_eval
column on the DOCUMENT table and updated the rows like this:

update "DOCUMENT" set temp_eval = date_last_updated >= date(now() -
'171:00:00'::interval);

Then I analyzed the table and modified the query to only have WHERE
temp_eval = TRUE. I expected this to be fast, but it took 41 minutes to
complete. Here's the query plan:

Precomputed column
<http://explain.depesz.com/d/2aFw/GLqQ8CMGDZOYx5134M8eallFluervKXDfOdnaz4nZTMXYNacji>





Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: 林士博
Date:
I think you should try putting the precomputed boolean temp_eval column
on the "EXTERNAL_TRANSLATION" table (r) instead.

And if possible, try creating a partial index on id, restricted to rows
where temp_eval is true, on the "EXTERNAL_TRANSLATION" table.

That way, scanning just that index can return the top 1000 records.
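
A sketch of that idea (names are illustrative; the backfill over 76 million
rows will take a while):

 ALTER TABLE "EXTERNAL_TRANSLATION" ADD COLUMN temp_eval boolean;

 -- backfill from the document table
 UPDATE "EXTERNAL_TRANSLATION" r
    SET temp_eval = (doc.date_last_updated >= date(now() - '171:00:00'::interval))
   FROM "EXTERNAL_TRANSLATION_UNIT" u
   JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
  WHERE r.fk_id_translation_unit = u.id;

 -- partial index: an ordered scan of it can return the top 1000 directly
 CREATE INDEX external_translation_eval_idx
     ON "EXTERNAL_TRANSLATION" (id) WHERE temp_eval;

Then *WHERE temp_eval ORDER BY id LIMIT 1000* should be able to use just this
index.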

Re: Re: Query > 1000× slowdown after adding datetime comparison

From: twoflower
Date:
林士博 wrote
> I think you should try putting the precomputed boolean temp_eval column
> on the "EXTERNAL_TRANSLATION" table (r) instead.
>
> And if possible, try creating a partial index on id, restricted to rows
> where temp_eval is true, on the "EXTERNAL_TRANSLATION" table.
>
> That way, scanning just that index can return the top 1000 records.

I agree that might help. But I would still like to understand the reason for
the difference between the second and the third query. Both contain a simple
<column> = <constant> expression, yet one finishes immediately and one runs
for 41 minutes.






Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: 林士博
Date:
It depends on the values in your table.
It seems that documenttype is 4 for all of the records with the smallest 1000 ids,
so the query can stop after 1000 iterations of the nested loop.




Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison

From: 林士博
Date:
By the way, if you filter on a documenttype value that does not exist,
the query has to check all the records, and it will run even slower than the original one.