Discussion: [PERFORM] Please help with a slow query: there are millions of records, what can we do?


[PERFORM] Please help with a slow query: there are millions of records, what can we do?

From:
Pat Maddox
Date:
Hi there,

I’ve been asked to help with a project dealing with slow queries. I’m brand new to the project, so I have very little context. I’ve gathered as much information as I can.

I’ve put the schema, query, and explain info in gists to maintain their formatting.

We are stumped with this slow query right now. I could really use some help looking for ways to speed it up.

If you need any more information, please let me know.

Thanks,
Pat


Full Table and Index Schema

permissions schema

Table Metadata

tasks count: 8.8 million
tasks count where assigned_to_user_id is null: 2.7 million
tasks table has lots of new records added, individual existing records updated (e.g. to mark them complete)
permissions count: 4.4 million

EXPLAIN (ANALYZE, BUFFERS)




Postgres version

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

History

Slow query has gotten steadily worse over the past few months.

Hardware / Maintenance Setup / WAL Configuration / GUC Settings


Cache size: 3.5 GB
Storage limit: 256 GB
Connection limit: 400

work_mem: 30MB
checkpoint_segments: 40
wal_buffers: 16MB

Re: [PERFORM] Please help with a slow query: there are millions of records, what can we do?

From:
Albe Laurenz
Date:
Pat Maddox wrote:
> I’ve been asked to help with a project dealing with slow queries. I’m brand new to the project, so I
> have very little context. I’ve gathered as much information as I can.
> 
> I’ve put the schema, query, and explain info in gists to maintain their formatting.
> 
> We are stumped with this slow query right now. I could really use some help looking for ways to speed
> it up.

I don't know if the plan can be improved; it has to retrieve and sort 347014 rows,
most of which are read from disk, so it will take some time.

One thing I notice is that some statistics seem to be bad (the estimate for
the index scan on "permissions" is off the mark), so maybe you can ANALYZE
both tables (perhaps with higher "default_statistics_target") and see if that
changes anything.
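For example, a raised statistics target could be set per column before re-analyzing. This is only a sketch: "user_id" here stands in for whichever permissions column actually has the bad estimate in the plan.

```sql
-- Sketch: raise the per-column statistics target (default is 100),
-- then re-collect statistics and re-run EXPLAIN to compare estimates.
ALTER TABLE permissions ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE permissions;
ANALYZE tasks;
```

Alternatively, raising default_statistics_target in postgresql.conf applies the higher target to every column that has no explicit per-column setting.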

Is there any chance you could give the machine lots of RAM?
That would speed up the bitmap heap scan (but not the sort).

Yours,
Laurenz Albe

Re: [PERFORM] Please help with a slow query: there are millions of records, what can we do?

From:
Jeff Janes
Date:
On Tue, Mar 7, 2017 at 6:26 PM, Pat Maddox <pat@adorable.io> wrote:
Hi there,

I’ve been asked to help with a project dealing with slow queries. I’m brand new to the project, so I have very little context. I’ve gathered as much information as I can.

I’ve put the schema, query, and explain info in gists to maintain their formatting.

We are stumped with this slow query right now. I could really use some help looking for ways to speed it up.

If you need any more information, please let me know.


You could try a partial index on:

(account_id, completed_at desc, taskable_name, position, assigned_to_user_id) where "tasks"."archived" != 't' AND "tasks"."complete" = 't'

Also, the poor estimate of the number of rows on your scan of index_permissions_on_user_id_and_object_id_and_object_type suggests that you are not analyzing (and so probably also not vacuuming) often enough.
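A sketch of how that partial index might be declared (the index name is made up; CONCURRENTLY avoids blocking writes on a busy table, at the cost of a slower build):

```sql
CREATE INDEX CONCURRENTLY tasks_complete_partial_idx
    ON tasks (account_id, completed_at DESC, taskable_name,
              position, assigned_to_user_id)
    WHERE archived != 't' AND complete = 't';
```

Because the WHERE clause matches the query's filters, the index only has to cover the rows the query can actually return, keeping it small enough to stay in cache.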

Cheers,

Jeff

Re: [PERFORM] Please help with a slow query: there are millions of records, what can we do?

From:
Pat Maddox
Date:
> On Mar 8, 2017, at 11:00 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> You could try a partial index on:
>
> (account_id, completed_at desc, taskable_name, position, assigned_to_user_id) where "tasks"."archived" != 't' AND "tasks"."complete" = 't'
>
> Also, the poor estimate of the number of rows on your scan of index_permissions_on_user_id_and_object_id_and_object_type suggests that you are not analyzing (and so probably also not vacuuming) often enough.

Thanks for this. So here’s a quick update…

I removed all the indexes that are there and added one on:

(account_id, taskable_type, taskable_id, assigned_to_user_id, archived, complete, completed_at, due_on)

We search for tasks that are complete or incomplete, so we wouldn't want a partial index there… but I _think_ changing the index to be partial where archived != 't' would be beneficial; I'll have to look. As of today, only about 10% of the tasks are archived='t' – though that's still ~1 million rows at this point.
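If that pans out, the partial variant might look roughly like this (index name invented; the archived column is dropped from the key since the predicate already fixes its value):

```sql
CREATE INDEX CONCURRENTLY tasks_unarchived_idx
    ON tasks (account_id, taskable_type, taskable_id, assigned_to_user_id,
              complete, completed_at, due_on)
    WHERE archived != 't';
```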

That helped the query plans big time, and adding more RAM so the indexes fit in memory instead of swapping led to major
improvements.

So thank you for the suggestions :)

I’ve manually vacuumed and analyzed a few times, and the estimates are always pretty far off. How do you suggest increasing the stats for the table? Just increase it, vacuum, and see if the stats look better?
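For reference, one way to sanity-check whether autoanalyze is keeping up before touching the targets (pg_stat_user_tables and n_mod_since_analyze are available in 9.4):

```sql
-- How stale are the statistics, and how many rows have changed
-- since the last (auto)analyze?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('tasks', 'permissions');
```

If n_mod_since_analyze is large relative to the table size, more frequent analyzes (or per-table autovacuum settings) may help as much as a higher statistics target.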

Thanks,
Pat