Re: Very long query planning times for database with lots of partitions

From: Mickael van der Beek
Subject: Re: Very long query planning times for database with lots of partitions
Msg-id: CAEQRsAfD9CSCWcS=_K0Pe52j80+HiF69YEUvPtE10KPvbDnFOQ@mail.gmail.com
In reply to: RE: Very long query planning times for database with lots ofpartitions (Steven Winfield <Steven.Winfield@cantabcapital.com>)
List: pgsql-performance

Thank you both for your quick answers,

@Justin, your answer seems to confirm that partitioning, or at least partitioning this heavily, is not the right direction to take.
I originally wanted to use partitioning because I'm storing a multi-tenant graph: as the data grew, so did the indexes, and once they were larger than the available RAM, query performance went down the drain.
The two levels of partitioning gave me one level for tenants and one level for business logic, where I could further split the tables by the types of nodes and edges I was storing.
(These are table_a and table_b in my example query. There is also a table_c which connects table_a and table_b, but I wanted to keep it simple.)
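
For readers who want to picture the layout, here is a minimal sketch of such a two-level scheme, assuming PostgreSQL 10 or later with declarative partitioning. The column names (tenant_id, node_type, node_id), the partition names and the choice of tenant as the outer level are assumptions made for illustration, not the actual schema:

```sql
-- Hypothetical columns; only the table name table_a comes from the thread.
CREATE TABLE table_a (
    tenant_id  integer NOT NULL,
    node_type  text    NOT NULL,
    node_id    bigint  NOT NULL
) PARTITION BY LIST (tenant_id);

-- Level 1: one partition per tenant, itself partitioned by node type.
CREATE TABLE table_a_t42 PARTITION OF table_a
    FOR VALUES IN (42)
    PARTITION BY LIST (node_type);

-- Level 2: one leaf partition per node type within that tenant.
CREATE TABLE table_a_t42_person PARTITION OF table_a_t42
    FOR VALUES IN ('person');
CREATE TABLE table_a_t42_company PARTITION OF table_a_t42
    FOR VALUES IN ('company');
```

With this layout every additional tenant adds one intermediate partition plus one leaf per node type, so the total partition count grows quickly.
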
Another reason was that we do regular, automated cleanups of the data: dropping all the data for a tenant (hundreds of thousands of rows) is very fast with DROP TABLE on a partition and rather slow with a regular DELETE query (even with an index).
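
To make the comparison concrete, a sketch of the two cleanup styles, assuming a parent table table_a that is list-partitioned on a hypothetical tenant_id column, with a hypothetical per-tenant partition named table_a_t42:

```sql
-- Partitioned layout: dropping the tenant's partition removes it together
-- with any sub-partitions, without deleting row by row.
DROP TABLE table_a_t42;

-- Variant: detach first (the data stays in a standalone table), drop later.
-- ALTER TABLE table_a DETACH PARTITION table_a_t42;
-- DROP TABLE table_a_t42;

-- Unpartitioned layout: every matching row is deleted individually and
-- the dead rows are only reclaimed later by VACUUM.
DELETE FROM table_a WHERE tenant_id = 42;
```
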
With the redesign of the database schema (which included the partitioning changes), I also dramatically reduced the amount and size of data per row on the nodes and edges by moving the large and numerous metadata fields to separate tables that are not part of the graph traversal process.
Based on the usage numbers I see, I would expect around 12K tenants in the medium term, which means that even partitioning only per tenant on those two tables would lead to 24K partitions (12K tenants x 2 tables), way above your approximate limit of 1K partitions.
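
The current partition count can be compared against that range by reading the system catalogs. A small sketch, assuming PostgreSQL 10 or later (where pg_class.relispartition exists); the queries use only standard catalog tables and no names from the actual schema:

```sql
-- Total number of partitions (intermediate and leaf) in the current database.
SELECT count(*) AS partitions
FROM pg_class
WHERE relispartition;

-- Direct children per parent, largest first.
SELECT inhparent::regclass AS parent,
       count(*)            AS direct_children
FROM pg_inherits
GROUP BY inhparent
ORDER BY direct_children DESC;
```
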
Queries are always limited to one tenant's data, which was one of the motivations behind partitioning in the first place.
I'm not sure what you would advise in this case for a multi-tenant graph?

@Steven, yes, constraint_exclusion is set to the default value of 'partition'.
The EXPLAIN ANALYZE output also shows that the partitions are pruned correctly.
So the query plan looks sound and the query execution confirms this.
But getting to that point, i.e. the planning itself, is really what the issue is for me.
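
For anyone wanting to reproduce this kind of check, a minimal sketch assuming PostgreSQL 10 or later. The filter column tenant_id and the value 42 are hypothetical stand-ins for the real partition key:

```sql
-- Confirm the setting; 'partition' is the default.
SHOW constraint_exclusion;

-- Plan only, without executing: the SUMMARY option adds the planning time.
EXPLAIN (SUMMARY)
SELECT * FROM table_a WHERE tenant_id = 42;

-- Plan and execute: reports both planning time and execution time,
-- and the plan shows which partitions were actually scanned.
EXPLAIN (ANALYZE)
SELECT * FROM table_a WHERE tenant_id = 42;
```

If the plan lists only the expected partitions but the planning time is far larger than the execution time, the cost lies in the planner examining the partition hierarchy rather than in the scan itself.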

 

On Tue, Jan 22, 2019 at 3:07 PM Steven Winfield <Steven.Winfield@cantabcapital.com> wrote:

Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?

If so, does the EXPLAIN output mention all of your parent partitions, or are some being successfully pruned?

Planning times can be sped up significantly if the planner can exclude parent partitions, without ever having to examine the constraints of the child (and grandchild) partitions. If this is not the case, take another look at your query and try to figure out why the planner might believe a parent partition cannot be outright disregarded from the query – does the query contain a filter on the parent partitions’ partition key, for example?

 

I believe Timescaledb has its own query planner optimisations for discarding partitions early at planning time.

 

Good luck,

Steve.





