Re: Thousands of partitions performance questions

Поиск
Список
Период
Сортировка
От Shai Cantor
Тема Re: Thousands of partitions performance questions
Дата
Msg-id CABJyNM49eLiM-PGRSxnqb07gGFxx+gzsEpvpMNSjoX+mu_2Chw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Thousands of partitions performance questions  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Thousands of partitions performance questions  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-general
Thanks a lot for your feedback.
We're gonna user PG 11 on AWS RDS.
I will do some workload simulations.
Some notes I haven't shared or were not clear enough on the previous post.
  1. Data is inserted using the copy command only and in an offline manner. Meaning, no user action creates or updates the data. An offline job runs is. Number of inserts can reach up to 1500 a day.
  2. Queries are only on a single partition

  • Can that ease the performance?
  • Will querying directly the partition tables help?

On Mon, Apr 29, 2019 at 8:46 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Mon, 29 Apr 2019 at 17:13, Shai Cantor <shaicantor@gmail.com> wrote:
> Will the db hold 135000 (45000 * 3 months) partitions under the assumption that I query only 1 partition?
> Should I model it differently in terms of schema, partitions etc.?

Which PG version?

Before PG11 the query planner pruned unneeded partition by looking at
each partition and determining if the partition constraint could not
match the base quals on the query.  PG11 improved this by adding a
smarter and faster algorithm to get rid of non-matching partitions,
however, this really only speeds things up a little as it only really
allows the planner to skip generating paths for these partitions,
certain meta-data is still loaded, which is not really slow per
partition, but it is slow if you have thousands of partitions.

PG11 is also pretty bad at planning UPDATE/DELETEs to partitioned
tables with a large number of partitions. You'll at best suffer from
high planning times to plan these queries and at worst suffer out of
memory errors with that many partitions.

The yet to be released PG12 improves both of these deficiencies
providing you can eliminate the majority of partitions during query
planning. PG12 also improves the performance of INSERT into a
partitioned table, wherein PG11 a lock was obtained on every
partition, with PG12 we only grab a lock on a partition the first time
the INSERT command inserts a row into it. If you're just INSERTing 1
row per command into a partitioned table with many partitions then
this makes a pretty big difference.

Depending on the types of query you're running it's likely not a good
idea to go above 100 or so partitions with PG11. You might get away
with more if you're running a more data-warehouse type load, i.e fewer
but longer running queries, but for a more OLTP type workload, with
more queries and lower latencies, then you may struggle to cope with a
dozen.

I'd recommend you do workload simulations with whatever number you
choose and ensure performance is to the level you require before
getting in too deep with your design.

If your go-live date is near the end of the year or beyond, then it
might be a good idea to start testing with PG12 right away. The
release date for that will likely be around the middle of October.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-general по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Thousands of partitions performance questions
Следующее
От: David Rowley
Дата:
Сообщение: Re: Thousands of partitions performance questions