Обсуждение: Planner performance in partitions

Поиск
Список
Период
Сортировка

Planner performance in partitions

От
Piotr Włodarczyk
Дата:
Hello,

We have partitioned tables in two levels. Both stages are partitioned in
ranges method. We see that planner and executor time was 10 time slower when
we asked main table rather than partitioned. My question is did planner and
executor are working optimal? I have doubts about it. Let's consider that
situation, because I think that in some ways (I think in more cases) planner
shoudl be more optimal. When we have table partitioned by key which is in
"where" clause we have guarantee that all rows we can find just in ONE
partition: contained in range or in default if exists. Planner show that
query should be executed just in one partition, but it takes a lot of time.
So my direct question is if planner is stopping searching for another
partition when he found one correct? Are partition ranges stored sorted and
searching method is optimal and is stopped after first (and only) hit?

I've noticed that increasing numbers of partition proportionally increase
planner time. Additionally having index on column that you are searching for
is adding extra time (one index add +/- 100% time for table). It's
understandable but optimizing planner and executor by ideas I wrote on first
paragraph automatically decrease time for searching indexes.

Reproduction:
1. ADD MAIN TABLE

-- Table: public.book

--DROP TABLE public.book;

CREATE TABLE public.book
(
    id bigserial,
    id_owner bigint NOT NULL,
    added date NOT NULL
) PARTITION BY RANGE (id_owner) 
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public.book
    OWNER to postgres;

2. ADD PARTITIONS (run first "a" variant, then drop table book, reconstruct
and run "variant"):
    a. 1200 partitions:
https://gist.github.com/piotrwlodarczyk/4faa05729d1bdd3b5f5738a2a3faabc0 
    b. 6000 partitions:
https://gist.github.com/piotrwlodarczyk/2747e0984f521768f5d36ab2b382ea36 
    
3. ANALYZE ON MAIN TABLE:
    EXPLAIN ANALYZE SELECT * FROM public.book WHERE id_owner = 4;
    a. My result for 1200 partitions:
https://gist.github.com/piotrwlodarczyk/500f20a0b6e2cac6d36ab88d4fea2c00 
    b. My result for 6000 partitions:
https://gist.github.com/piotrwlodarczyk/277687b21201340377116a18a3dd8be8

4. ANALYZE ON PARTITIONED TABLE (only on first level):
  EXPLAIN ANALYZE SELECT * FROM public.book WHERE id_owner = 4;
  a. My result for 1200:
https://gist.github.com/piotrwlodarczyk/4285907c68b34b486cbf39eb8ae5cf92
  b. My result for 6000:
https://gist.github.com/piotrwlodarczyk/c157cc9321b6e1a1d0f900310f14f1cc

4. CONCLUSIONS
    Planner time for select on public.book (main table) 1200 was 469.416 ms,
for 6000 was 2530.179 ms. It looks like time is linear to partition count.
That makes me sure that all partitions are checked instead of searching for
first that equals. Intentionally I've searching id_owner = 4 to make sure
that in both cases first partition should by marked as correct and planer
time should be constant. What is intereting too that real execution time was
+/- equal in both cases. Is executor working better than planner?
  When we're asking on first level partition directly - time for planner
1200 is 58.736 ms, for 6000: 60.555 ms. We can say it's equal. Why? Because
planner don't have to search for another matching partitions because first
found can match. It's guaranteed by rule that say ranges in partitions
cannot override. Execution time in this case is 50 times faster! 




Вложения

Re: Planner performance in partitions

От
Michael Lewis
Дата:
"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)

Re: Planner performance in partitions

От
MichaelDBA
Дата:
Queries against tables with a lot of partitions (> 1000) start to incur an increasing planning time duration even with the current version, V11.  V12 purportedly has fixed this problem, allowing thousands of partitioned tables without a heavy planning cost.  Can't seem to find the threads on this topic, but there are out there.  I personally noted a gigantic increase in planning time once I got past 1500 partitioned tables in V11.

On another note, hopefully they have fixed runtime partition pruning in V12 since V11 introduced it but some query plans don't use it, so you have to reconstruct some queries to sub queries to make it work correctly.

Regards,
Michael Vitale


Michael Lewis wrote on 8/12/2019 3:05 PM:
"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)

ODP: Planner performance in partitions

От
Piotr Włodarczyk
Дата:

@Michael Lewis: I know documentation. I'm just considerations about possible performance tricks in current production version. I've tested this on V12 on another computer and I can say that I'm impressed. I've checked on 1200 partitions and times are:

 

PostgreSQL11.5:

  • select on main partition (public.book): planner: 60ms, execution: 5ms
  • select on partitioned table (public.book_1-1000): planner: 2.7 ms, execution: 2,4 ms

PostgreSQL 12B3:

  • select on main partition (public.book): planner: 2,5ms , execution: 1,2ms
  • select on partitioned table (public.book_1-1000): planner: 2.5 ms, execution: 1,2 ms

 

So looking at above results we have two options:

  • Wait for 12.0 stable version 😉
  • Wait for patches to 11 – PostgreSQL Team: can You do this? 😊

 

Pozdrawiam,

Piotr Włodarczyk

 

Od: MichaelDBA
Wysłano: poniedziałek, 12 sierpnia 2019 21:25
Do: Michael Lewis
DW: Piotr Włodarczyk; pgsql-performance@lists.postgresql.org
Temat: Re: Planner performance in partitions

 

Queries against tables with a lot of partitions (> 1000) start to incur an increasing planning time duration even with the current version, V11.  V12 purportedly has fixed this problem, allowing thousands of partitioned tables without a heavy planning cost.  Can't seem to find the threads on this topic, but there are out there.  I personally noted a gigantic increase in planning time once I got past 1500 partitioned tables in V11.

On another note, hopefully they have fixed runtime partition pruning in V12 since V11 introduced it but some query plans don't use it, so you have to reconstruct some queries to sub queries to make it work correctly.

Regards,
Michael Vitale


Michael Lewis wrote on 8/12/2019 3:05 PM:

"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)

 

 

Re: Planner performance in partitions

От
Michael Lewis
Дата:
Thanks for clarifying your position and sharing the results you have seen. That is impressive indeed.

It seems likely that waiting for v12 is needed since feature are not back patched. Perhaps one of the contributors will confirm, but that is my expectation.

Re: Planner performance in partitions

От
David Rowley
Дата:
On Tue, 13 Aug 2019 at 08:03, Piotr Włodarczyk
<piotrwlodarczyk89@gmail.com> wrote:
> PostgreSQL11.5:
>
> select on main partition (public.book): planner: 60ms, execution: 5ms
> select on partitioned table (public.book_1-1000): planner: 2.7 ms, execution: 2,4 ms
>
> PostgreSQL 12B3:
>
> select on main partition (public.book): planner: 2,5ms , execution: 1,2ms
> select on partitioned table (public.book_1-1000): planner: 2.5 ms, execution: 1,2 ms
>
> So looking at above results we have two options:
>
> Wait for 12.0 stable version
> Wait for patches to 11 – PostgreSQL Team: can You do this?

You'll need to either reduce the number of partitions down to
something realistic or wait for 12.0.

The work done to speed up the planner with partitioned tables for v12
won't be going into v11.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Planner performance in partitions

От
Michael Lewis
Дата:
Was there a reason to exceed 100-500 partitions in real life that pushed you to do this test? Is there some issue you see when using 100 partitions that is solved or reduced in severity by increasing to 1200 or 6000 partitions?

ODP: Planner performance in partitions

От
Piotr Włodarczyk
Дата:

As you wrote we have about 400/500 partitions in real life. So time problem is much smaller, but still it is and in one place of aur application we have decided to help DB and we're indicating in query exact partition we need. What pushed me to do this test? Just curiosity I think. After I saw in pg_locks that all partitions which was selected in uncommitted transaction have ACCESS SHARED i've started thinking about efficiency. And that way here we are. Why we need some hundred partitions? It’s because our main table (public.book) have hundreds of millions records. It’s not maintainable. VACUUM never ends, space on device is huge and we cannot take database down for longer that 2-3 hours, what is too short to maintain them manually. So we've partitioned them on two levels. First on id_owner (which is in every query) and the second level based on date. It’ll help as detach partitions with old data we no longer need.

 

Pozdrawiam,

Piotr Włodarczyk

 

Od: Michael Lewis
Wysłano: wtorek, 13 sierpnia 2019 00:37
Do: David Rowley
DW: Piotr Włodarczyk; MichaelDBA; Piotr Włodarczyk; pgsql-performance@lists.postgresql.org
Temat: Re: Planner performance in partitions

 

Was there a reason to exceed 100-500 partitions in real life that pushed you to do this test? Is there some issue you see when using 100 partitions that is solved or reduced in severity by increasing to 1200 or 6000 partitions?