Discussion: Performance impact of hundreds of partitions

Performance impact of hundreds of partitions

From: Leonardo F
Date:
Hi,

increasing shared_buffers has improved the number of inserts/second *a lot*,
so my "problem" [1] is fixed.

But now I'm worried by this sentence (from Tom Lane):

"The partitioning code isn't designed to scale beyond a few dozen partitions"

Is it mainly a planning problem or an execution time problem?

I did a (very quick) test with 3000 empty partitions, and it took 0.5
seconds to plan a simple SELECT with a very simple WHERE condition.
With 300 partitions, planning takes about 30 ms.
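
A minimal sketch of such a test, with made-up table names and the
8.x-style inheritance partitioning, would be something like:

    CREATE TABLE logs (id bigint, ts timestamptz);

    -- one empty child per range; repeated up to logs_p299 (or logs_p2999)
    CREATE TABLE logs_p0 (CHECK (id >=       0 AND id < 1000000)) INHERITS (logs);
    CREATE TABLE logs_p1 (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (logs);
    -- ...

    SET constraint_exclusion = on;

    -- EXPLAIN without ANALYZE only plans the query, so psql's \timing
    -- output is dominated by planning time
    \timing
    EXPLAIN SELECT * FROM logs WHERE id = 42;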

That's fine for my case, as I don't expect more than 300 partitions; and
I could actually wait 0.5 seconds more if that helps with such large
tables, and I won't be doing joins.


So: would the "scaling" problem be more evident if joins were involved?
Or is there something else I didn't get?



[1] http://archives.postgresql.org/pgsql-general/2010-04/msg00611.php

Re: Performance impact of hundreds of partitions

From: Rene Schickbauer
Date:
Leonardo F wrote:

> Is it mainly a planning problem or an execution time problem?

I have here a database with a table partitioned across 400 sub-tables.

I'm using a trigger-based solution with constraint exclusion. The thing
that takes the longest is planning queries. I made THAT problem just go
away for the most part by using cached queries (only works within the
same database connection, but that's no problem for me).
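
A minimal sketch of such a routing trigger, with made-up names (this one
handles only INSERT; the UPDATE case needs extra care for rows that move
between partitions):

    CREATE OR REPLACE FUNCTION logs_route() RETURNS trigger AS $$
    BEGIN
      -- send the row to the child whose CHECK constraint matches
      IF NEW.id >= 0 AND NEW.id < 1000000 THEN
        INSERT INTO logs_p0 VALUES (NEW.*);
      ELSIF NEW.id >= 1000000 AND NEW.id < 2000000 THEN
        INSERT INTO logs_p1 VALUES (NEW.*);
      -- ... one branch per partition
      ELSE
        RAISE EXCEPTION 'id % out of range', NEW.id;
      END IF;
      RETURN NULL;  -- the row is already stored in the child; skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER logs_route_trg
      BEFORE INSERT ON logs
      FOR EACH ROW EXECUTE PROCEDURE logs_route();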

I also tried rule-based partitioning; that indeed breaks down quickly
performance-wise.

Also, the trigger is an ON INSERT AND UPDATE trigger, and only on the
main table. If your main insert/update process knows which partition to
insert into, you can speed that up even more, while still being able to
use the automated partitioning for everything else.
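
That is, when the application already knows the target partition, it can
write to it directly and bypass the routing trigger entirely, e.g.
(made-up names again):

    -- direct insert into the known child, skipping the trigger on the parent
    INSERT INTO logs_p1 (id, ts) VALUES (1500042, now());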

LG
Rene

Re: Performance impact of hundreds of partitions

From: Vick Khera
Date:
On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:
> "The partitioning code isn't designed to scale beyond a few dozen partitions"
>
> Is it mainly a planning problem or an execution time problem?
>

I'll bet that is related to the planning and constraint exclusion
parts.  I have a couple of tables split into 100 partitions, and they
work extremely well.  However, I was able to alter my application such
that it almost always references the correct partition directly.  The
only time it does not is when a full scan of all partitions is
required.  All inserts go directly to the proper partition.

In my view, it is a big win to partition large tables such that each
partition holds no more than 5 million rows.  This keeps the indexes
small, and the query engine can quite easily skip huge hunks of them
on many queries.  Also, reindexes can be done pretty quickly and in my
case without seriously disrupting the application -- each table
reindexes in under 5 or 10 seconds.  When this was all one table, a
reindex op would lock up the application for upwards of two hours.
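
With a per-partition layout, each rebuild is a small, short-lived
operation, e.g. (assuming the made-up naming above):

    -- reindex one child at a time; each lock lasts seconds, not hours
    REINDEX TABLE logs_p0;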

Re: Performance impact of hundreds of partitions

From: Alvaro Herrera
Date:
Vick Khera wrote:
> On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:
> > "The partitioning code isn't designed to scale beyond a few dozen partitions"
> >
> > Is it mainly a planning problem or an execution time problem?
> >
>
> I'll bet that is related to the planning and constraint exclusion
> parts.  I have a couple of tables split into 100 partitions, and they
> work extremely well.

Keep in mind that 100 is only 8 dozen ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Performance impact of hundreds of partitions

From: Leonardo F
Date:
> The thing that takes the longest is planning queries. I made THAT problem just
> go away for the most part by using cached queries (only works within the same
> database connection, but that's no problem for me).


What do you mean by "cached queries"? Prepared statements?
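
If so, PREPARE/EXECUTE is the server-side form: the statement is planned
once and the plan is reused for the rest of the connection, e.g.:

    -- planned once; later EXECUTEs reuse the plan within this connection
    PREPARE get_log (bigint) AS
      SELECT * FROM logs WHERE id = $1;

    EXECUTE get_log(1500042);

    -- caveat: constraint exclusion happens at plan time, so a plan that is
    -- generic over $1 cannot prune partitions by id; with many partitions,
    -- caching statements with the key as a constant may work better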