Discussion: Performance impact of hundreds of partitions

Performance impact of hundreds of partitions

From: Leonardo F
Date:
Hi,

increasing shared_buffers has improved the number of inserts/second *a lot*,
so my "problem" [1] is fixed.

But now I'm worried by this sentence (from Tom Lane):

"The partitioning code isn't designed to scale beyond a few dozen partitions"

Is it mainly a planning problem or an execution time problem?

I did a (very quick) test with 3000 empty partitions, and it took 0.5
seconds to plan a simple SELECT with a very simple WHERE condition.
With 300 partitions, planning takes about 30 ms.
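
A minimal sketch of such a test, with made-up table names and the
8.x-style inheritance partitioning, would be something like:

    CREATE TABLE logs (id bigint, ts timestamptz);

    -- one empty child per range; repeated up to logs_p299 (or logs_p2999)
    CREATE TABLE logs_p0 (CHECK (id >=       0 AND id < 1000000)) INHERITS (logs);
    CREATE TABLE logs_p1 (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (logs);
    -- ...

    SET constraint_exclusion = on;

    -- EXPLAIN without ANALYZE only plans the query, so psql's \timing
    -- output is dominated by planning time
    \timing
    EXPLAIN SELECT * FROM logs WHERE id = 42;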

That's fine for my case, as I don't expect more than 300 partitions; and
I could actually wait 0.5 seconds more if that helps with such large
tables, and I won't be doing joins.


So: would the "scaling" problem be more evident if joins were involved?
Or is there something else I didn't get?



[1] http://archives.postgresql.org/pgsql-general/2010-04/msg00611.php

Re: Performance impact of hundreds of partitions

From: Rene Schickbauer
Date:
Leonardo F wrote:

> Is it mainly a planning problem or an execution time problem?

I have here a database with a table partitioned across 400 sub-tables.

I'm using a trigger-based solution with constraint exclusion. The thing
that takes the longest is planning queries. I made THAT problem just go
away for the most part by using cached queries (only works within the
same database connection, but that's no problem for me).
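
A minimal sketch of such a routing trigger, with made-up names (this one
handles only INSERT; the UPDATE case needs extra care for rows that move
between partitions):

    CREATE OR REPLACE FUNCTION logs_route() RETURNS trigger AS $$
    BEGIN
      -- send the row to the child whose CHECK constraint matches
      IF NEW.id >= 0 AND NEW.id < 1000000 THEN
        INSERT INTO logs_p0 VALUES (NEW.*);
      ELSIF NEW.id >= 1000000 AND NEW.id < 2000000 THEN
        INSERT INTO logs_p1 VALUES (NEW.*);
      -- ... one branch per partition
      ELSE
        RAISE EXCEPTION 'id % out of range', NEW.id;
      END IF;
      RETURN NULL;  -- the row is already stored in the child; skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER logs_route_trg
      BEFORE INSERT ON logs
      FOR EACH ROW EXECUTE PROCEDURE logs_route();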

I also tried rule-based partitioning; that indeed breaks down quickly
performance-wise.

Also, the trigger is an ON INSERT AND UPDATE trigger, and only on the
main table. If your main insert/update process knows which partition to
insert into, you can speed that up even more, while still being able to
use the automated partitioning for everything else.
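
That is, when the application already knows the target partition, it can
write to it directly and bypass the routing trigger entirely, e.g.
(made-up names again):

    -- direct insert into the known child, skipping the trigger on the parent
    INSERT INTO logs_p1 (id, ts) VALUES (1500042, now());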

LG
Rene

Re: Performance impact of hundreds of partitions

From: Vick Khera
Date:
On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:
> "The partitioning code isn't designed to scale beyond a few dozen partitions"
>
> Is it mainly a planning problem or an execution time problem?
>

I'll bet that is related to the planning and constraint exclusion
parts.  I have a couple of tables split into 100 partitions, and they
work extremely well.  However, I was able to alter my application such
that it almost always references the correct partition directly.  The
only time it does not is when a full scan of all partitions is
required.  All inserts go directly to the proper partition.

In my view, it is a big win to partition large tables such that each
partition holds no more than 5 million rows.  This keeps the indexes
small, and the query engine can quite easily skip huge hunks of them
on many queries.  Also, reindexes can be done pretty quickly and in my
case without seriously disrupting the application -- each table
reindexes in under 5 or 10 seconds.  When this was all one table, a
reindex op would lock up the application for upwards of two hours.
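
With a per-partition layout, each rebuild is a small, short-lived
operation, e.g. (assuming the made-up naming above):

    -- reindex one child at a time; each lock lasts seconds, not hours
    REINDEX TABLE logs_p0;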

Re: Performance impact of hundreds of partitions

From: Alvaro Herrera
Date:
Vick Khera wrote:
> On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:
> > "The partitioning code isn't designed to scale beyond a few dozen partitions"
> >
> > Is it mainly a planning problem or an execution time problem?
> >
>
> I'll bet that is related to the planning and constraint exclusion
> parts.  I have a couple of tables split into 100 partitions, and they
> work extremely well.

Keep in mind that 100 is only 8 dozen ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Performance impact of hundreds of partitions

From: Leonardo F
Date:
> The thing that takes the longest is planning queries. I made THAT problem just
> go away for the most part by using cached queries (only works within the same
> database connection, but that's no problem for me).


What do you mean by "cached queries"? Prepared statements?
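
If so, PREPARE/EXECUTE is the server-side form: the statement is planned
once and the plan is reused for the rest of the connection, e.g.:

    -- planned once; later EXECUTEs reuse the plan within this connection
    PREPARE get_log (bigint) AS
      SELECT * FROM logs WHERE id = $1;

    EXECUTE get_log(1500042);

    -- caveat: constraint exclusion happens at plan time, so a plan that is
    -- generic over $1 cannot prune partitions by id; with many partitions,
    -- caching statements with the key as a constant may work better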