Re: Should we warn against using too many partitions?

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Should we warn against using too many partitions?
Дата
Msg-id CAKJS1f9AD5HwvpwAci5Ha+jHfnjTPetiTrG5-=pUV5jw4ht=wA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Should we warn against using too many partitions?  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Should we warn against using too many partitions?  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
Thanks for having another look.

On Sat, 8 Jun 2019 at 18:39, Justin Pryzby <pryzby@telsasoft.com> wrote:
> +   <para>
> +    The choice of how to partition a table should be made carefully as the
> +    performance of query planning and execution can be negatively affected by
> +    poorly made design decisions.
>
> Maybe just "poor design"

changed

> +    partitioned table.  <literal>WHERE</literal> clause items that match and
> +    are compatible with the partition key can be used to prune away unneeded
>
> remove "away" ?

removed

> +    requirements for the <literal>PRIMARY KEY</literal> or a
> +    <literal>UNIQUE</literal> constraint.  Removal of unwanted data is also
> +    a factor to consider when planning your partitioning strategy as an entire
> +    partition can be removed fairly quickly.  However, if data that you want
>
> Can we just say "dropped" ?  On my first (re)reading, I briefly thought this
> was now referring to "pruning" as "removal".

I used removed because that could be done via DROP TABLE or by DETACH
PARTITION. If I change it to "dropped" then it sounds like we might
only mean DROP TABLE.  I've reworded to use "detached" instead.

> +    to keep exists in that partition then that means having to resort to using
> +    <command>DELETE</command> instead of removing the partition.
> +   </para>
> +
> +   <para>
> +    Choosing the target number of partitions by which the table should be
> +    divided into is also a critical decision to make.  Not having enough
>
> Should be: ".. target number .. into which .. should be divided .."

I've changed "by" to "into". I think that's what you mean, otherwise,
you've lost me.

> +    partitions may mean that indexes remain too large and that data locality
> +    remains poor which could result in poor cache hit ratios.  However,
>
> Change the 2nd remains to "is" and the second poor to "low" ?

An internet search on "low cache hit ratio" turns up about twice as
many results as "poor cache hit ratio", but both seem fine to me.
However, since the search seems to show more for the former, I change
it to that.

> +    dividing the table into too many partitions can also cause issues.
> +    Too many partitions can mean slower query planning times and higher memory
>
> s/slower/longer/

changed

> +    consumption during both query planning and execution.  It's also important
> +    to consider what changes may occur in the future when choosing how to
> +    partition your table.  For example, if you choose to have one partition
>
> Remove "when choosing ..."?  Or say:

I don't see how that would make sense.

> |When choosing how to partition your table, it's also important to consider
> |what changes may occur in the future.

Changed to that.

> +    per customer and you currently have a small number of large customers,
> +    what will the implications be if in several years you obtain a large
> +    number of small customers.  In this case, it may be better to choose to
> +    partition by <literal>HASH</literal> and choose a reasonable number of
> +    partitions rather than trying to partition by <literal>LIST</literal> and
> +    hoping that the number of customers does not increase significantly over
> +    time.
> +   </para>
>
> It's an unusual thing for which to hope :)

I have reworded this slightly which may help with that.

> +   <para>
> +    Sub-partitioning can be useful to further divide partitions that are
> +    expected to become larger than other partitions, although excessive
> +    sub-partitioning can easily lead to large numbers of partitions and can
> +    cause the problems mentioned in the preceding paragraph.
> +   </para>
>
> cause the SAME problems ?

Added

> +    It is also important to consider the overhead of partitioning during
> +    query planning and execution.  The query planner is generally able to
> +    handle partition hierarchies up a few thousand partitions fairly well,
> +    provided that typical queries prune all but a small number of partitions
> +    during query planning.  Planning times become slower and memory
>
> s/slower/longer/

Changed

> Hm, maybe say "typical queries ALLOW PRUNNING .."
>
> +    consumption becomes higher when more partitions remain after the planner
> +    performs partition pruning.  This is particularly true for the
>
> Just say: "remain after planning" ?

I've changed this around, but not really how you've asked.

> +    <command>UPDATE</command> and <command>DELETE</command> commands.  Also,
> +    even if most queries are able to prune a large number of partitions during
> +    query planning, it still may be undesirable to have a large number of
>
> may still ?

This has been rewritten per Amit's review.

> +   <para>
> +    With data warehouse type workloads it can make sense to use a larger
> +    number of partitions than with an OLTP type workload.  Generally, in data
> +    warehouses, query planning time is less of a concern as the majority of
>
> VAST majority?  Or "essentially all"?  Or " .. query planning time is
> insignificant compared to the time spent during query execution.

I don't see any benefit in raising the significance of that.

> +    processing time is spent during query execution.  With either of these two
> +    types of workload it is important to make the right decisions early as
>
> early COMMA

I'm not really sure what you mean here as I don't see any comma in
that text. I guess you want me to add one? But I'm confused as you
seemed to ask me to remove a comma there in your previous review.

You wrote:
>>+    these two types of workload, it is important to make the right decisions
>>+    early as re-partitioning large quantities of data can be painstakingly

> early COMMA ?

Can you be more precise to the exact problem that you see with the
text? In the meantime, I've put the comma back where it was in the
original patch.

I've attached the updated patches.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dave Cramer
Дата:
Сообщение: Re: Binary support for pgoutput plugin
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Quick doc typo fix