Обсуждение: Re: Define hash partition for certain column values

Поиск
Список
Период
Сортировка

Re: Define hash partition for certain column values

От
Tom Lane
Дата:
=?utf-8?B?0JPQvtC70YPQsdC10LLQsCDQr9C90LA=?= <ishsha@yandex.ru> writes:
> Hello,  I've found in source code that there is a function satisfies_hash_partition(oid, modulus, remainder,
column_values[])which allows to check if the certain column value will be placed in the certain partition. I' d like to
knowif there is an opportunity not to check the certain partition but to define which partition will be the certain
columnvalue placed in. 

If you want to control what goes where, use list partitioning (or,
perhaps, range partitioning).  Hash is only suitable if you do not
care which partition any particular row goes to.

Personally, I think hash partitioning is mostly academic, precisely
because of that.  If the partitioning doesn't line up with application
requirements, you give up too much of the benefit of using partitions.

            regards, tom lane



Re: Define hash partition for certain column values

От
Ron
Дата:
On 1/11/21 12:36 AM, Tom Lane wrote:
Голубева Яна <ishsha@yandex.ru> writes:
Hello,  I've found in source code that there is a function satisfies_hash_partition(oid, modulus, remainder, column_values[]) which allows to check if the certain column value will be placed in the certain partition. I' d like to know if there is an opportunity not to check the certain partition but to define which partition will be the certain column value placed in.
If you want to control what goes where, use list partitioning (or,
perhaps, range partitioning).  Hash is only suitable if you do not
care which partition any particular row goes to.

Personally, I think hash partitioning is mostly academic, precisely
because of that.  If the partitioning doesn't line up with application
requirements, you give up too much of the benefit of using partitions.

In non-MBCC systems, hash partitioning minimizes lock conflicts because the writes aren't all going into the same page.  OLTP systems can use this feature to distribute writes across pages; some also allow for "mixed pages", where records from multiple tables get written to the same page.  (This then means that one DIO is used to read a parent and all it's child records.  Naturally, range reports are very slow, but sometimes OLTP performance is paramount.)

--
Angular momentum makes the world go 'round.

Re: Define hash partition for certain column values

От
Michael Lewis
Дата:
On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна <ishsha@yandex.ru> wrote:
List or range partitioning isn't suitable for my case.
I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly. 
So there will be too many partitions if I use list partitioning as is.

Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up the range of values allowed by your field definition.

Re: Define hash partition for certain column values

От
Rob Sargent
Дата:


On 1/12/21 8:51 AM, Голубева Яна wrote:
Values for the key partitioning column are generated randomly and I can't predict their distribution between ranges.
If I just create some ranges I won't have any guarantee that partitions will have similar amount of data. It is possible that I will have 2 or 3 extremely big partitions and a bit of data in others.
 
Thank you,
Iana Golubeva


12.01.2021, 17:55, "Michael Lewis" <mlewis@entrata.com>:
On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна <ishsha@yandex.ru> wrote:
List or range partitioning isn't suitable for my case.
I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly. 
So there will be too many partitions if I use list partitioning as is.

Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up the range of values allowed by your field definition.
Doesn't sound like a good definition of random.

Re: Define hash partition for certain column values

От
Alban Hertroys
Дата:
> On 12 Jan 2021, at 16:51, Голубева Яна <ishsha@yandex.ru> wrote:
>
> Values for the key partitioning column are generated randomly and I can't predict their distribution between ranges.
> If I just create some ranges I won't have any guarantee that partitions will have similar amount of data. It is
possiblethat I will have 2 or 3 extremely big partitions and a bit of data in others. 

A hash of a random number is also random, so when using hashes for partitioning you will get the same problem.

If you want to distribute values equally over a fixed number of partitions, I suggest you partition on a modulo of a
monotonouslyincreasing number (a sequence for example), instead of relying on a random number. 

> 12.01.2021, 17:55, "Michael Lewis" <mlewis@entrata.com>:
> On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна <ishsha@yandex.ru> wrote:
> List or range partitioning isn't suitable for my case.
> I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly.
> So there will be too many partitions if I use list partitioning as is.
>
> Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up
therange of values allowed by your field definition. 

Alban Hertroys
--
There is always an exception to always.







Re: Define hash partition for certain column values

От
Michael Lewis
Дата:
On Tue, Jan 12, 2021 at 9:37 AM Alban Hertroys <haramrae@gmail.com> wrote:

> On 12 Jan 2021, at 16:51, Голубева Яна <ishsha@yandex.ru> wrote:
>
> Values for the key partitioning column are generated randomly and I can't predict their distribution between ranges.
> If I just create some ranges I won't have any guarantee that partitions will have similar amount of data. It is possible that I will have 2 or 3 extremely big partitions and a bit of data in others.

A hash of a random number is also random, so when using hashes for partitioning you will get the same problem.

If you want to distribute values equally over a fixed number of partitions, I suggest you partition on a modulo of a monotonously increasing number (a sequence for example), instead of relying on a random number.

> 12.01.2021, 17:55, "Michael Lewis" <mlewis@entrata.com>:
> On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна <ishsha@yandex.ru> wrote:
> List or range partitioning isn't suitable for my case.
> I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly.
> So there will be too many partitions if I use list partitioning as is.
>
> Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up the range of values allowed by your field definition.

Alban Hertroys

That said, there is no reason you should need near-perfectly-even distribution anyway. You can also split partitions later, or do another level of partitioning on large partitions if they somehow end up significantly unbalanced.

How many rows are we talking about initially/over time? Do you plan to drop old data at all? Perhaps the initial decision to partition was decided on a bit too hastily.