Обсуждение: concurrent re-partitioning of declarative partitioned tables

Поиск
Список
Период
Сортировка

concurrent re-partitioning of declarative partitioned tables

От
Nick Cleaton
Дата:
I want to set up a large table on postgresql 12.4, using declarative partitioning to partition by record creation date. I'd like to have recent records in small partitions but old records in a few larger partitions, so I want merges. The merges should be concurrent, in the sense that they lock out readers or writers only for very short intervals if at all.

I'm looking at adding an extra boolean column and partitioning on that at the top level, with two parallel date-based partition trees underneath, so that I can effectively have overlapping date partitions:


create table mytable (
  record_date timestamp with time zone not null,
  _partition_channel boolean,
  ...
)
partition by list (_partition_channel);

create table mytable_chan_null
  partition of mytable for values in (null)
  partition by range (record_date);

create table mytable_chan_true
  partition of mytable for values in (true)
  partition by range (record_date);

create table mytable_day_20200101
  partition of mytable_chan_null
  for values from ('2020-01-01') to ('2020-01-02');

...

create table mytable_day_20200107
  partition of mytable_chan_null
  for values from ('2020-01-07') to ('2020-01-08');


Then to merge several day-partitions into a week-partition:

create table mytable_week_20200101
  partition of mytable_chan_true
  for values from ('2020-01-01') to ('2020-01-08');

... and migrate rows in batches by updating _partition_channel to true, then finally drop the empty day partitions.

Since record_date is an insertion timestamp, I don't mind that after this merge updating the record_date of a merged row could fail due to a missing partition. Likewise there's no need for new rows to be inserted with record_date values in previously merged ranges.

Questions:

Are there any hidden pitfalls with this approach ?

Have I missed a simpler way ?

Is there a project out there that will manage this for me ?

Re: concurrent re-partitioning of declarative partitioned tables

От
Michael Lewis
Дата:
You can not have overlapping partitions that are both attached. Why do you want to merge partitions that you are "done with" instead of just leaving them partitioned by day?

Why are you partitioning at all? Are you confident that you need partitions for performance & that the trade-offs are worth the cost, or are you needing to detach/drop old data quickly to adhere to a retention policy?

pg_partman extension may be worth looking into. I have only dabbled with it, so I can't offer much comment.

Re: concurrent re-partitioning of declarative partitioned tables

От
"David G. Johnston"
Дата:
On Mon, Nov 30, 2020 at 8:36 AM Nick Cleaton <nick@cleaton.net> wrote:
I want to set up a large table on postgresql 12.4, using declarative partitioning to partition by record creation date. I'd like to have recent records in small partitions but old records in a few larger partitions, so I want merges. The merges should be concurrent, in the sense that they lock out readers or writers only for very short intervals if at all.

Once a date has passed is the table for that date effectively read-only?

Re: concurrent re-partitioning of declarative partitioned tables

От
Nick Cleaton
Дата:
On Wed, 2 Dec 2020 at 16:07, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Nov 30, 2020 at 8:36 AM Nick Cleaton <nick@cleaton.net> wrote:
I want to set up a large table on postgresql 12.4, using declarative partitioning to partition by record creation date. I'd like to have recent records in small partitions but old records in a few larger partitions, so I want merges. The merges should be concurrent, in the sense that they lock out readers or writers only for very short intervals if at all.

Once a date has passed is the table for that date effectively read-only?

No, old records get updated from time to time, although updates are much less common than for recent records.

Re: concurrent re-partitioning of declarative partitioned tables

От
Nick Cleaton
Дата:
On Wed, 2 Dec 2020 at 15:59, Michael Lewis <mlewis@entrata.com> wrote:
You can not have overlapping partitions that are both attached.

Not directly, no. That's why I'm considering the _partition_channel hack.

Why do you want to merge partitions that you are "done with" instead of just leaving them partitioned by day?

I have some random access index lookups on columns not in the partition key, where values are unique over the entire table so at most one partition is going to return a row. A lookup that touches 4 or 5 pages in each of 100 partition indexes is more expensive than one that touches 6 or 7 pages in each of 10 larger partition indexes.

Why are you partitioning at all? Are you confident that you need partitions for performance & that the trade-offs are worth the cost, or are you needing to detach/drop old data quickly to adhere to a retention policy?

I do want cheap drops of old data, but also many queries have indexable conditions on non-key columns and also only want records from the most recent N days, so partition pruning is useful there with small partitions for recent records.