Re: Adding support for Default partition in partitioning

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: Adding support for Default partition in partitioning
Дата
Msg-id a816af3d-dfc0-7f28-1a48-e675b206b214@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Adding support for Default partition in partitioning  (Keith Fiske <keith@omniti.com>)
Список pgsql-hackers
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
> 
> However, running your original example, I'm getting this error
> 
> keith@keith=# CREATE TABLE list_partitioned (
> keith(#     a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have  been accepted by any of the table's existing
partitions,so no need to  move any rows
 

2. Default partition exists which might contain key=3 rows, which we'll  need to move.  If we do this in the same
transaction,as you say, it  might result in unnecessary unavailability of table's data.  So, it's  better to delegate
thatresponsibility to a background process.  The  current transaction will only add the new key=3 partition, so any
key=3 rows will be routed to the new partition from this point on.  But we  haven't updated the default partition's
constraintyet to say that it  no longer contains key=3 rows (constraint that the planner consumes),  so it will
continueto be scanned for queries that request key=3 rows  (there should be some metadata to indicate that the default
partition's constraint is invalid), along with the new partition.
 

3. A background process receives a "work item" requesting it to move all  key=3 rows from the default partition heap to
thenew partition's heap.  The movement occurs without causing the table to become unavailable;  once all rows have been
moved,we momentarily lock the table to update  the default partition's constraint to mark it valid, so that it will  no
longerbe accessed by queries that want to see key=3 rows.
 

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.

Could you briefly elaborate why you think the lack global index support
would be a problem in this regard?

I agree that some design is required here to implement a solution
redistribution of rows; not only in the context of supporting the notion
of default partitions, but also to allow the feature to split/merge range
(only?) partitions.  I'd like to work on the latter for v11 for which I
would like to post a proposal soon; if anyone would like to collaborate
(ideas, code, review), I look forward to.  (sorry for hijacking this thread.)

Thanks,
Amit





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: partitioned tables and contrib/sepgsql
Следующее
От: Emre Hasegeli
Дата:
Сообщение: Re: BRIN cost estimate