Re: Adding support for Default partition in partitioning

Поиск
Список
Период
Сортировка
От Rushabh Lathia
Тема Re: Adding support for Default partition in partitioning
Дата
Msg-id CAGPqQf3Zb16C13MqHW4C3OjjTBtyWh1UPenZmYCj0TrxaeSTJQ@mail.gmail.com
обсуждение исходный текст
Ответ на [HACKERS] Adding support for Default partition in partitioning  (Rahila Syed <rahilasyed90@gmail.com>)
Ответы Re: Adding support for Default partition in partitioning
Список pgsql-hackers


On Wed, Apr 5, 2017 at 10:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
>
> However, running your original example, I'm getting this error
>
> keith@keith=# CREATE TABLE list_partitioned (
> keith(#     a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have
   been accepted by any of the table's existing partitions, so no need to
   move any rows

2. Default partition exists which might contain key=3 rows, which we'll
   need to move.  If we do this in the same transaction, as you say, it
   might result in unnecessary unavailability of table's data.  So, it's
   better to delegate that responsibility to a background process.  The
   current transaction will only add the new key=3 partition, so any key=3
   rows will be routed to the new partition from this point on.  But we
   haven't updated the default partition's constraint yet to say that it
   no longer contains key=3 rows (constraint that the planner consumes),
   so it will continue to be scanned for queries that request key=3 rows
   (there should be some metadata to indicate that the default partition's
   constraint is invalid), along with the new partition.

3. A background process receives a "work item" requesting it to move all
   key=3 rows from the default partition heap to the new partition's heap.
   The movement occurs without causing the table to become unavailable;
   once all rows have been moved, we momentarily lock the table to update
   the default partition's constraint to mark it valid, so that it will
   no longer be accessed by queries that want to see key=3 rows.

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.


+1

I agree about the future plan about the row movement, how that is I am
not quite sure at this stage.

I was thinking that CREATE new partition is the DDL command, so even
if row-movement works with holding the lock on the new partition table,
that should be fine. I am not quire sure, why row movement should be
happen in the back-ground process.

Of-course, one idea is that if someone don't want feature of row-movement,
then we might add that under some GUC or may be as another option into
the CREATE partition table.

Could you briefly elaborate why you think the lack global index support
would be a problem in this regard?

I agree that some design is required here to implement a solution
redistribution of rows; not only in the context of supporting the notion
of default partitions, but also to allow the feature to split/merge range
(only?) partitions.  I'd like to work on the latter for v11 for which I
would like to post a proposal soon; if anyone would like to collaborate
(ideas, code, review), I look forward to.  (sorry for hijacking this thread.)

Thanks,
Amit





--
Rushabh Lathia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Emre Hasegeli
Дата:
Сообщение: Re: BRIN cost estimate
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Statement timeout behavior in extended queries