Re: [PATCH] Automatic HASH and LIST partition creation

Поиск
Список
Период
Сортировка
От Anastasia Lubennikova
Тема Re: [PATCH] Automatic HASH and LIST partition creation
Дата
Msg-id d596fdf5-7985-2a79-f76f-4f7606798642@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [PATCH] Automatic HASH and LIST partition creation  (Rahila Syed <rahilasyed90@gmail.com>)
Ответы Re: [PATCH] Automatic HASH and LIST partition creation  (Pavel Borisov <pashkin.elfe@gmail.com>)
Список pgsql-hackers
On 05.10.2020 09:36, Rahila Syed wrote:

Hi, 

Couple of comments: 
1. The syntax used omits the { IMMEDIATE | DEFERRED} keywords suggested in 
the earlier discussions. I think it is intuitive to include IMMEDIATE with the current implementation
so that the syntax can be extended with a  DEFERRED clause in future for dynamic partitions.
  CREATE TABLE tbl_lst (i int) PARTITION BY LIST (i)
 CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);
 
After some consideration, I decided that we don't actually need to introduce IMMEDIATE | DEFERRED keyword. For hash and list partitions it will always be immediate, as the number of partitions cannot change after we initially set it. For range partitions, on the contrary, it doesn't make much sense to make partitions immediately, because in many use-cases one bound will be open.


DEFERRED clause refers to creating partitions on the fly, while the data is being inserted. 
The number of partitions and partition bounds can be the same as specified initially 
during partitioned table creation, but the actual creation of partitions can be deferred. 
This seems like a potential extension to statically created partitions even in the case of 
hash and list partitions, as it won't involve moving any existing data.

Oh, now I see what you mean. The case with already existing tables will require changes to ALTER TABLE syntax. And that's where we may want to choose between immediate (i.e. locking) and deferred (i.e. concurrent) creation of partitions. I think we should try to implement it with existing keywords, maybe use 'CONCURRENTLY' keyword and it will look like:

ALTER TABLE tbl PARTITION BY ... CONFIGURATION (....) [CONCURRENTLY];

Anyway, the task of handling existing data is much more complicated, especially the 'concurrent' case and to be honest, I haven't put much thought into it yet.

The current patch only implements the simplest case of creating a new partitioned table. And I don't see if CREATE TABLE needs this immediate|deferred clause or if it will need it in the future.

Thoughts?


     2. One suggestion for generation of partition names is to append a unique id to
avoid conflicts.

Can you please give an example of such a conflict? I agree that current naming scheme is far from perfect, but I think that 'tablename'_partnum provides unique name for each partition.


Sorry for not being clear earlier, I mean the partition name 'tablename_partnum' can conflict with any existing table name. 
As per current impemetation, if I do the following it results in the table name conflict.

postgres=# create table tbl_test_5_1(i int);
CREATE TABLE
postgres=# CREATE TABLE tbl_test_5 (i int) PARTITION BY LIST((tbl_test_5))                                                                                                               CONFIGURATION (values in ('(1)'::tbl_test_5), ('(3)'::tbl_test_5) default partition tbl_default_5);
ERROR:  relation "tbl_test_5_1" already exists 
 


I don't mind adding some specific suffix for generated partitions, although it still may conflict with existing table names. The main disadvantage of this idea, is that it reduces number of symbols available for table name, which can lead to something like this:

CREATE TABLE parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd (a text, b int NOT NULL DEFAULT 0,  CONSTRAINT check_aa CHECK (length(a) > 0))
PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a','b'),('c','d') DEFAULT PARTITION parted_def) ;;
NOTICE:  identifier "parteddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" will be truncated to "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
ERROR:  relation "partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" already exists

The error message here is a bit confusing, as relation 'partedddddddddddddddddddddddddddddddddddddddddddddddddddddddddd' haven't existed before and this is a conflict between partitioned and generated partition table name. For now, I don't know if we can handle it more gracefully. Probably, we could truncate tablename to a shorter size, but it doesn't provide a complete solution, because partition number can contain several digits.

See also pg_partman documentation on the same issue: https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md#naming-length-limits

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Re: Improve choose_custom_plan for initial partition prune case
Следующее
От: Andy Fan
Дата:
Сообщение: Re: [HACKERS] Runtime Partition Pruning