Обсуждение: Simple task with partitioning which I can't realize

Поиск
Список
Период
Сортировка

Simple task with partitioning which I can't realize

От
"Andrew Zakharov"
Дата:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 

Re: Simple task with partitioning which I can't realize

От
"David G. Johnston"
Дата:
On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov <Andrew898@mail.ru> wrote:

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

 

 

I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field.


Yes, you are coming up against the following limitation:

"Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions."


That limitation is independent of partitioning; i.e., the legacy inheritance option doesn't bypass it.

Thus, your true "key" is composite: (region, identifier).  Thus you need to add a "parent_region_code" column as well, redefine the PK as (region_code, gid), and the REFERENCES clause to link the two paired fields.

You can decide whether that is sufficient or if you want some added comfort in ensuring that a gid cannot appear in multiple regions by creating a single non-partitioned table containing all gid values and add a unique constraint there.

Or maybe allow for duplicates across region codes and save space by using a smaller data type (int or bigint - while renaming the column to "rid" or some such) - combined with having the non-partitioned reference table being defined as (region_code, rid, gid).

David J.

Re: Simple task with partitioning which I can't realize

От
Marc Millas
Дата:
Hi,

is there any chance (risk ?) that a given gid be present in more than one region ?
if not (or if you implement it via a dedicated, non partition table), 

you may create a simple table partitioned by region, and create unique indexes for each partition.
this is NOT equivalent to a unique constraint at global table level, of course.

Marc MILLAS
Senior Architect
+33607850334



On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 

RE: Simple task with partitioning which I can't realize

От
"Andrew Zakharov"
Дата:

David, - yes, creation composite foreign/primary key is not a problem. But the main question is what method should I use for partitioning by composite key gid, region_code? The partition method itself created not only for faster data access but for better administration. The administration like a truncate/insert is a main reason why I split the data for my DWH case. If the only hash method is left I cannot administer the partitions separately this way. But anyway, could you please provide your vision the brief declaration for main table and partition?

Thanks.

Andrew.

 

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, March 01, 2022 6:54 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: Pgsql Performance <pgsql-performance@lists.postgresql.org>
Subject: Re: Simple task with partitioning which I can't realize

 

On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov <Andrew898@mail.ru> wrote:

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

 

 

I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field.

 

Yes, you are coming up against the following limitation:

 

"Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions."

 

 

That limitation is independent of partitioning; i.e., the legacy inheritance option doesn't bypass it.

 

Thus, your true "key" is composite: (region, identifier).  Thus you need to add a "parent_region_code" column as well, redefine the PK as (region_code, gid), and the REFERENCES clause to link the two paired fields.

 

You can decide whether that is sufficient or if you want some added comfort in ensuring that a gid cannot appear in multiple regions by creating a single non-partitioned table containing all gid values and add a unique constraint there.

 

Or maybe allow for duplicates across region codes and save space by using a smaller data type (int or bigint - while renaming the column to "rid" or some such) - combined with having the non-partitioned reference table being defined as (region_code, rid, gid).

 

David J.

 

Re: Simple task with partitioning which I can't realize

От
"David G. Johnston"
Дата:
On Tue, Mar 1, 2022 at 9:37 AM Andrew Zakharov <Andrew898@mail.ru> wrote:

David, - yes, creation composite foreign/primary key is not a problem. But the main question is what method should I use for partitioning by composite key gid, region_code?


The convention here is to inline or bottom-post responses.

Your original plan - list partitions by region_code.  You couldn't do that before because you weren't seeing the region_code as being part of your PK and all partition columns must be part of the PK.  My suggestion is that instead of figuring out how to work around that limitation (not that I think there is a good one to be had) you accept it and just add region_code to the PK.

David J.

RE: Simple task with partitioning which I can't realize

От
"Andrew Zakharov"
Дата:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different region partitions. I considered simple partitioned table w/o any keys but I’d believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <marc.millas@mokadb.com>
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.


Marc MILLAS

Senior Architect

+33607850334

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 

Re: Simple task with partitioning which I can't realize

От
Marc Millas
Дата:
Andrew,

contrary to Oracle, in postgres you can add the indexes and/or the constraints which are meaningful to you at partition level.
I was not saying NOT to create keys, but I was saying to create them at partition level.


Marc MILLAS
Senior Architect
+33607850334



On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different region partitions. I considered simple partitioned table w/o any keys but I’d believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <marc.millas@mokadb.com>
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.


Marc MILLAS

Senior Architect

+33607850334

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 

RE: Simple task with partitioning which I can't realize

От
"Andrew Zakharov"
Дата:

Yes, Marc –

I understood you properly and totally. I was just saying about the hope that there is a trick to keep constraints on the base table level for my case.

Thanks a bunch.

Andrew.

 

 

On Tue, Mar 01, 2022 at 9:00 PM  Marc Millas <marc.millas@mokadb.com> wrote:

Andrew,

 

contrary to Oracle, in postgres you can add the indexes and/or the constraints which are meaningful to you at partition level.

I was not saying NOT to create keys, but I was saying to create them at partition level.

 


Marc MILLAS

Senior Architect

+33607850334

 

 

 

On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different region partitions. I considered simple partitioned table w/o any keys but I’d believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <marc.millas@mokadb.com>
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.


Marc MILLAS

Senior Architect

+33607850334

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 

RE: Simple task with partitioning which I can't realize

От
"Michel SALAIS"
Дата:

De : Marc Millas <marc.millas@mokadb.com>
Envoyé : mardi 1 mars 2022 19:00
À : Andrew Zakharov <Andrew898@mail.ru>
Cc : pgsql-performance@lists.postgresql.org
Objet : Re: Simple task with partitioning which I can't realize

 

Andrew,

 

contrary to Oracle, in postgres you can add the indexes and/or the constraints which are meaningful to you at partition level.

I was not saying NOT to create keys, but I was saying to create them at partition level.

 


Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com

 

 

 

On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different region partitions. I considered simple partitioned table w/o any keys but I’d believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <marc.millas@mokadb.com>
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.


Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 _________________________________________________________________________________________

Hi

To say it using Oracle vocabulary, PostgreSQL doesn’t offer GLOBAL INDEXES. Even when we create an index on the partitioned table which is now possible, PostgreSQL create LOCAL indexes on each partition separately.

There is no global indexes on partitioned tables in PostgreSQL. So it is not simple to offer uniqueness at global level using indexes. That is why, it is required that partition key columns be part of the primary key AND any other UNIQE constraint.

 

Michel SALAIS

Re: Simple task with partitioning which I can't realize

От
Geri Wright
Дата:
If you are wanting to ensure uniqueness for the original oracle pk across the partitions, you could look into adding an advisory trigger to the table.

On Wed, Mar 2, 2022, 2:28 AM Michel SALAIS <msalais@msym.fr> wrote:

De : Marc Millas <marc.millas@mokadb.com>
Envoyé : mardi 1 mars 2022 19:00
À : Andrew Zakharov <Andrew898@mail.ru>
Cc : pgsql-performance@lists.postgresql.org
Objet : Re: Simple task with partitioning which I can't realize

 

Andrew,

 

contrary to Oracle, in postgres you can add the indexes and/or the constraints which are meaningful to you at partition level.

I was not saying NOT to create keys, but I was saying to create them at partition level.

 


Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com

 

 

 

On Tue, Mar 1, 2022 at 5:45 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hi Marc –

Since there is a DWH fed by ETL there no risks to have same gids in different region partitions. I considered simple partitioned table w/o any keys but I’d believed there is a solutions with keys that’s why I’m seeking the clue.

Thanks.

Andrew.

 

From: Marc Millas <marc.millas@mokadb.com>
Sent: Tuesday, March 01, 2022 7:29 PM
To: Andrew Zakharov <Andrew898@mail.ru>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Simple task with partitioning which I can't realize

 

Hi,

 

is there any chance (risk ?) that a given gid be present in more than one region ?

if not (or if you implement it via a dedicated, non partition table), 

 

you may create a simple table partitioned by region, and create unique indexes for each partition.

this is NOT equivalent to a unique constraint at global table level, of course.


Marc MILLAS

Senior Architect

+33607850334

www.mokadb.com

 

 

 

On Tue, Mar 1, 2022 at 4:37 PM Andrew Zakharov <Andrew898@mail.ru> wrote:

Hello all –

I have a task which is simple at the first look. I have a table which contains hierarchy of address objects starting with macro region end ends with particular buildings. You can imagine how big is it.

Here is short sample of table declaration:

 

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

  …

    constraint pk_region_hierarchy primary key (gid),

    constraint fk_region_hierarchy_region_hierarchy_parent foreign key (parent_gid) references region_hierarchy(gid)

);

 

Being an Oracle specialist, I planned to using same declarative partitioning by list on the region_code field as I did in Oracle database. I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field. Thus partition method “by list” is not appropriate method in this case and “by range” either. What I have left from partition methods? Hash? How can I create partitions by gid & region_code by hash? Feasible? Will it be working properly (with partition pruning) when search criteria is by region_code only? Same problem appears when there is simple serial “id” used as primary identifier. Removing all constraints is not considered. I understand that such specific PostgreSQL partitioning implementation has done by tons of reasons but how I can implement partitioning for my EASY case? I see the only legacy inheritance is left, right? Very sad if it’s true.

Your advices are very important.

Thanks in advance.

Andrew.

 _________________________________________________________________________________________

Hi

To say it using Oracle vocabulary, PostgreSQL doesn’t offer GLOBAL INDEXES. Even when we create an index on the partitioned table which is now possible, PostgreSQL create LOCAL indexes on each partition separately.

There is no global indexes on partitioned tables in PostgreSQL. So it is not simple to offer uniqueness at global level using indexes. That is why, it is required that partition key columns be part of the primary key AND any other UNIQE constraint.

 

Michel SALAIS

Re: Simple task with partitioning which I can't realize

От
Mladen Gogala
Дата:
On 3/1/22 10:54, David G. Johnston wrote:
On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov <Andrew898@mail.ru> wrote:

create table region_hierarchy(

  gid uuid not null default uuid_generate_v1mc(),

  parent_gid uuid null,

  region_code int2,

 

 

I’ve carefully looked thru docs/faqs/google/communities and found out that I must include “gid” field into partition key because a primary key field.


Yes, you are coming up against the following limitation:

"Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions."


That limitation is independent of partitioning; i.e., the legacy inheritance option doesn't bypass it.

Thus, your true "key" is composite: (region, identifier).  Thus you need to add a "parent_region_code" column as well, redefine the PK as (region_code, gid), and the REFERENCES clause to link the two paired fields.

You can decide whether that is sufficient or if you want some added comfort in ensuring that a gid cannot appear in multiple regions by creating a single non-partitioned table containing all gid values and add a unique constraint there.

Or maybe allow for duplicates across region codes and save space by using a smaller data type (int or bigint - while renaming the column to "rid" or some such) - combined with having the non-partitioned reference table being defined as (region_code, rid, gid).

David J.

Hi David,

Are there any concrete plans to address that particular limitation? That limitation can be re-stated as "PostgreSQL doesn't support global indexes on the partitioned tables" and I've have also run into it. My way around it was not to use partitioning but to use much larger machine with the NVME disks, which can handle the necesary I/O. Are there any plans to allow global indexes? I am aware that this is not a small change but is the only real advantage that Oracle holds over PostgreSQL.

Regards

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com