Обсуждение: Creating table and indexes for new application

Поиск
Список
Период
Сортировка

Creating table and indexes for new application

От
yudhi s
Дата:
Hello Friends,
We are newly creating tables for a system which is going to consume transaction data from customers and store in postgres version 15+ database. And it would be ~400million transactions/rows per day in the main transaction table and almost double in the multiple child tables and some child tables will hold lesser records too.

Say TABLE1 is the main or parent transaction table which will hold ~400million transactions.
TABLE2 is the child table which is going to hold ~800million rows/day. It has one to many relationships with table1.
TABLe3 is another child table which is going to hold ~200million rows per day.

We are considering all of these tables for partitioning by the same transaction_date column and it would be daily partitions. We have some questions,

1)While creating these tables and related indexes, do we need to be careful of defining any other storage parameters like tablespaces etc Or its fine to make those table/indexes aligned to the default tablespace only? and are there any constraints on tablespace size , as we will have 100's GB of data going to be stored in each of the daily partitions?

2)Should we be creating composite indexes on each foreign key for table2 and table3, because any update or delete on parent is going to take lock on all child tables?

3)We were thinking of simple Btree indexes to be created on the columns based on the search criteria of the queries. but the indexes doc i see in postgres having INCLUDE keywords also available in them. So I'm struggling to understand a bit, how it's adding value to the read query performance if those additional columns are not added explicitly to the index but are part of the INCLUDE clause? Will it give some benefit in regards to the index storage space? or should we always keep all the columns in the index definition itself other than some exception scenario? Struggling to understand the real benefit of the INCLUDE clause.

Below is a sample DDL of what it will look like.
 
 Create table syntax:-
 
CREATE TABLE TABLE1
(
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
) partition by range (txn_timestamp);


CREATE TABLE TABLE2
(
    table2_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table2 add constraint fk1 (TXN_ID,txn_timestamp) references  table1(TXN_ID,txn_timestamp);
Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key

CREATE TABLE TABLE3
(
    table3_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table2 add constraint fk2 (TXN_ID,txn_timestamp) references  table1(TXN_ID,txn_timestamp);
Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign key.


Thanks And Regards
Yudhi

Re: Creating table and indexes for new application

От
Lok P
Дата:
My 2 cents...
Foreign key indexes are required for avoiding locking when deleting and updating the pk in parent. But insert only table may not have any issue. And we used to do this in other relational databases like oracle but not sure how different is this in PG. 
However considering your high transaction volume it might have some negative impact on your data inserts having foreign keys on the txn tables and also related indexes in first place I think. So better evaluate. 
Others may comment on this. 

On Fri, 23 Feb, 2024, 2:05 am yudhi s, <learnerdatabase99@gmail.com> wrote:
Hello Friends,
We are newly creating tables for a system which is going to consume transaction data from customers and store in postgres version 15+ database. And it would be ~400million transactions/rows per day in the main transaction table and almost double in the multiple child tables and some child tables will hold lesser records too.

Say TABLE1 is the main or parent transaction table which will hold ~400million transactions.
TABLE2 is the child table which is going to hold ~800million rows/day. It has one to many relationships with table1.
TABLe3 is another child table which is going to hold ~200million rows per day.

We are considering all of these tables for partitioning by the same transaction_date column and it would be daily partitions. We have some questions,

1)While creating these tables and related indexes, do we need to be careful of defining any other storage parameters like tablespaces etc Or its fine to make those table/indexes aligned to the default tablespace only? and are there any constraints on tablespace size , as we will have 100's GB of data going to be stored in each of the daily partitions?

2)Should we be creating composite indexes on each foreign key for table2 and table3, because any update or delete on parent is going to take lock on all child tables?

3)We were thinking of simple Btree indexes to be created on the columns based on the search criteria of the queries. but the indexes doc i see in postgres having INCLUDE keywords also available in them. So I'm struggling to understand a bit, how it's adding value to the read query performance if those additional columns are not added explicitly to the index but are part of the INCLUDE clause? Will it give some benefit in regards to the index storage space? or should we always keep all the columns in the index definition itself other than some exception scenario? Struggling to understand the real benefit of the INCLUDE clause.

Below is a sample DDL of what it will look like.
 
 Create table syntax:-
 
CREATE TABLE TABLE1
(
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
) partition by range (txn_timestamp);


CREATE TABLE TABLE2
(
    table2_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table2 add constraint fk1 (TXN_ID,txn_timestamp) references  table1(TXN_ID,txn_timestamp);
Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key

CREATE TABLE TABLE3
(
    table3_id varchar(36) not null,
TXN_ID varchar(36)  NOT NULL ,
txn_timestamp date  NOT NULL ,
CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
) partition by range (txn_timestamp);

alter table table2 add constraint fk2 (TXN_ID,txn_timestamp) references  table1(TXN_ID,txn_timestamp);
Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign key.


Thanks And Regards
Yudhi

Re: Creating table and indexes for new application

От
Laurenz Albe
Дата:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s wrote:
> postgres version 15+ database. And it would be ~400million transactions/rows per day in the
> main transaction table and almost double in the multiple child tables and some child tables
> will hold lesser records too.
>
> We are considering all of these tables for partitioning by the same transaction_date column
> and it would be daily partitions. We have some questions,
>
> 1)While creating these tables and related indexes, do we need to be careful of defining any
>   other storage parameters like tablespaces etc Or its fine to make those table/indexes
>   aligned to the default tablespace only? and are there any constraints on tablespace size ,
>   as we will have 100's GB of data going to be stored in each of the daily partitions?

There are no constraints on the size of a tablespace other than the limits of the underlying
file system.  Use the default tablespace.

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.

> 3)We were thinking of simple Btree indexes to be created on the columns based on the search
>   criteria of the queries. but the indexes doc i see in postgres having INCLUDE keywords also
>   available in them. So I'm struggling to understand a bit, how it's adding value to the read
>   query performance if those additional columns are not added explicitly to the index but are
>   part of the INCLUDE clause? Will it give some benefit in regards to the index storage space?
>   or should we always keep all the columns in the index definition itself other than some
>   exception scenario? Struggling to understand the real benefit of the INCLUDE clause.

The benefits of putting a column into INCLUDE is twofold:

1) The column only has to be stored in the leaf pages, since it is not used for searching.
   That makes the intermediate index entries smaller, which causes a wider fan-out of the
   index, which in turn makes the tree shallower and hence faster to search.

2) It is an implicit documentation that the column is not to be used for searching.

Yours,
Laurenz Albe



Re: Creating table and indexes for new application

От
sud
Дата:


On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s 

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.
 

From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. 
However, if someone doesn't delete or update the parent table PK   , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. 




Re: Creating table and indexes for new application

От
yudhi s
Дата:


On Fri, 23 Feb, 2024, 1:20 pm sud, <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s 

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.
 

From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. 
However, if someone doesn't delete or update the parent table PK   , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. 



Thank you. I can double check if we have confirmed use case of deleting the parent table or updating PK in the parent table. But anyway it can happen for data fix for sure in some scenario. 
 
But yes, we are certainly going to drop/purge partition from all the parent and child table after specific days. So isn't that need the FK to be indexed or else it will scan whole parent table partition? 

Re: Creating table and indexes for new application

От
sud
Дата:


On Fri, 23 Feb, 2024, 1:28 pm yudhi s, <learnerdatabase99@gmail.com> wrote:


On Fri, 23 Feb, 2024, 1:20 pm sud, <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s 

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.
 

From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. 
However, if someone doesn't delete or update the parent table PK   , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. 



Thank you. I can double check if we have confirmed use case of deleting the parent table or updating PK in the parent table. But anyway it can happen for data fix for sure in some scenario. 
 
But yes, we are certainly going to drop/purge partition from all the parent and child table after specific days. So isn't that need the FK to be indexed or else it will scan whole parent table partition? 


I am not sure if drop partition of parent table, will have a lock or will do a full scan on the child table while doing the partition maintenance or dropping the partitions, in absence of foreign key index. Others may comment here. 

Re: Creating table and indexes for new application

От
yudhi s
Дата:

On Fri, Feb 23, 2024 at 5:26 PM sud <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 1:28 pm yudhi s, <learnerdatabase99@gmail.com> wrote:


On Fri, 23 Feb, 2024, 1:20 pm sud, <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s 

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.
 

From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. 
However, if someone doesn't delete or update the parent table PK   , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. 



Thank you. I can double check if we have confirmed use case of deleting the parent table or updating PK in the parent table. But anyway it can happen for data fix for sure in some scenario. 
 
But yes, we are certainly going to drop/purge partition from all the parent and child table after specific days. So isn't that need the FK to be indexed or else it will scan whole parent table partition? 


I am not sure if drop partition of parent table, will have a lock or will do a full scan on the child table while doing the partition maintenance or dropping the partitions, in absence of foreign key index. Others may comment here. 


Can you please help me understand, If it's true that all the statements like Delete, Update and Drop partition of parent table will take lock on the child table and Full scan the child table , and thus foreign key index on all the child table is necessary irrespective of the performance overhead it has on all the INSERT queries into the child tables? 

Re: Creating table and indexes for new application

От
Lok P
Дата:


On Sun, 25 Feb, 2024, 1:05 am yudhi s, <learnerdatabase99@gmail.com> wrote:

On Fri, Feb 23, 2024 at 5:26 PM sud <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 1:28 pm yudhi s, <learnerdatabase99@gmail.com> wrote:


On Fri, 23 Feb, 2024, 1:20 pm sud, <suds1434@gmail.com> wrote:


On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-02-23 at 02:05 +0530, yudhi s 

> 2)Should we be creating composite indexes on each foreign key for table2 and table3, because
>   any update or delete on parent is going to take lock on all child tables?

Every foreign key needs its own index.  A composite index is only appropriate if the foreign
key spans multiple columns.
 

From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. 
However, if someone doesn't delete or update the parent table PK   , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. 



Thank you. I can double check if we have confirmed use case of deleting the parent table or updating PK in the parent table. But anyway it can happen for data fix for sure in some scenario. 
 
But yes, we are certainly going to drop/purge partition from all the parent and child table after specific days. So isn't that need the FK to be indexed or else it will scan whole parent table partition? 


I am not sure if drop partition of parent table, will have a lock or will do a full scan on the child table while doing the partition maintenance or dropping the partitions, in absence of foreign key index. Others may comment here. 


Can you please help me understand, If it's true that all the statements like Delete, Update and Drop partition of parent table will take lock on the child table and Full scan the child table , and thus foreign key index on all the child table is necessary irrespective of the performance overhead it has on all the INSERT queries into the child tables? 


Not sure of the lock but I think you should be able to see a full scan on child table while doing delete or update of parent table PK. Explain Analyze should show that I believe. Not sure if explain analyze will work for drop partition too.