Обсуждение: Need guidance on partioning

Поиск
Список
Период
Сортировка

Need guidance on partioning

От
srinivasan s
Дата:
Hello everyone,

I hope you are all doing well.

I am seeking guidance on how to implement partitioning in PostgreSQL.

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?

Thank you.




Re: Need guidance on partioning

От
Kashif Zeeshan
Дата:
Hi Srinivasan

You can use the following documentation to learn of PG Table Partitioning. 



On Wed, May 22, 2024 at 5:04 PM srinivasan s <srinioracledba7@gmail.com> wrote:
Hello everyone,

I hope you are all doing well.

I am seeking guidance on how to implement partitioning in PostgreSQL.

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.
RANGE Partition is a good option here. 

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?
It all depends on your data structure, please go through the documentation and then analyze your table/data structure to find the appropriate method. 

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?
Following is the limitaiton
 To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

Thank you.




Re: Need guidance on partioning

От
srinivasan s
Дата:
Thanks Kashif, I posted the message after I went through the official documentation and some other blogs. Seeking guidance from any experts who already implemented Similar things.

On Wed, May 22, 2024 at 5:49 PM Kashif Zeeshan <kashi.zeeshan@gmail.com> wrote:
Hi Srinivasan

You can use the following documentation to learn of PG Table Partitioning. 



On Wed, May 22, 2024 at 5:04 PM srinivasan s <srinioracledba7@gmail.com> wrote:
Hello everyone,

I hope you are all doing well.

I am seeking guidance on how to implement partitioning in PostgreSQL.

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.
RANGE Partition is a good option here. 

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?
It all depends on your data structure, please go through the documentation and then analyze your table/data structure to find the appropriate method. 

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?
Following is the limitaiton
 To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

Thank you.




Re: Need guidance on partioning

От
Muhammad Salahuddin Manzoor
Дата:
Dear Srinivasan,

Implementing partitioning in PostgreSQL can significantly improve the performance of your database

1. Deleting Records Older Than 18 Months

Create the Parent Table: Define your main table as partitioned by range on the timestamp column.

CREATE TABLE your_table (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    account_id INT,
    user_id INT,
    -- other columns
) PARTITION BY RANGE (timestamp);

2. Create Partitions: Define partitions for each range, such as monthly or quarterly. This makes it easier to drop old partitions when they exceed 18 months.

CREATE TABLE your_table_2023_01 PARTITION OF your_table FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE your_table_2023_02 PARTITION OF your_table FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
-- Repeat for other months

Automate Partition Management: You can use tools like pg_partman or create scripts to manage partitions, ensuring new partitions are created and old ones are dropped as needed.


Salahuddin (살라후딘)



On Wed, 22 May 2024 at 17:04, srinivasan s <srinioracledba7@gmail.com> wrote:
Hello everyone,

I hope you are all doing well.

I am seeking guidance on how to implement partitioning in PostgreSQL.

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?

Thank you.




Need guidance on partioning

От
"Wetmore, Matthew (CTR)"
Дата:

Define large.

 

IMO, I don’t partition until 750M to 1B rows.

 

Maybe 500M is the table is heavily used or very wide.

 

Partitioning under those row counts without working knowledge can hinder performance.

 

From: srinivasan s <srinioracledba7@gmail.com>
Sent: Wednesday, May 22, 2024 5:04 AM
To: pgsql-admin@lists.postgresql.org
Subject: [EXTERNAL] Need guidance on partioning

 

Hello everyone,

 

I hope you are all doing well.

 

I am seeking guidance on how to implement partitioning in PostgreSQL.

 

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

 

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.

 

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

 

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?

 

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?

 

Thank you.

 

 

 

 

Re: Need guidance on partioning

От
M Sarwar
Дата:
I am also trying to learn the partitioning decision while watching this thread.

Matthew,
Is it  a good idea to partition the table when it reaches 750 MB in size or 1 B rows or any other general rule?
Thanks,
Sarwar


From: Wetmore, Matthew (CTR) <Matthew.Wetmore@evernorth.com>
Sent: Wednesday, May 22, 2024 9:39 AM
To: srinivasan s <srinioracledba7@gmail.com>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: Need guidance on partioning
 

Define large.

 

IMO, I don’t partition until 750M to 1B rows.

 

Maybe 500M is the table is heavily used or very wide.

 

Partitioning under those row counts without working knowledge can hinder performance.

 

From: srinivasan s <srinioracledba7@gmail.com>
Sent: Wednesday, May 22, 2024 5:04 AM
To: pgsql-admin@lists.postgresql.org
Subject: [EXTERNAL] Need guidance on partioning

 

Hello everyone,

 

I hope you are all doing well.

 

I am seeking guidance on how to implement partitioning in PostgreSQL.

 

We have a large table that currently does not have any partitioning, and we have two requirements for removing old data from this table. We are looking to create a new table with partitioning.

 

1. The first requirement is to delete all records from the table that are older than 18 months. I believe we can achieve this by using range partitioning on the timestamp column.

 

2. The second requirement is to remove data from the table when a user leaves the organization. We have the account ID and user ID in the same table.

 

Could someone please offer guidance on selecting the appropriate partitioning method (range, sub-partition, or composite)?

 

Additionally, not all queries use the timestamp column in the WHERE condition. Is it mandatory to use the partition key in the WHERE condition to benefit from partitioning? Can we create a composite index that combines the partition key column with other columns used in the WHERE clause? Would this be beneficial?

 

Thank you.

 

 

 

 

Re: Need guidance on partioning

От
Ron Johnson
Дата:
On Wed, May 22, 2024 at 8:50 AM Muhammad Salahuddin Manzoor <salahuddin.m@bitnine.net> wrote:
Dear Srinivasan,

Implementing partitioning in PostgreSQL can significantly improve the performance of your database

1. Deleting Records Older Than 18 Months

Create the Parent Table: Define your main table as partitioned by range on the timestamp column.

CREATE TABLE your_table (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    account_id INT,
    user_id INT,
    -- other columns
) PARTITION BY RANGE (timestamp);


That's invalid syntax (as of PG 15):

psql (15.7)
Type "help" for help.

dba=#
dba=# CREATE TABLE your_table (
dba(#     id SERIAL PRIMARY KEY,
dba(#     timestamp TIMESTAMPTZ NOT NULL,
dba(#     account_id INT,
dba(#     user_id INT
dba(# ) PARTITION BY RANGE (timestamp);
ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "your_table" lacks column "timestamp" which is part of the partition key.


Re: Need guidance on partioning

От
Rui DeSousa
Дата:


On May 22, 2024, at 11:10 AM, M Sarwar <sarwarmd02@outlook.com> wrote:

Is it  a good idea to partition the table when it reaches 750 MB in size or 1 B rows or any other general rule?


750MB is not a really big table; Postgres default segment size is 1GB, I normally set it 256GB instead.  1GB segment size is holdover from signed 32bit filesystems where the largest file size was 2GB. 

I’ve had tables that where hundreds of GB and with billions of records.  In my experiences partitioning usually degrades performance unless it well planned out as there is overhead associated with partitioning.

I don’t think there is a hard and fast rule for when to partition.  It really depending on the given issues at hand.  One good reason to partition large tables is for table maintenance, i.e. aging out and archiving out large chunks of data.  What is a large table? That might also depend on hardware; a big table on spinning rust might feel like small table on solid state.

There is also a limit to how many partitions one should create.  Too many and performance tanks.  I once inherited a system that partition by day, what a disaster, I completely eliminated the partitioning from system as it didn’t need it.  Point is partitioning is a tool and you’ll know when you need it.  As with a tool — if you have a hammer in hand and the fastener is a screw then that screw really start to look like nail.

Re: Need guidance on partioning

От
M Sarwar
Дата:
Hi Rui,
On 750 MB part, Matthew has clarified that he means to say that 750 million rows.
But all of your comments are insightful. I agree with all of your comments.

My tables are growing fast. My largest table size is 10gb with 130 Million rows. Database is still only used by operations team so far. It is going to roll out end customers soon.

No-one has started complaining about anything so far. I will not be surprising if someone start crying all of sudden.
My management has highly underestimates the database tasks. I am just listening to them with their priority order. Otherwise, my job will be at risk 🙂

Thank you,
Sarwar


From: Rui DeSousa <rui.desousa@icloud.com>
Sent: Wednesday, May 22, 2024 1:58 PM
To: M Sarwar <sarwarmd02@outlook.com>
Cc: Wetmore, Matthew (CTR) <Matthew.Wetmore@evernorth.com>; srinivasan s <srinioracledba7@gmail.com>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: Re: Need guidance on partioning
 


On May 22, 2024, at 11:10 AM, M Sarwar <sarwarmd02@outlook.com> wrote:

Is it  a good idea to partition the table when it reaches 750 MB in size or 1 B rows or any other general rule?


750MB is not a really big table; Postgres default segment size is 1GB, I normally set it 256GB instead.  1GB segment size is holdover from signed 32bit filesystems where the largest file size was 2GB. 

I’ve had tables that where hundreds of GB and with billions of records.  In my experiences partitioning usually degrades performance unless it well planned out as there is overhead associated with partitioning.

I don’t think there is a hard and fast rule for when to partition.  It really depending on the given issues at hand.  One good reason to partition large tables is for table maintenance, i.e. aging out and archiving out large chunks of data.  What is a large table? That might also depend on hardware; a big table on spinning rust might feel like small table on solid state.

There is also a limit to how many partitions one should create.  Too many and performance tanks.  I once inherited a system that partition by day, what a disaster, I completely eliminated the partitioning from system as it didn’t need it.  Point is partitioning is a tool and you’ll know when you need it.  As with a tool — if you have a hammer in hand and the fastener is a screw then that screw really start to look like nail.

Re: Need guidance on partioning

От
Rui DeSousa
Дата:


On May 22, 2024, at 2:40 PM, M Sarwar <sarwarmd02@outlook.com> wrote:

No-one has started complaining about anything so far. I will not be surprising if someone start crying all of sudden.
My management has highly underestimates the database tasks. I am just listening to them with their priority order. Otherwise, my job will be at risk 🙂

Hi Sarwar,

What you’ll find is that for OLTP systems table sizes doesn’t really matter as access patterns should be well structured and serviced by indexes, etc.   The only reason to have partitioning in an OLTP system is for table maintenance; like aging out old data for regulatory reasons, etc.  

Partitioning is most beneficial in OLAP and reporting type systems where queries can make use of partition elimination.  When thinking about partitioning I also consider the use of Brin indexes as they can sometime achieve similar or even better results.  Also, when considering partitioning especially in a hybrid system one should also think about how it will impact referential integrity.