Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation

Поиск

Список

Период

Сортировка

От	Ron Johnson
Тема	Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
Дата	26 июня 15:49:23
Msg-id	CANzqJaCju2m=8iD=ijjH-1GiWGwUzrcHEn_2oteRZYdVh_RYJw@mail.gmail.com обсуждение исходный текст
Ответ на	Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation (Motog Plus <mplus7535@gmail.com>)
Ответы	Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
Список	pgsql-admin

Дерево обсуждения

PG easily handles our 6TB database, as well as 3 and 5TB databases (all on different VMs), and has done so since at least v8.4.

Ours are on single LVM mount points, as are the disks that hold the PgBackRest savesets.

"considering typical transaction loads."

Pfft...there are no typical transaction loads. Is this db OLTP, Reporting or DW?

On Wed, Jun 25, 2025 at 4:35 AM Motog Plus <mplus7535@gmail.com> wrote:

Dear PostgreSQL Community,
We are implementing a new feature in our application that is expected to generate a significant amount of data, and we are seeking your expert guidance on how to best handle this growth within our existing PostgreSQL setup.

Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu instance with the following specifications:
RAM: 32 GB
Disk: 1.2 TB
vCPUs: 16

Our database architecture utilizes a primary-standby streaming replication setup. Application modules (running in Kubernetes pods) connect to the database through Pgpool-II, using HikariCP for connection pooling.

We have multiple databases on our primary server, with their approximate current sizes as follows:
C: 620 GB
M: 225 GB
P: 59 GB
K: 13 MB

The total current size of our databases is around 1 TB. With the new feature, we anticipate a substantial increase in data, potentially reaching 10 TB over the next 5-7 years.

Below is the table for current size and expected growth in size:

S.No.
DB
Current DB size
Future DB size
Schema Name
Current Schema size
Future Schema size
1
C
1 TB
8 TB - 10 TB
acc
297 GB
3 TB - 4 TB
po
270 GB
2.6 TB - 3.5 TB
pa
27 GB
270 GB
pra
13 GB
130 GB
fu
13 GB
130 GB
te
167 MB
2 GB
pro
30 MB
300 MB
2
M
225 GB
2.2 TB - 3 TB
bi
82 GB
820 GB
co
80 GB
800 GB
ps
17 GB
170 GB
qo
16 GB
160 GB
to
7 GB
70 GB
in
7 GB
70 GB
di
6 GB
60 GB
no
4 GB
40 GB
do
4 GB
40 GB
cl
3 GB
30 GB
3
P
60 GB
600 GB
au
45 GB
450 GB
fi
8 GB
80 GB
con
4 GB
40 GB
ba
1 GB
10 GB
li
2 MB
20 GB

We would greatly appreciate your insights on the following points:
Scalability for Large Datasets: Conceptually, PostgreSQL is known to handle large datasets. However, we'd like to confirm if a single PostgreSQL instance can realistically and efficiently manage 10-12 TB of data in a production environment, considering typical transaction loads.
Database Split Strategy: Our largest database, "C," currently occupies 620 GB. It contains multiple schemas. We are considering splitting database "C" into two new databases: "C1" to exclusively house the "acc" schema, and "C2" for the remaining schemas. Is this a recommended approach for managing growth, and what are the potential pros and cons?
Server Allocation for Split Databases: If we proceed with splitting "C" into "C1" and "C2," would it be advisable to assign a new, separate database server for "C2," or could both "C1" and "C2" reside on the same database server? What factors should we consider in making this decision?
Performance Limits per Database and Database Server: From a performance perspective, is there a general "limit" or best practice for the maximum amount of data a single database server should handle (e.g., 10 TB) and similarly general limit per database? How does this influence the decision to add more database servers?
Best Practices for Large-Scale Data Management: Beyond standard practices like indexing and partitioning, what other best practices should we consider implementing to ensure optimal performance and manageability with such a large dataset? This could include configurations, maintenance strategies, etc.
Hardware Configuration Recommendations: Based on our projected data growth and desired performance, what hardware configurations (e.g., RAM, CPU, storage I/O, storage type like NVMe) would you recommend for future database servers to efficiently handle 10-12 TB?
Open-Source Horizontal Scaling Solutions: Are there any open-source horizontal scaling solutions for PostgreSQL (other than Citus Data) that the community recommends or has experience with for managing extremely large datasets? Any pointers or guidance on this would be highly valuable.

Thank you in advance for your time and expertise. We look forward to your valuable insights.
Thanks & Regards,
Ramzy

Death to <Redacted>, and butter sauce.

Don't boil me, I'm still alive.

<Redacted> lobster!

В списке pgsql-admin по дате отправления:

S.No.	DB	Current DB size	Future DB size	Schema Name	Current Schema size	Future Schema size
1	C	1 TB	8 TB - 10 TB	acc	297 GB	3 TB - 4 TB
				po	270 GB	2.6 TB - 3.5 TB
				pa	27 GB	270 GB
				pra	13 GB	130 GB
				fu	13 GB	130 GB
				te	167 MB	2 GB
				pro	30 MB	300 MB
2	M	225 GB	2.2 TB - 3 TB	bi	82 GB	820 GB
				co	80 GB	800 GB
				ps	17 GB	170 GB
				qo	16 GB	160 GB
				to	7 GB	70 GB
				in	7 GB	70 GB
				di	6 GB	60 GB
				no	4 GB	40 GB
				do	4 GB	40 GB
				cl	3 GB	30 GB
3	P	60 GB	600 GB	au	45 GB	450 GB
				fi	8 GB	80 GB
				con	4 GB	40 GB
				ba	1 GB	10 GB
				li	2 MB	20 GB

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation