Re: Postgres configuration for 64 CPUs, 128 GB RAM...
От | Marc Mamin |
---|---|
Тема | Re: Postgres configuration for 64 CPUs, 128 GB RAM... |
Дата | |
Msg-id | CA896D7906BF224F8A6D74A1B7E54AB301750B32@JENMAIL01.ad.intershop.net обсуждение исходный текст |
Ответ на | Re: Postgres configuration for 64 CPUs, 128 GB RAM... (PFC <lists@peufeu.com>) |
Ответы |
Re: Postgres configuration for 64 CPUs, 128 GB RAM...
("Luke Lonergan" <llonergan@greenplum.com>)
|
Список | pgsql-performance |
Hello, thank you for all your comments and recommendations. I'm aware that the conditions for this benchmark are not ideal, mostly due to the lack of time to prepare it. We will also need an additional benchmark on a less powerful - more realistic - server to better understand the scability of our application. Our application is based on java and is generating dynamic reports from log files content. Dynamic means here that a repor will be calculated from the postgres data the first time it is requested (it will then be cached). Java is used to drive the data preparation and to handle/generate the reports requests. This is much more an OLAP system then an OLTP, at least for our performance concern. Data preparation: 1) parsing the log files with a heavy use of perl (regular expressions) to generate csv files. Prepared statements also maintain reference tables in the DB. Postgres performance is not an issue for this first step. 2) loading the csv files with COPY. As around 70% of the data to load come in a single daily table, we don't allow concurrent jobs for this step. We have between a few and a few hundreds files to load into a single table; they are processed one after the other. A primary key is always defined; for the case when the required indexes are alreay built and when the new data are above a given size, we are using a "shadow" table instead (without the indexes) , build the index after the import and then replace the live table with the shadow one. For example, we a have a table of 13 GB + 11 GB indexes (5 pieces). Performances : a) is there an "ideal" size to consider for our csv files (100 x 10 MB or better 1 x 1GB ?) b) maintenance_work_mem: I'll use around 1 GB as recommended by Stefan 3) Data agggregation. This is the heaviest part for Postgres. On our current system some queries need above one hour, with phases of around 100% cpu use, alterning with times of heavy i/o load when temporary results are written/read to the plate (pgsql_tmp). During the aggregation, other postgres activities are low (at least should be) as this should take place at night. Currently we have a locking mechanism to avoid having more than one of such queries running concurently. This may be to strict for the benchmark server but better reflect our current hardware capabilities. Performances : Here we should favorise a single huge transaction and consider a low probability to have another transaction requiring large sort space. Considering this, is it reasonable to define work_mem being 3GB (I guess I should raise this parameter dynamically before running the aggregation queries) 4) Queries (report generation) We have only few requests which are not satisfying while requiring large sort operations. The data are structured in different aggregation levels (minutes, hours, days) with logical time based partitions in oder to limit the data size to compute for a given report. Moreover we can scale our infrastrucure while using different or dedicated Postgres servers for different customers. Smaller customers may share a same instance, each of them having its own schema (The lock mechanism for large aggregations apply to a whole Postgres instance, not to a single customer) . The benchmark will help us to plan such distribution. During the benchmark, we will probably not have more than 50 not idle connections simultaneously. It is a bit too early for us to fine tune this part. The benchmark will mainly focus on the steps 1 to 3 During the benchmark, the Db will reach a size of about 400 GB, simulating 3 different customers, also with data quite equally splitted in 3 scheemas. I will post our configuration(s) later on. Thanks again for all your valuable input. Marc Mamin
В списке pgsql-performance по дате отправления: