Re: Re: New server to improve performance on our large and busy DB - advice? (v2)
От | Ing. Marcos L. Ortiz Valmaseda |
---|---|
Тема | Re: Re: New server to improve performance on our large and busy DB - advice? (v2) |
Дата | |
Msg-id | 4B50330E.4020901@uci.cu обсуждение исходный текст |
Ответ на | Re: New server to improve performance on our large and busy DB - advice? (v2) (Ivan Voras <ivoras@freebsd.org>) |
Список | pgsql-performance |
El 15/01/2010 14:43, Ivan Voras escribió: > hi, > > You wrote a lot of information here so let's confirm in a nutshell > what you have and what you are looking for: > > * A database that is of small to medium size (5 - 10 GB)? > * Around 10 clients that perform constant write operations to the > database (UPDATE/INSERT) > * Around 10 clients that occasionally read from the database > * Around 6000 tables in your database > * A problem with tuning it all > * Migration to new hardware and/or OS > > Is this all correct? > > First thing that is noticeable is that you seem to have way too few > drives in the server - not because of disk space required but because > of speed. You didn't say what type of drives you have and you didn't > say what you would consider desirable performance levels, but off hand > (because of the "10 clients perform constant writes" part) you will > probably want at least 2x-4x more drives. > > > 1) Which RAID level would you recommend > > With only 4 drives, RAID 10 is the only thing usable here. > > > 2) Which Windows OS would you recommend? (currently 2008 x64 Server) > > Would not recommend Windows OS. > > > 3) If we were to port to a *NIX flavour, which would you recommend? > (which > > support trouble-free PG builds/makes please!) > > Practically any. I'm biased for FreeBSD, a nice and supported version > of Linux will probably be fine. > > > 4) Is this the right PG version for our needs? > > If you are starting from scratch on a new server, go for the newest > version you can get - 8.4.2 in this case. > > Most importantly, you didn't say what you would consider desirable > performance. The hardware and the setup you described will work, but > not necessarily fast enough. > > > . So far, we have never seen a situation where a seq scan has improved > > performance, which I would attribute to the size of the tables > > ... and to the small number of drives you are using. > > > . We believe our requirements are exceptional, and we would benefit > > immensely from setting up the PG planner to always favour > index-oriented decisions > > Have you tried decreasing random_page_cost in postgresql.conf? Or > setting (as a last resort) enable_seqscan = off? > > > Carlo Stonebanks wrote: >> My client just informed me that new hardware is available for our DB >> server. >> >> . Intel Core 2 Quads Quad >> . 48 GB RAM >> . 4 Disk RAID drive (RAID level TBD) >> >> I have put the ugly details of what we do with our DB below, as well >> as the >> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB >> with very large tables and the server is always busy serving a constant >> stream of single-row UPDATEs and INSERTs from parallel automated >> processes. >> >> There are less than 10 users, as the server is devoted to the KB >> production >> system. >> >> My questions: >> >> 1) Which RAID level would you recommend >> 2) Which Windows OS would you recommend? (currently 2008 x64 Server) >> 3) If we were to port to a *NIX flavour, which would you recommend? >> (which >> support trouble-free PG builds/makes please!) >> 4) Is this the right PG version for our needs? >> >> Thanks, >> >> Carlo >> >> The details of our use: >> >> . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the >> professional information of 1.3M individuals. >> . The KB tables related to these 130M individuals are naturally also >> large >> . The DB is in a perpetual state of serving TCL-scripted Extract, >> Transform >> and Load (ETL) processes >> . These ETL processes typically run 10 at-a-time (i.e. in parallel) >> . We would like to run more, but the server appears to be the bottleneck >> . The ETL write processes are 99% single row UPDATEs or INSERTs. >> . There are few, if any DELETEs >> . The ETL source data are "import tables" >> . The import tables are permanently kept in the data warehouse so >> that we >> can trace the original source of any information. >> . There are 6000+ and counting >> . The import tables number from dozens to hundreds of thousands of rows. >> They rarely require more than a pkey index. >> . Linking the KB to the source import date requires an "audit table" >> of 500M >> rows, and counting. >> . The size of the audit table makes it very difficult to manage, >> especially >> if we need to modify the design. >> . Because we query the audit table different ways to audit the ETL >> processes >> decisions, almost every column in the audit table is indexed. >> . The maximum number of physical users is 10 and these users RARELY >> perform >> any kind of write >> . By contrast, the 10+ ETL processes are writing constantly >> . We find that internal stats drift, for whatever reason, causing row >> seq >> scans instead of index scans. >> . So far, we have never seen a situation where a seq scan has improved >> performance, which I would attribute to the size of the tables >> . We believe our requirements are exceptional, and we would benefit >> immensely from setting up the PG planner to always favour index-oriented >> decisions - which seems to contradict everything that PG advice >> suggests as >> best practice. >> >> Current non-default conf settings are: >> >> autovacuum = on >> autovacuum_analyze_scale_factor = 0.1 >> autovacuum_analyze_threshold = 250 >> autovacuum_naptime = 1min >> autovacuum_vacuum_scale_factor = 0.2 >> autovacuum_vacuum_threshold = 500 >> bgwriter_lru_maxpages = 100 >> checkpoint_segments = 64 >> checkpoint_warning = 290 >> datestyle = 'iso, mdy' >> default_text_search_config = 'pg_catalog.english' >> lc_messages = 'C' >> lc_monetary = 'C' >> lc_numeric = 'C' >> lc_time = 'C' >> log_destination = 'stderr' >> log_line_prefix = '%t ' >> logging_collector = on >> maintenance_work_mem = 16MB >> max_connections = 200 >> max_fsm_pages = 204800 >> max_locks_per_transaction = 128 >> port = 5432 >> shared_buffers = 500MB >> vacuum_cost_delay = 100 >> work_mem = 512MB >> >> >> > > I have a question about that, due to all of you recommend RAID-10 for the implementatio of this system. Would you give a available arquitecture based on all these considerations? About the questions, I recommend FreeBSD too for a PostgreSQL production server (and for other things too, not only Pg), but with Linux you can obtain a strong, reliable environment that can be more efficient that Windows. Regards
В списке pgsql-performance по дате отправления:
Предыдущее
От: Robert HaasДата:
Сообщение: Re: Re: New server to improve performance on our large and busy DB - advice? (v2)