Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madam RAID10)

Поиск
Список
Период
Сортировка
От Pietro Pugni
Тема Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madam RAID10)
Дата
Msg-id EF209A99-74D5-41FD-875B-4D7952D95147@gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madam RAID10)  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-performance
Thank you a lot for your suggestions.

Random points/suggestions:
*) mdadm is the way to go.   I think you'll get bandwidth constrained on most modern hba unless they are really crappy.  On reasonably modern hardware storage is rarely the bottleneck anymore (which is a great place to be).  Fancy raid controllers may actually hurt performance -- they are obsolete IMNSHO.

*) Small point, but you'll want to crank effective_io_concurrency (see: https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com). It only affects certain kinds of queries, but when it works it really works.    Those benchmarks were done on my crapbox dell workstation!

This is really impressive, thank you for sharing it.

*) For very high transaction rates, you can get a lot of benefit from disabling synchronous_commit if you are willing to accommodate the risk.  I do not recommend disabling fsync unless you are prepared to regenerate the entire database at any time.

*) Don't assume indefinite linear scaling as you increase storage capacity -- the database itself can become the bottleneck, especially for writing.   To improve write performance, classic optimization strategies of trying to intelligently bundle writes around units of work still apply.  If you are expecting high rates of write activity your engineering focus needs to be here for sure (read scaling is comparatively pretty easy).

What do you mean with “units of work”?


*) I would start doing your benchmarking with pgbench since that is going to most closely reflect measured production performance.  

My final benchmark will be my application, it’s quite articulated and does also query parallelization using a custom splitter, so it will be hard to reproduce it using pgbench. At the moment I was just figuring out why my SSD weren’t performing as expected with comparable benchmarks found on hardware review websites (fio with 4k and 8k workloads). 


If this is the case your stack performance is going to be based on data structure design.  Make liberal use of:
*) natural keys 
*) constraint exclusion for partition selection
*) BRIN index is amazing (if you can work into it's limitations)
*) partial indexing
*) covering indexes.  Don't forget to vacuum your partitions before you make them live if you use them

The data definition is optimized for Postgres yet, but didn’t know about covering indexes. I read about BRIN but never tried them. Will do some testing.


If your data is going to get really big and/or query activity is expected to be high, keep an eye on your scale out strategy.   Going monolithic to bootstrap your app is the right choice IMO but start thinking about the longer term if you are expecting growth.  I'm starting to come out to the perspective that lift/shift scaleout using postgres fdw without an insane amount of app retooling could be a viable option by postgres 11/12 or so.  For my part I scaled out over asynchronous dblink which is a much more maintenance heavy strategy (but works fabulous although I which you could asynchronously connect).

Thank you for your hints 

 Pietro Pugni

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Wes Vaske (wvaske)"
Дата:
Сообщение: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madamRAID10)
Следующее
От: Pietro Pugni
Дата:
Сообщение: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madam RAID10)