Re: How does PG know if data is in memory?

From Fabrício dos Anjos Silva
Subject Re: How does PG know if data is in memory?
Date
Msg-id AANLkTinrtVbjtadteR88xWGGYovAzhgOWR1=pXLDi08M@mail.gmail.com
In reply to Re: How does PG know if data is in memory?  (Samuel Gendler <sgendler@ideasculptor.com>)
Responses Re: How does PG know if data is in memory?  (Craig Ringer <craig@postnewspapers.com.au>)
Re: How does PG know if data is in memory?  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance

   Thank you all for the replies.

   If PG does not know whether needed data is in memory, how does it estimate cost? There is a huge difference between access time in memory and in secondary storage. Not taking this into account results in almost "useless" estimates. I am not saying that PG does a poor job, but I've been using it for 4 years and from time to time I notice very poor estimates. After some testing 2 years ago, the only configuration I could manage was to tell PG to avoid Seq Scans and Index Scans. I know that in many situations these techniques are the best choice, but when they are chosen where they are not suitable, I get very bad plans.
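   For concreteness, discouraging a scan type is usually done with the enable_* planner settings. The exact settings used are not stated in the thread, so the following is only a minimal sketch, and the table name is hypothetical:

       -- enable_* settings discourage a plan node by assigning it a very high
       -- cost; they do not forbid it outright, and are best limited to a session.
       SET enable_seqscan = off;
       EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- 'orders' is hypothetical
       RESET enable_seqscan;  -- restore the default afterwards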
   Recently, I faced poor performance again, but this time because we started to work with larger tables (10M rows). This encouraged me to study PG tuning again, trying to understand how the planner works and trying to get the best out of it. Unfortunately, it does not seem to be an easy task.
   If someone could point me to good books about PG tuning, I would appreciate it. I found some yet-to-be-released books about PG 9. Any comments about them?

   Thank you all.

Fabrício dos Anjos Silva
LinkCom Soluções em T.I.



On September 29, 2010 at 14:08, Samuel Gendler <sgendler@ideasculptor.com> wrote:


2010/9/29 Fabrício dos Anjos Silva <fabricio.silva@linkcom.com.br>



   When setting seq_page_cost and random_page_cost, do I have to consider the probability that data will be in memory? Or does seq_page_cost mean "sequential access on disk" and random_page_cost mean "random access on disk"?


The reason seq_page_cost and random_page_cost exist as parameters is so that you can inform the optimizer what the relative costs of those actions are, and those costs depend on the expected size of the filesystem cache, the ratio of total db size to available cache memory, and the performance of your disk i/o subsystem (and any other disk-related work the host may be doing).  effective_cache_size lets you tell postgres how big you believe all available cache memory is - shared_buffers plus the OS cache.
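As an illustration of how those two costs interact (example values only, not recommendations; the table name is hypothetical):

    -- Defaults: seq_page_cost = 1.0, random_page_cost = 4.0, i.e. the planner
    -- assumes a random page fetch costs 4x a sequential one. On a well-cached
    -- database the real gap is much smaller, so lowering random_page_cost makes
    -- index scans look correspondingly cheaper.
    SHOW seq_page_cost;           -- 1.0 by default
    SHOW random_page_cost;        -- 4.0 by default
    SET random_page_cost = 1.5;   -- example value for a mostly cached database
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- may now prefer an index scan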

As to your question about increasing shared_buffers to be some significant proportion of available RAM - apparently, that is not a good idea.  I've seen advice that you shouldn't go above 8GB for shared_buffers, and I've also seen 12GB suggested as an upper limit.  On my host with 48GB of RAM, I didn't see much difference between 8GB and 12GB across a fairly wide variety of tests, so mine is set at 8GB with an effective_cache_size of 36GB.
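As a sketch, the host described above corresponds to postgresql.conf lines like these (values taken from this thread, not general advice; shared_buffers only changes after a server restart):

    # 48GB-RAM host from the example above
    shared_buffers = 8GB           # the often-cited upper range is 8-12GB
    effective_cache_size = 36GB    # rough estimate of shared_buffers + OS cache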
 
   I would appreciate it if someone could clear this up.

   Thanks!

Fabrício dos Anjos Silva
LinkCom Soluções em T.I.


