Re: Scaling shared buffer eviction

From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Date:
Msg-id: CAA4eK1KVMCKPVKkQDcJAw07w1yum_NHggq4hWVT5dR7iwRzu5A@mail.gmail.com
In response to: Re: Scaling shared buffer eviction (Gregory Smith <gregsmithpgsql@gmail.com>)
List: pgsql-hackers
On Mon, Sep 22, 2014 at 10:43 AM, Gregory Smith <gregsmithpgsql@gmail.com> wrote:
> On 9/16/14, 8:18 AM, Amit Kapila wrote:
>> I think the main reason for the slight difference is that when the size
>> of shared buffers is almost the same as the data size, the number of
>> buffers it needs from the clock sweep is very small.  As an example, in
>> the first case (when the size of shared buffers is 12286MB), it actually
>> needs at most 256 additional buffers (2MB) via the clock sweep, whereas
>> bgreclaimer will put 2000 additional buffers (the high water mark, since
>> 0.5% of shared buffers is greater than 2000) on the freelist, so
>> bgreclaimer does some extra work when it is not required.
>
> This is exactly what I was warning about, as the sort of lesson learned
> from the last round of such tuning.  There are going to be spots where
> trying to tune the code to be aggressive on the hard cases will work
> great.  But you need to make that dynamic to some degree, such that the
> code doesn't waste a lot of time sweeping buffers when the demand for
> them is actually weak.  That will make all sorts of cases that look like
> this slower.

To verify whether the above can lead to any kind of regression, I have
checked the cases where we need only a few extra buffers (the workload is
0.05 or 0.1 percent larger than shared buffers) and bgreclaimer might put
some additional buffers on the freelist, and it turns out that in those
cases as well there is a win, especially at high concurrency; the results
are posted upthread.
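
To make the sizing rule concrete, the high water mark discussed above
amounts to roughly the following (a minimal sketch; the function and
constant names are illustrative, not the patch's actual identifiers):

    /* High water mark: 0.5% of shared buffers, capped at 2000 buffers.
     * Names here are hypothetical, for illustration only. */
    #define FREELIST_HIGH_WATER_CAP 2000

    static int
    freelist_high_water_mark(int num_shared_buffers)
    {
        int target = num_shared_buffers / 200;  /* 0.5% of shared buffers */

        return (target > FREELIST_HIGH_WATER_CAP) ? FREELIST_HIGH_WATER_CAP
                                                  : target;
    }

With 12286MB of shared buffers (about 1.5 million 8kB pages), 0.5% is well
above 2000, so the cap applies and bgreclaimer targets 2000 free buffers.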
 
> We should be able to tell these apart if there's enough instrumentation
> and solid logic inside of the program itself, though.  The 8.3-era BGW
> coped with a lot of these issues using a particular style of moving
> average with a fast reaction time, plus instrumenting the buffer
> allocation rate as accurately as it could.  So before getting into
> high/low watermark questions, are you comfortable that there's a clear,
> accurate number that measures the activity level that's important here?


Very good question.  This is exactly what was missing in my initial
versions (about two years back, when I first tried to solve this problem),
but based on Robert's and Andres's feedback I realized that we need an
accurate number to measure the activity level (in this case the
consumption of buffers from the freelist), so I have introduced logic to
calculate it (stored in the new variable numFreeListBuffers in the
BufferStrategyControl structure).
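
In outline, the counter sits alongside the existing freelist and
clock-sweep state, something like the following (a sketch only; apart
from numFreeListBuffers the field names are meant to mirror the upstream
BufferStrategyControl, and slock_t stands in for PostgreSQL's spinlock
type):

    #include <stdint.h>

    typedef int slock_t;            /* placeholder for PostgreSQL's spinlock */

    typedef struct BufferStrategyControl
    {
        slock_t  buffer_strategy_lock;  /* protects the fields below */
        int      nextVictimBuffer;      /* clock-sweep hand */
        int      firstFreeBuffer;       /* head of the freelist, -1 if empty */
        int      lastFreeBuffer;        /* tail of the freelist */
        uint32_t numFreeListBuffers;    /* buffers currently on the freelist:
                                         * the activity measure discussed here */
    } BufferStrategyControl;

Backends decrement numFreeListBuffers under the lock when they take a
buffer from the freelist, and bgreclaimer increments it as it adds
buffers, so either side can read the current freelist depth cheaply.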

 
> And have you considered ways it might be averaging over time, or have a
> history that's analyzed?


The current logic of bgreclaimer is such that even if it does some extra
activity in one cycle (and the extra is tightly bounded), it will not
start another cycle until backends have consumed the buffers it made
available in the previous one.  I think the algorithm designed for
bgreclaimer therefore averages out automatically based on activity.  Do
you see any cases where it will not do so?
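
As a rough sketch of that cycle (all names and helpers here are
hypothetical, and the latch and locking details of the real worker are
omitted):

    /* Hypothetical helpers, standing in for the real freelist and
     * clock-sweep machinery. */
    extern int  freelist_length(void);
    extern void move_clock_sweep_victim_to_freelist(void);
    extern void sleep_until_woken_by_backend(void);

    static void
    bgreclaimer_cycle(int low_water_mark, int high_water_mark)
    {
        for (;;)
        {
            /* One cycle: refill the freelist up to the high watermark. */
            while (freelist_length() < high_water_mark)
                move_clock_sweep_victim_to_freelist();

            /* Do not start another cycle until backends have drained the
             * buffers made available above, down to the low watermark. */
            while (freelist_length() > low_water_mark)
                sleep_until_woken_by_backend();
        }
    }

Because the refill work is proportional to what backends actually
consumed since the last cycle, a burst of demand triggers a burst of
reclaiming, while an idle system leaves bgreclaimer sleeping.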
 
> The exact fast-approach / slow-decay weighted moving average of the 8.3
> BGW, the thing that tried to smooth the erratic data set possible here,
> was a pretty critical part of getting it to auto-tune to the workload
> size.  It ended up being much more important than the work of setting
> the arbitrary watermark levels.


Agreed, but bgreclaimer works quite differently from the bgwriter, and
that's why it needs a different kind of logic to handle auto-tuning.
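
For reference, the smoothing Greg describes works along these lines (a
sketch modeled on the bgwriter's allocation-rate estimate; the variable
names are illustrative):

    #define SMOOTHING_SAMPLES 16

    static float smoothed_alloc = 0;    /* smoothed allocations per cycle */

    static void
    update_allocation_estimate(int recent_alloc)
    {
        /* Fast rise: jump straight up when demand spikes ... */
        if ((float) recent_alloc >= smoothed_alloc)
            smoothed_alloc = (float) recent_alloc;
        /* ... slow decay: drift down gradually when demand falls. */
        else
            smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
                              SMOOTHING_SAMPLES;
    }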

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
