Re: mosbench revisited

From: Robert Haas
Subject: Re: mosbench revisited
Message-ID: CA+TgmobWi_tFQAFX13VryaW3ZoSxRxVQOebOPLb0SGNEeLZhuw@mail.gmail.com
In reply to: Re: mosbench revisited  (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers

On Wed, Aug 3, 2011 at 6:21 PM, Jim Nasby <jim@nasby.net> wrote:
> On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
>> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
>> because PostgreSQL protects its free-list with a single lock and thus
>> scales poorly with smaller caches."  This is a complaint about
>> BufFreeList lock which, in fact, I've seen as a huge point of
>> contention on some workloads.  In fact, on read-only workloads, with
>> my lazy vxid lock patch applied, this is, I believe, the only
>> remaining unpartitioned LWLock that is ever taken in exclusive mode;
>> or at least the only one that's taken anywhere near often enough to
>> matter.  I think we're going to do something about this, although I
>> don't have a specific idea in mind at the moment.
>
> This has been discussed before:
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php
> (which itself references 2 other threads).
>
> The basic idea is: have a background process that proactively moves
> buffers onto the free list so that backends should normally never have
> to run the clock sweep (which is rather expensive). The challenge there
> is figuring out how to get stuff onto the free list with minimal
> locking impact. I think one possible option would be to put the
> freelist under its own lock (IIRC we currently use it to protect the
> clock sweep as well). Of course, that still means the free list lock
> could be a point of contention, but presumably it's far faster to add
> or remove something from the list than it is to run the clock sweep.

Based on recent benchmarking, I'm going to say "no".  It doesn't seem
to matter how short you make the critical section: a single
program-wide mutex is a loser.  Furthermore, the "free list" is a
joke, because it's nearly always going to be completely empty.  We
could probably just rip that out and use the clock sweep and never
miss it, but I doubt it would improve performance much.

I think what we probably need to do is have multiple clock sweeps in
progress at the same time.  So, for example, if you have 8GB of
shared_buffers, you might have 8 mutexes, one for each GB.  When a
process wants a buffer, it locks one of the mutexes and sweeps through
that 1GB partition.  If it finds a buffer before returning to the
point at which it started the scan, it's done.  Otherwise, it releases
its mutex, grabs the next one, and continues on until it finds a free
buffer.
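
As a rough illustration of the partitioned sweep described above, here is
a minimal sketch in C. It is not PostgreSQL code: the names
(SketchBuffer, SketchPartition, sketch_get_victim), the tiny partition
sizes, and the use of pthread mutexes are all assumptions made for the
example. It also assumes that pin state and usage counts are only touched
under the partition mutex, which is a simplification, and it adds a bound
on the number of partition visits so that the function returns -1 instead
of spinning forever when every buffer is pinned.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_PARTITIONS        4   /* hypothetical: one mutex per partition */
#define BUFFERS_PER_PARTITION 8   /* tiny on purpose, for illustration */
#define NUM_BUFFERS (NUM_PARTITIONS * BUFFERS_PER_PARTITION)
#define SKETCH_MAX_USAGE      5   /* assumed cap on usage_count */

typedef struct {
    int usage_count;   /* decremented by the sweep; evictable at 0 */
    int pinned;        /* nonzero means the buffer cannot be evicted */
} SketchBuffer;

typedef struct {
    pthread_mutex_t lock;   /* protects this partition's clock hand */
    int next_victim;        /* clock hand, as an offset in the partition */
} SketchPartition;

SketchBuffer    buffers[NUM_BUFFERS];
SketchPartition partitions[NUM_PARTITIONS];

/* Call once before use: all buffers start unpinned with usage_count 0. */
void sketch_init(void)
{
    for (int p = 0; p < NUM_PARTITIONS; p++) {
        pthread_mutex_init(&partitions[p].lock, NULL);
        partitions[p].next_victim = 0;
    }
    for (int i = 0; i < NUM_BUFFERS; i++) {
        buffers[i].usage_count = 0;
        buffers[i].pinned = 0;
    }
}

/*
 * Sweep one partition at a time, starting at start_partition. Each visit
 * makes at most one full revolution of that partition's clock hand while
 * holding only that partition's mutex. If no victim turns up, release the
 * mutex and move on to the next partition. Returns the index of the
 * claimed buffer (now pinned), or -1 if nothing could be evicted.
 */
int sketch_get_victim(int start_partition)
{
    /* Enough visits for every usage_count to be swept down to zero. */
    int max_visits = NUM_PARTITIONS * (SKETCH_MAX_USAGE + 1);

    for (int visit = 0; visit < max_visits; visit++) {
        int p = (start_partition + visit) % NUM_PARTITIONS;
        SketchPartition *part = &partitions[p];

        pthread_mutex_lock(&part->lock);
        for (int step = 0; step < BUFFERS_PER_PARTITION; step++) {
            int idx = p * BUFFERS_PER_PARTITION + part->next_victim;
            SketchBuffer *buf = &buffers[idx];

            part->next_victim = (part->next_victim + 1) % BUFFERS_PER_PARTITION;
            assert(buf->usage_count <= SKETCH_MAX_USAGE);

            if (buf->pinned)
                continue;               /* in use, skip it */
            if (buf->usage_count == 0) {
                buf->pinned = 1;        /* claim it for the caller */
                buf->usage_count = 1;
                pthread_mutex_unlock(&part->lock);
                return idx;
            }
            buf->usage_count--;         /* give it another chance */
        }
        pthread_mutex_unlock(&part->lock);
    }
    return -1;
}
```

A backend would presumably pick its starting partition so that
concurrent callers spread out (for example, by hashing its process ID);
how to choose that, and how many partitions to use, is left open here.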

The catch with any modification in this area is that pretty much any
increase in parallelism is potentially going to reduce the quality of
buffer replacement to some degree. So the trick will be to squeeze out
as much concurrency as possible while minimizing degradation in the
quality of buffer replacements.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

