Re: Wait free LW_SHARED acquisition - v0.9

From: Andres Freund
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date:
Msg-id: 20141014135843.GF9267@awork2.anarazel.de
In reply to: Re: Wait free LW_SHARED acquisition - v0.9  (Merlin Moncure <mmoncure@gmail.com>)
Responses: Re: Wait free LW_SHARED acquisition - v0.9
List: pgsql-hackers
On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote:
> On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
> >> FWIW, the profile always looks like
> >> -  48.61%      postgres  postgres              [.] s_lock
> >>    - s_lock
> >>       + 96.67% StrategyGetBuffer
> >>       + 1.19% UnpinBuffer
> >>       + 0.90% PinBuffer
> >>       + 0.70% hash_search_with_hash_value
> >> +   3.11%      postgres  postgres              [.] GetSnapshotData
> >> +   2.47%      postgres  postgres              [.] StrategyGetBuffer
> >> +   1.93%      postgres  [kernel.kallsyms]     [k] copy_user_generic_string
> >> +   1.28%      postgres  postgres              [.] hash_search_with_hash_value
> >> -   1.27%      postgres  postgres              [.] LWLockAttemptLock
> >>    - LWLockAttemptLock
> >>       - 97.78% LWLockAcquire
> >>          + 38.76% ReadBuffer_common
> >>          + 28.62% _bt_getbuf
> >>          + 8.59% _bt_relandgetbuf
> >>          + 6.25% GetSnapshotData
> >>          + 5.93% VirtualXactLockTableInsert
> >>          + 3.95% VirtualXactLockTableCleanup
> >>          + 2.35% index_fetch_heap
> >>          + 1.66% StartBufferIO
> >>          + 1.56% LockReleaseAll
> >>          + 1.55% _bt_next
> >>          + 0.78% LockAcquireExtended
> >>       + 1.47% _bt_next
> >>       + 0.75% _bt_relandgetbuf
> >>
> >> to me. Now that's with the client count 496, but it's similar with lower
> >> counts.
> >>
> >> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> >> smarter.
> >
> > Which is nearly trivial now that atomics are in. Check out the attached
> > WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
> > there's buffers on the freelist.
> 
> Is this safe?
> 
> + victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
> 
> - if (++StrategyControl->nextVictimBuffer >= NBuffers)
> + buf = &BufferDescriptors[victim % NBuffers];
> +
> + if (victim % NBuffers == 0)
> <snip>
> 
> I don't think there's any guarantee you could keep nextVictimBuffer
> from wandering off the end.

Not sure what you mean? It'll wrap around at 2^32. The code doesn't try
to keep nextVictimBuffer from going above NBuffers; to avoid overrunning
BufferDescriptors I'm taking the index % NBuffers.

Yes, that'll have the disadvantage of the first buffers being slightly
more likely to be hit, but for that to be relevant you'd need
unrealistically large amounts of shared_buffers.
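
For illustration, here's a self-contained toy model of that scheme. It
uses plain C11 atomics rather than PostgreSQL's pg_atomic_* API, and
NBUFFERS/next_victim are stand-ins for NBuffers/nextVictimBuffer, so
treat it as a sketch of the idea rather than the patch itself:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBUFFERS 128                       /* stand-in for NBuffers */

static _Atomic uint32_t next_victim = 0;   /* stand-in for nextVictimBuffer */

static uint32_t
clock_sweep_next(void)
{
    /* One relaxed fetch-add per advance; no spinlock to spin on or to
     * be put to sleep while holding.  The counter is allowed to run all
     * the way to 2^32 and wrap. */
    uint32_t victim = atomic_fetch_add_explicit(&next_victim, 1,
                                                memory_order_relaxed);

    /* The modulo keeps the index inside the descriptor array no matter
     * how far the counter has run. */
    return victim % NBUFFERS;
}

int
main(void)
{
    /* Start just below the 2^32 wrap to show the index stays in range. */
    atomic_store(&next_victim, UINT32_MAX - 2);
    for (int i = 0; i < 6; i++)
        printf("%u\n", clock_sweep_next());
    return 0;
}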

> You could buy a little safety by CAS'ing
> zero in instead, followed by atomic increment.  However that's still
> pretty dodgy IMO and requires two atomic ops which will underperform
> the spin in some situations.
> 
> I like Robert's idea to keep the spinlock but preallocate a small
> amount of buffers, say 8 or 16.

I think that's pretty much orthogonal. I *do* think it makes sense to
increment nextVictimBuffer in bigger steps. But the above doesn't
prohibit doing so - and it'll still be much better without the
spinlock. Same number of atomics, but no potential of spinning and no
potential of being put to sleep while holding the spinlock.

This callsite has a comparatively large likelihood of being put to sleep
while holding a spinlock because it can run for a very long time doing
nothing but spinlocking.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


