Re: XLogInsert scaling, revisited

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: XLogInsert scaling, revisited
Дата
Msg-id 20130627122758.GG1254@alap2.anarazel.de
обсуждение исходный текст
Ответ на Re: XLogInsert scaling, revisited  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: XLogInsert scaling, revisited  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
Hi,

On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote:
> >* Could you document the way slots prevent checkpoints from occurring
> >   when XLogInsert rechecks for full page writes? I think it's correct -
> >   but not very obvious on a glance.
>
> There's this in the comment near the top of the file:
>
>  * To update RedoRecPtr or fullPageWrites, one has to make sure that all
>  * subsequent inserters see the new value. This is done by reserving all the
>  * insertion slots before changing the value. XLogInsert reads RedoRecPtr
> and
>  * fullPageWrites after grabbing a slot, so by holding all the slots
>  * simultaneously, you can ensure that all subsequent inserts see the new
>  * value.  Those fields change very seldom, so we prefer to be fast and
>  * non-contended when they need to be read, and slow when they're changed.
>
> Does that explain it well enough? XLogInsert holds onto a slot while it
> rechecks for full page writes.

Yes. Earlieron that was explained in XLogInsert() - maybe point to the
documentation ontop of the file? The file is too big to expect everyone
to reread the whole file in a new release...

> >* The read of Insert->RedoRecPtr while rechecking whether it's out of
> >   date now is unlocked, is that correct? And why?

> Same here, XLogInsert holds the slot while rechecking Insert->RedoRecptr.

I was wondering whether its guaranteed that we don't read a cached
value, but I didn't think of the memory barriers due to the spinlock in
Release/AcquireSlot. I think I thought that release wouldn't acquire the
spinlock at all.

> >* Have you measured whether it works to just keep as many slots as
> >   max_backends requires around and not bothering with dynamically
> >   allocating them to inserters?
> >   That seems to require to keep actually waiting slots in a separate
> >   list which very well might make that too expensive.
> >
> >   Correctness wise the biggest problem for that probably is exlusive
> >   acquiration of all slots CreateCheckpoint() does...
>
> Hmm. It wouldn't be much different, each backend would still need to reserve
> its own dedicated slot, because it might be held by the a backend that
> grabbed all the slots. Also, bgwriter and checkpointer need to flush the
> WAL, so they'd need slots too.
>
> More slots make WaitXLogInsertionsToFinish() more expensive, as it has to
> loop through all of them. IIRC some earlier pgbench tests I ran didn't show
> much difference in performance, whether there were 40 slots or 100, as long
> as there was enough of them. I can run some more tests with even more slots
> to see how it behaves.

In a very quick test I ran previously I did see the slot acquiration in
the profile when using short connections that all did some quick inserts
- which isn't surprising since they tend to all start with the same
slot so they will all fight for the first slots.

Maybe we could ammeliorate that by initializing MySlotNo to
(MyProc->pgprocno % num_xloginsert_slots) when uninitialized? That won't
be completely fair since the number of procs won't usually be a multiple
of num_insert_slots, but I doubt that will be problematic.

> >* What about using some sort of linked list of unused slots for
> >   WALInsertSlotAcquireOne? Everytime we're done we put it to the *end*
> >   of the list so it's less likely to have been grabbed by somebody else
> >   so we can reuse it.
> >   a) To grab a new one go to the head of the linked list spinlock it and
> >   recheck whether it's still free. If not, restart. Otherwise, remove
> >   from list.
> >   b) To reuse a previously used slot
> >
> >   That way we only have to do the PGSemaphoreLock() dance if there
> >   really aren't any free slots.
>
> That adds a spinlock acquisition / release into the critical path, to
> protect the linked list. I'm trying very hard to avoid serialization points
> like that.

Hm. We already have at least one spinlock in that path? Or are you
thinking of a global one protecting the linked list? If so, I think we
should be able to get away with locking individual slots.
IIRC if you never need to delete list elements you can even get away
with a lockless implementation.

> While profiling the tests I've done, finding a free slot hasn't shown up
> much. So I don't think it's a problem performance-wise as it is, and I don't
> think it would make the code simpler.

It sure wouldn't make it simpler. As I said above, I saw the slot
acquiration in a profile when using -C and a short pgbench script (that
just inserts 10 rows).

> >* The queuing logic around slots seems to lack documentation. It's
> >   complex enough to warrant that imo.
>
> Yeah, it's complex. I expanded the comments above XLogInsertSlot, to explain
> how xlogInsertingAt works. Did that help, or was it some other part of that
> that needs more docs?

What I don't understand so far is why we don't reset xlogInsertingAt
during SlotReleaseOne. There are places documenting that we don't do so,
but not why unless I missed it.

Do we do this only to have some plausible value for a backend that been
acquired but hasn't copied data yet? If so, why isn't it sufficient to
initialize it in ReserveXLogInsertLocation?

> >* Is it correct that WALInsertSlotAcquireOne() resets xlogInsertingAt of
> >   all slots *before* it has acquired locks in all of them? If yes, why
> >   haven't we already reset it to something invalid?

> I didn't understand this part. Can you elaborate?

It has to do with my lack of understanding of the above. If there's a
reason not to reset xlogInsertingAt during ReleaseOne() we might have a
problem.
Consider a scenario with num_xloginsert_slots = 2. Slot 0 has been
released but still has xlogInsertingAt = 0/ff00100. Slot 1 is unused.
1) checkpointer: WALInsertSlotAcquire(exclusive)
2) checkpointer: WALInsertSlotAcquireOne(0), clears xlogInsertingAt
3) WaitXLogInsertionsToFinish() checks slot 0's xlogInsertingAt: InvalidXLogRecPtr
4) WaitXLogInsertionsToFinish() checks slot 1's xlogInsertingAt: InvalidXLogRecPtr
2) checkpointer: WALInsertSlotAcquireOne(1), sets xlogInsertingAt = 1

In this case we WaitXLogInsertionsToFinish() could run without getting
blocked and report that everything up to Insert->CurrBytePos has been
finished since all slots have xlogInsertingAt = InvalidXLogRecPtr.

> >* Is GetXLogBuffer()'s unlocked read correct without a read barrier?
>
> Hmm. I thought that it was safe because GetXLogBuffer() handles the case
> that you get a "torn read" of the 64-bit XLogRecPtr value. But now that I
> think of it, I wonder if it's possible for reads/writes to be reordered so
> that AdvanceXLInsertBuffer() overwrites WAL data that another backend has
> copied onto a page. Something like this:

Yea, I am not so much worried about a torn value, but about an out of
date one that looks valid. A barrier sounds good.

> >* XLogBytePosToEndRecPtr() seems to have a confusing name to me. At
> >   least the comment needs to better explain that it's named that way
> >   because of the way it's used.
>
> Ok, added a sentence on that. Let me know if that helped or if you have
> better suggestions.

Yes, that's better.

> >   Also, doesn't
> >seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
> >   potentially point into the middle of a page?
>
> Yes. seg_offset is the byte offset from the beginning of the segment, it's
> supposed to point in the middle of the page.

Misunderstood something...

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: Developer meeting photos
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Move unused buffers to freelist