Re: bg worker: patch 1 of 6 - permanent process

From: Markus Wanner
Subject: Re: bg worker: patch 1 of 6 - permanent process
Date:
Msg-id: 4C7B747A.8090502@bluegap.ch
In reply to: Re: bg worker: patch 1 of 6 - permanent process  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
(Sorry, I need to disable Ctrl-Return, which quite often sends mails 
earlier than I really want... continuing my mail.)

On 08/27/2010 10:46 PM, Robert Haas wrote:
> Yeah, probably.  I think designing something that works efficiently
> over a network is a somewhat different problem than designing
> something that works on an individual node, and we probably shouldn't
> let the designs influence each other too much.

Agreed. Thus I've left out any kind of congestion avoidance stuff from 
imessages so far.

>>> There's no padding or sophisticated allocation needed.  You
>>> just need a pointer to the last byte read (P1), the last byte allowed
>>> to be read (P2), and the last byte allocated (P3).  Writers take a
>>> spinlock, advance P3, release the spinlock, write the message, take
>>> the spinlock, advance P2, release the spinlock, and signal the reader.
>>
>> That would block parallel writers (i.e. only one process can write to the
>> queue at any time).
>
> I feel like there's probably some variant of this idea that works
> around that problem.  The problem is that when a worker finishes
> writing a message, he needs to know whether to advance P2 only over
> his own message or also over some subsequent message that has been
> fully written in the meantime.  I don't know exactly how to solve that
> problem off the top of my head, but it seems like it might be
> possible.

I've tried pretty much that before, and failed, because the allocation 
order (i.e. the time the message gets created in preparation for 
writing to it) isn't necessarily the same as the sending order (i.e. 
the time the process finishes writing and decides to send the message).

To satisfy the FIFO property WRT the sending order, you need to decouple 
allocation from the ordering (i.e. the queuing logic).

(And yes, it took me a while to figure out what was wrong in 
Postgres-R before I even noticed that design bug.)

>>> Readers take the spinlock, read P1 and P2, release the spinlock, read
>>> the data, take the spinlock, advance P1, and release the spinlock.
>>
>> It would require copying data in case a process only needs to forward the
>> message. That's a quick pointer dequeue and enqueue exercise ATM.
>
> If we need to do that, that's a compelling argument for having a
> single messaging area rather than one per backend.

Absolutely, yes.

> But I'm not sure I
> see why we would need that sort of capability.  Why wouldn't you just
> arrange for the sender to deliver the message directly to the final
> recipient?

A process can read and even change the data of a message before 
forwarding it, something the coordinator in Postgres-R does sometimes 
(as it is the interface to the GCS and thus to the rest of the nodes 
in the cluster).

For parallel querying (on a single node), that's probably a less 
important feature.

> So, they know in advance how large the message will be but not what
> the contents will be?  What are they doing?

Filling the message until it's (mostly) full and then continuing with 
the next one. At least that's how the streaming approach on top of 
imessages works.

But yes, it's somewhat annoying to have to know the message size in 
advance. I haven't implemented realloc so far, nor can I think of any 
other solution. Note that the separation of allocation and queue 
ordering is required anyway, for the reasons above.

> Well, the fact that something is commonly used doesn't mean it's right
> for us.  Tabula raza, we might design the whole system differently,
> but changing it now is not to be undertaken lightly.  Hopefully the
> above comments shed some light on my concerns.  In short, (1) I don't
> want to preallocate a big chunk of memory we might not use,

Isn't that exactly what we do now for lots of sub-systems, and what 
I'd like to improve (i.e. reduce it to a single big chunk)?

> (2) I fear
> reducing the overall robustness of the system, and

Well, that applies to pretty much every new feature you add.

> (3) I'm uncertain
> what other systems would be able leverage a dynamic allocator of the
> sort you propose.

Okay, it's up to me to show evidence (or at least a PoC).

Regards

Markus Wanner

