Re: bg worker: patch 1 of 6 - permanent process
From: Markus Wanner
Subject: Re: bg worker: patch 1 of 6 - permanent process
Msg-id: 4C7B747A.8090502@bluegap.ch
In reply to: Re: bg worker: patch 1 of 6 - permanent process (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
(Sorry, need to disable Ctrl-Return, which quite often sends mails
earlier than I really want... continuing my mail)

On 08/27/2010 10:46 PM, Robert Haas wrote:
> Yeah, probably. I think designing something that works efficiently
> over a network is a somewhat different problem than designing
> something that works on an individual node, and we probably shouldn't
> let the designs influence each other too much.

Agreed. Thus I've left out any kind of congestion avoidance stuff from
imessages so far.

>>> There's no padding or sophisticated allocation needed. You
>>> just need a pointer to the last byte read (P1), the last byte allowed
>>> to be read (P2), and the last byte allocated (P3). Writers take a
>>> spinlock, advance P3, release the spinlock, write the message, take
>>> the spinlock, advance P2, release the spinlock, and signal the reader.
>>
>> That would block parallel writers (i.e. only one process can write to
>> the queue at any time).
>
> I feel like there's probably some variant of this idea that works
> around that problem. The problem is that when a worker finishes
> writing a message, he needs to know whether to advance P2 only over
> his own message or also over some subsequent message that has been
> fully written in the meantime. I don't know exactly how to solve that
> problem off the top of my head, but it seems like it might be
> possible.

I've tried pretty much that before, and failed, because the allocation
order (i.e. the time the message gets created in preparation for
writing to it) isn't necessarily the same as the sending order (i.e.
when the process has finished writing and decides to send the message).
To satisfy the FIFO property WRT the sending order, you need to
decouple allocation from the ordering (i.e. the queuing logic). (And
yes, it took me a while to figure out what was wrong in Postgres-R
before I even noticed that design bug.)
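To make the ordering constraint concrete, here is a minimal
single-process model of the three-pointer scheme quoted above (all
names, such as MsgRing and ring_alloc, are hypothetical, and locking
and wraparound are omitted). It shows why P2 can only advance in
allocation order: publishing a message out of order would expose a
reserved-but-unfinished message to the reader.

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 1024

/* P1 = last byte read, P2 = last byte allowed to be read,
 * P3 = last byte allocated (offsets into buf). */
typedef struct
{
	char	buf[RING_SIZE];
	size_t	p1, p2, p3;
} MsgRing;

/* Writer step 1: reserve space (done under a spinlock in the real
 * design).  No wraparound handling in this sketch. */
static size_t
ring_alloc(MsgRing *r, size_t len)
{
	size_t	off = r->p3;

	r->p3 += len;
	return off;
}

/* Writer step 2: after filling the bytes, publish them.  This only
 * works if messages are *sent* in allocation order: P2 must not jump
 * past another writer's reserved-but-unfinished message. */
static void
ring_publish(MsgRing *r, size_t off, size_t len)
{
	assert(off == r->p2);		/* an out-of-order send would stall here */
	r->p2 = off + len;
}

/* Reader: everything between P1 and P2 is safe to consume. */
static size_t
ring_read(MsgRing *r, char *dst, size_t max)
{
	size_t	n = r->p2 - r->p1;

	if (n > max)
		n = max;
	memcpy(dst, r->buf + r->p1, n);
	r->p1 += n;
	return n;
}
```

With two writers, the second allocation can finish first, but its
ring_publish() must still wait for the first, which is exactly the
coupling between allocation and send order described above.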
>>> Readers take the spinlock, read P1 and P2, release the spinlock, read
>>> the data, take the spinlock, advance P1, and release the spinlock.
>>
>> It would require copying data in case a process only needs to forward
>> the message. That's a quick pointer dequeue and enqueue exercise ATM.
>
> If we need to do that, that's a compelling argument for having a
> single messaging area rather than one per backend.

Absolutely, yes.

> But I'm not sure I see why we would need that sort of capability. Why
> wouldn't you just arrange for the sender to deliver the message
> directly to the final recipient?

A process can read and even change the data of the message before
forwarding it, something the coordinator in Postgres-R does sometimes
(as it is the interface to the GCS and thus to the rest of the nodes in
the cluster). For parallel querying (on a single node) that's probably
a less important feature.

> So, they know in advance how large the message will be but not what
> the contents will be? What are they doing?

Filling the message until it's (mostly) full and then continuing with
the next one. At least that's how the streaming approach on top of
imessages works. But yes, it's somewhat annoying to have to know the
message size in advance. I haven't implemented realloc so far, nor can
I think of any other solution. Note that the separation of allocation
and queue ordering is required anyway, for the above reasons.

> Well, the fact that something is commonly used doesn't mean it's right
> for us. Tabula rasa, we might design the whole system differently,
> but changing it now is not to be undertaken lightly. Hopefully the
> above comments shed some light on my concerns. In short, (1) I don't
> want to preallocate a big chunk of memory we might not use,

Isn't that exactly what we do now for lots of sub-systems, and what I'd
like to improve (i.e. reduce to a single big chunk)?
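The "pointer dequeue and enqueue exercise" mentioned above can be
sketched as follows (a hypothetical illustration, not the actual
imessages code; the names IMessage, enqueue, and forward are
assumptions). The point is that forwarding moves only the list link,
never the payload, which is what a single shared messaging area
enables and per-backend buffers would not:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A message living in shared memory, with an intrusive queue link. */
typedef struct IMessage
{
	struct IMessage *next;		/* queue link */
	size_t			len;
	char			data[64];
} IMessage;

typedef struct
{
	IMessage   *head;
	IMessage   *tail;
} MsgQueue;

static void
enqueue(MsgQueue *q, IMessage *m)
{
	m->next = NULL;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

static IMessage *
dequeue(MsgQueue *q)
{
	IMessage   *m = q->head;

	if (m)
	{
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
	}
	return m;
}

/* A coordinator-style hop: inspect (or modify) the payload, then pass
 * the very same message object on.  No memcpy of the data occurs. */
static void
forward(MsgQueue *from, MsgQueue *to)
{
	IMessage   *m = dequeue(from);

	if (m)
		enqueue(to, m);
}
```

In a plain ring buffer, the same forwarding step would require copying
the payload out of one backend's area into another's.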
> (2) I fear reducing the overall robustness of the system, and

Well, that applies to pretty much every new feature you add.

> (3) I'm uncertain what other systems would be able to leverage a
> dynamic allocator of the sort you propose.

Okay, that's up to me to show evidence (or at least a PoC).

Regards

Markus Wanner