Re: Proposal: PqSendBuffer removal

From: Andres Freund
Subject: Re: Proposal: PqSendBuffer removal
Msg-id: 20200309202624.3tr5m5dt2vvqhfih@alap3.anarazel.de
In reply to: Re: Proposal: PqSendBuffer removal (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: Proposal: PqSendBuffer removal (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi,

On 2020-03-07 13:54:57 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What I'm thinking is that we'd have pq_beginmessage() (potentially a
> > differently named version) initialize the relevant StringInfo basically as
> 
> > (StringInfoData){
> >     .data = PqSendPointer,
> >     .len = 0,
> >     .alloc_offset = PqSendBuffer - PqSendBuffer
> > }
> 
> This seems way overcomplicated compared to what I suggested (ie,
> just let internal_putbytes bypass the buffer in cases where the
> data would get flushed next time through its loop anyway).
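
Tom's alternative can be sketched as a variant of the internal_putbytes() copy loop. This is a minimal standalone sketch, not the actual pqcomm.c code: send_buffer, send_pointer, SEND_BUFFER_SIZE and fake_send() are hypothetical stand-ins for PqSendBuffer, PqSendPointer, PqSendBufferSize and the real socket write, and the buffer is deliberately tiny so the bypass is easy to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PqSendBuffer / PqSendPointer / PqSendBufferSize. */
#define SEND_BUFFER_SIZE 8

static char   send_buffer[SEND_BUFFER_SIZE];
static size_t send_pointer = 0;    /* number of bytes currently buffered */

/* Stand-in for send(): only counts calls and bytes. */
static int    n_sends = 0;
static size_t n_bytes_sent = 0;

static void fake_send(const char *data, size_t len)
{
    (void) data;
    n_sends++;
    n_bytes_sent += len;
}

static void flush_buffer(void)
{
    if (send_pointer > 0)
    {
        fake_send(send_buffer, send_pointer);
        send_pointer = 0;
    }
}

/*
 * Copy loop in the style of internal_putbytes(), plus the bypass: once the
 * buffer is empty and at least a full buffer's worth of data remains, hand
 * the rest to send() directly instead of copying it through the buffer.
 */
static void put_bytes(const char *s, size_t len)
{
    while (len > 0)
    {
        /* Buffer full: flush it out. */
        if (send_pointer >= SEND_BUFFER_SIZE)
            flush_buffer();

        /* Copied data would just be flushed next time around: skip the copy. */
        if (send_pointer == 0 && len >= SEND_BUFFER_SIZE)
        {
            fake_send(s, len);
            return;
        }

        size_t amount = SEND_BUFFER_SIZE - send_pointer;
        if (amount > len)
            amount = len;
        memcpy(send_buffer + send_pointer, s, amount);
        send_pointer += amount;
        s += amount;
        len -= amount;
        assert(send_pointer <= SEND_BUFFER_SIZE);
    }
}
```

Feeding 3 bytes and then 20 bytes through put_bytes() issues two sends: one that drains the 8 buffered bytes (3 pending plus 5 copied), and one that passes the remaining 15 bytes straight through. That second call is the cost of pending data from a previous message that the reply points out; merging the two would need something like writev().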

Well, we quite frequently send out multiple messages in a row, without a
flush in between. It'd be nice if we could avoid both copying the buffer
for each message and allocating a new StringInfo.

We've reduced the number of wholesale StringInfo reallocations with
pq_beginmessage_reuse(), which matters e.g. when actually returning
tuples, and that was a noticeable performance improvement.
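
The saving from pq_beginmessage_reuse() comes from keeping one buffer's allocation alive across messages and only resetting its length. The sketch below models that idea with a hypothetical, pared-down MsgBuf type (not PostgreSQL's StringInfoData) and counts allocations; error handling is reduced to assertions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down analogue of StringInfoData. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  maxlen;
    char    msgtype;   /* protocol message type of the current message */
} MsgBuf;

static int n_allocs = 0;   /* malloc/realloc calls, for the demo */

static void msgbuf_init(MsgBuf *buf)
{
    buf->maxlen = 16;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL);
    n_allocs++;
    buf->len = 0;
    buf->msgtype = 0;
}

/* In the spirit of pq_beginmessage_reuse(): drop contents, keep allocation. */
static void msgbuf_begin_reuse(MsgBuf *buf, char msgtype)
{
    buf->len = 0;
    buf->msgtype = msgtype;
}

/* Append bytes, doubling the allocation only when it is too small. */
static void msgbuf_append(MsgBuf *buf, const char *s, size_t n)
{
    if (buf->len + n > buf->maxlen)
    {
        while (buf->len + n > buf->maxlen)
            buf->maxlen *= 2;
        buf->data = realloc(buf->data, buf->maxlen);
        assert(buf->data != NULL);
        n_allocs++;
    }
    memcpy(buf->data + buf->len, s, n);
    buf->len += n;
}
```

Once the buffer has grown to fit the widest row, later rows cost no allocation at all. The copy of each finished message into the send buffer, which is what this thread is about, still remains.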

I don't believe that the copy is a performance-relevant factor solely
for messages that are individually too large to fit in the send
buffer. For one, there'll often be some pending send data from a
previous "round", which would mean we'd need to call send() more often,
or use vectorized I/O (i.e. switch to writev()). But also,


> What you're suggesting would be a lot more invasive and restrictive
> --- for example, why is it a good idea to have a hard-wired
> assumption that we can't build more than one message at once?

Well, we don't seem to have many (any?) places where that's not the
case. And using only one layer of buffering for outgoing data does
seem advantageous to me. It wouldn't be hard to fall back to a
separate buffer just for the cases where multiple messages are
built concurrently, if we want to support that.


> I'm also concerned that trying to do too much optimization here will
> break one of the goals of the existing code, which is to not get into
> a situation where an OOM failure causes a wire protocol violation
> because we've already sent part of a message but are no longer able to
> send the rest of it.  To ensure that doesn't happen, we must construct
> the whole message before we start to send it, and we can't let
> buffering of the last message be too entwined with construction of the
> next one.  Between that and the (desirable) arms-length separation
> between datatype I/O functions and the actual I/O, a certain amount of
> data copying seems unavoidable.

Sure. But I don't see why that requires two levels of buffering for
messages. If we were to build the message in the output buffer, resizing
as needed, we could send the data once the message is complete, or not at
all.
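
The single-level scheme, building each message directly at the tail of the output buffer and exposing it for sending only once it is complete, might look roughly like this. Everything here (OutBuf, outbuf_begin() and the other helpers) is hypothetical; only the framing follows the v3 wire protocol (a type byte, then an Int32 length that counts itself but not the type byte). Since only bytes up to committed are ever eligible for sending, an out-of-memory failure in the middle of a message is handled by truncating back to committed, so no partial message reaches the client.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-level output buffer. */
typedef struct
{
    char   *buf;
    size_t  size;        /* allocated bytes */
    size_t  committed;   /* bytes of complete messages, safe to send */
    size_t  msg_start;   /* offset of the message under construction */
    size_t  end;         /* current write position */
} OutBuf;

/* Ensure room for `extra` more bytes; on failure nothing built so far is lost. */
static int outbuf_reserve(OutBuf *o, size_t extra)
{
    if (o->end + extra <= o->size)
        return 0;
    size_t newsize = o->size ? o->size : 64;
    while (o->end + extra > newsize)
        newsize *= 2;
    char *p = realloc(o->buf, newsize);
    if (p == NULL)
        return -1;
    o->buf = p;
    o->size = newsize;
    return 0;
}

/* Start a message in place: type byte plus a 4-byte length placeholder. */
static int outbuf_begin(OutBuf *o, char msgtype)
{
    o->msg_start = o->end = o->committed;
    if (outbuf_reserve(o, 5) != 0)
        return -1;
    o->buf[o->end++] = msgtype;
    o->end += 4;
    return 0;
}

static int outbuf_append(OutBuf *o, const void *data, size_t n)
{
    if (outbuf_reserve(o, n) != 0)
        return -1;
    memcpy(o->buf + o->end, data, n);
    o->end += n;
    return 0;
}

/* Patch in the big-endian length and make the message sendable. */
static void outbuf_end(OutBuf *o)
{
    uint32_t len = (uint32_t) (o->end - o->msg_start - 1);
    unsigned char *lp = (unsigned char *) o->buf + o->msg_start + 1;

    lp[0] = (unsigned char) (len >> 24);
    lp[1] = (unsigned char) (len >> 16);
    lp[2] = (unsigned char) (len >> 8);
    lp[3] = (unsigned char) len;
    o->committed = o->end;
}

/* Give up on a half-built message (e.g. after OOM): none of it is sent. */
static void outbuf_abort(OutBuf *o)
{
    o->end = o->committed;
}

/* Return how many bytes would go to send(), then empty the buffer. */
static size_t outbuf_flush(OutBuf *o)
{
    size_t n = o->committed;

    assert(o->end == o->committed);   /* no message under construction */
    /* a real implementation would write o->buf[0..n) to the socket here */
    o->committed = o->end = 0;
    return n;
}
```

Note that outbuf_begin() silently discards an unfinished message, so this sketch has exactly the one-message-at-a-time restriction Tom questions; supporting concurrent construction would need the separate-buffer fallback mentioned earlier.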

I don't think anything at the datatype I/O level would be affected.

While I think it'd be quite desirable to avoid e.g. the separate
StringInfo allocation for send functions, I think that's a separate
project, and one I don't have a really good idea how to tackle.

Greetings,

Andres Freund


[1] Since I had looked it up:

We do a separate message for each of:
1) result description
2) each result row
3) ReadyForQuery

And we separately call through PQcommMethods for things like
pq_putemptymessage() and uses of pq_putmessage() not going through
pq_endmessage(). The former is called a lot, especially when using the
extended query protocol (which we want clients to use!).
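
For scale: a message sent via pq_putemptymessage() has no body, so on the wire it is only the type byte plus an Int32 length of 4. The helper below is hypothetical (not a PostgreSQL function) and just illustrates that layout; ParseComplete ('1') and BindComplete ('2') are examples of such messages in the extended protocol. At 5 bytes each, the cost of these calls is likely dominated by per-call overhead rather than by the bytes copied.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical encoder for a body-less message: one type byte followed by
 * a big-endian Int32 length of 4 (the length counts only itself).
 */
static size_t encode_empty_message(unsigned char *out, char msgtype)
{
    out[0] = (unsigned char) msgtype;
    out[1] = 0;
    out[2] = 0;
    out[3] = 0;
    out[4] = 4;
    return 5;
}
```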


For a SELECT 1 in the simple protocol we end up calling putmessage via:
1) SendRowDescriptionMessage
2) printtup()
3) EndCommand()
4) ReadyForQuery()
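
To make the simple-protocol case concrete, this sketch frames the four messages those calls produce for SELECT 1 (RowDescription 'T', DataRow 'D', CommandComplete 'C', ReadyForQuery 'Z') back to back in one buffer. The layouts follow the v3 protocol message formats; the buffer and helper functions are hypothetical, and the field values assume the usual result of SELECT 1 (one int4 column named ?column?, type OID 23). The four messages total 66 bytes, yet each reaches the send buffer through its own putmessage call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer that collects several messages before a flush. */
static unsigned char out[256];
static size_t out_len = 0;
static int n_messages = 0;

static size_t be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char) (v >> 8);
    p[1] = (unsigned char) v;
    return 2;
}

static size_t be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
    return 4;
}

/* Frame one message: type byte, Int32 length (counting itself), payload. */
static void put_message(char type, const void *payload, size_t len)
{
    assert(out_len + 5 + len <= sizeof out);
    out[out_len++] = (unsigned char) type;
    out_len += be32(out + out_len, (uint32_t) (len + 4));
    memcpy(out + out_len, payload, len);
    out_len += len;
    n_messages++;
}

/* The four messages of a simple-protocol "SELECT 1". */
static void send_select_1(void)
{
    unsigned char p[64];
    size_t n;

    /* RowDescription with a single field */
    n = be16(p, 1);                      /* number of fields */
    memcpy(p + n, "?column?", 9);        /* field name, NUL-terminated */
    n += 9;
    n += be32(p + n, 0);                 /* table OID: none */
    n += be16(p + n, 0);                 /* column number: none */
    n += be32(p + n, 23);                /* type OID (int4) */
    n += be16(p + n, 4);                 /* type length */
    n += be32(p + n, UINT32_MAX);        /* type modifier -1 */
    n += be16(p + n, 0);                 /* format code: text */
    put_message('T', p, n);

    /* DataRow: one column, value "1" in text format */
    n = be16(p, 1);
    n += be32(p + n, 1);
    p[n++] = '1';
    put_message('D', p, n);

    put_message('C', "SELECT 1", 9);     /* CommandComplete tag */
    put_message('Z', "I", 1);            /* ReadyForQuery: idle */
}
```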

For extended:
1) exec_parse_message()
2) exec_bind_message()
3) exec_describe_portal_message()
4) printtup()
5) EndCommand()
6) ReadyForQuery()


