Re: Libpq async issues

Поиск
Список
Период
Сортировка
От Alfred Perlstein
Тема Re: Libpq async issues
Дата
Msg-id 20010124084720.T26076@fw.wintelcom.net
обсуждение исходный текст
Ответ на Libpq async issues  (Bruce Momjian <pgman@candle.pha.pa.us>)
Список pgsql-hackers
* Bruce Momjian <pgman@candle.pha.pa.us> [010124 07:58] wrote:
> 
> I have added this email to TODO.detail and a mention in the TODO list.

The bug mentioned here is long gone, however the problem with
issuing non-blocking COPY commands is still present (8k limit on
buffer size).  I hope to get to fix this sometime soon, but you
shouldn't worry about the "normal" path.

There's also a bug with PQCopyEnd(sp?) where it can still block because
it automagically calls into a routine that select()'s waiting for data.

It's on my TODO list as well, but a little behind a several thousand
line server I'm almost complete with.

-Alfred

> 
> > >> Um, I didn't have any trouble at all reproducing Patrick's complaint.
> > >> pg_dump any moderately large table (I used tenk1 from the regress
> > >> database) and try to load the script with psql.  Kaboom.
> > 
> > > This is after or before my latest patch?
> > 
> > Before.  I haven't updated since yesterday...
> > 
> > > I can't seem to reproduce this problem,
> > 
> > Odd.  Maybe there is something different about the kernel's timing of
> > message sending on your platform.  I see it very easily on HPUX 10.20,
> > and Patrick sees it very easily on whatever he's using (netbsd I think).
> > You might try varying the situation a little, say
> >     psql mydb <dumpfile
> >     psql -f dumpfile mydb
> >     psql mydb
> >         \i dumpfile
> > and the same with -h localhost (to get a TCP/IP connection instead of
> > Unix domain).  At the moment (pre-patch) I see failures with the
> > first two of these, but not with the \i method.  -h doesn't seem to
> > matter for me, but it might for you.
> > 
> > > Telling me something is wrong without giving suggestions on how
> > > to fix it, nor direct pointers to where it fails doesn't help me
> > > one bit.  You're not offering constructive critism, you're not
> > > even offering valid critism, you're just waving your finger at
> > > "problems" that you say exist but don't pin down to anything specific.
> > 
> > I have been explaining it as clearly as I could.  Let's try it
> > one more time.
> > 
> > > I spent hours looking over what I did to pqFlush and pqPutnBytes
> > > because of what you said earlier when all the bug seems to have
> > > come down to is that I missed that the socket is set to non-blocking
> > > in all cases now.
> > 
> > Letting the socket mode default to blocking will hide the problems from
> > existing clients that don't care about non-block mode.  But people who
> > try to actually use the nonblock mode are going to see the same kinds of
> > problems that psql is exhibiting.
> > 
> > > The old sequence of events that happened was as follows:
> > 
> > >   user sends data almost filling the output buffer...
> > >   user sends another line of text overflowing the buffer...
> > >   pqFlush is invoked blocking the user until the output pipe clears...
> > >   and repeat.
> > 
> > Right.
> > 
> > > The nonblocking code allows sends to fail so the user can abort
> > > sending stuff to the backend in order to process other work:
> > 
> > >   user sends data almost filling the output buffer...
> > >   user sends another line of text that may overflow the buffer...
> > >   pqFlush is invoked, 
> > >     if the pipe can't be cleared an error is returned allowing the user to
> > >       retry the send later.
> > >     if the flush succeeds then more data is queued and success is returned
> > 
> > But you haven't thought through the mechanics of the "error is returned
> > allowing the user to retry" code path clearly enough.  Let's take
> > pqPutBytes for an example.  If it returns EOF, is that a hard error or
> > does it just mean that the application needs to wait a while?  The
> > application *must* distinguish these cases, or it will do the wrong
> > thing: for example, if it mistakes a hard error for "wait a while",
> > then it will wait forever without making any progress or producing
> > an error report.
> > 
> > You need to provide a different return convention that indicates
> > what happened, say
> >     EOF (-1)    => hard error (same as old code)
> >     0        => OK
> >     1        => no data was queued due to risk of blocking
> > And you need to guarantee that the application knows what the state is
> > when the can't-do-it-yet return is made; note that I specified "no data
> > was queued" above.  If pqPutBytes might queue some of the data before
> > returning 1, the application is in trouble again.  While you apparently
> > foresaw that in recoding pqPutBytes, your code doesn't actually work.
> > There is the minor code bug that you fail to update "avail" after the
> > first pqFlush call, and the much more fundamental problem that you
> > cannot guarantee to have queued all or none of the data.  Think about
> > what happens if the passed nbytes is larger than the output buffer size.
> > You may pass the first pqFlush successfully, then get into the loop and
> > get a won't-block return from pqFlush in the loop.  What then?
> > You can't simply refuse to support the case nbytes > bufsize at all,
> > because that will cause application failures as well (too long query
> > sends it into an infinite loop trying to queue data, most likely).
> > 
> > A possible answer is to specify that a return of +N means "N bytes
> > remain unqueued due to risk of blocking" (after having queued as much
> > as you could).  This would put the onus on the caller to update his
> > pointers/counts properly; propagating that into all the internal uses
> > of pqPutBytes would be no fun.  (Of course, so far you haven't updated
> > *any* of the internal callers to behave reasonably in case of a
> > won't-block return; PQfn is just one example.)
> > 
> > Another possible answer is to preserve pqPutBytes' old API, "queue or
> > bust", by the expedient of enlarging the output buffer to hold whatever
> > we can't send immediately.  This is probably more attractive, even
> > though a long query might suck up a lot of space that won't get
> > reclaimed as long as the connection lives.  If you don't do this then
> > you are going to have to make a lot of ugly changes in the internal
> > callers to deal with won't-block returns.  Actually, a bulk COPY IN
> > would probably be the worst case --- the app could easily load data into
> > the buffer far faster than it could be sent.  It might be best to extend
> > PQputline to have a three-way return and add code there to limit the
> > growth of the output buffer, while allowing all internal callers to
> > assume that the buffer is expanded when they need it.
> > 
> > pqFlush has the same kind of interface design problem: the same EOF code
> > is returned for either a hard error or can't-flush-yet, but it would be
> > disastrous to treat those cases alike.  You must provide a 3-way return
> > code.
> > 
> > Furthermore, the same sort of 3-way return code convention will have to
> > propagate out through anything that calls pqFlush (with corresponding
> > documentation updates).  pqPutBytes can be made to hide a pqFlush won't-
> > block return by trying to enlarge the output buffer, but in most other
> > places you won't have a choice except to punt it back to the caller.
> > 
> > PQendcopy has the same interface design problem.  It used to be that
> > (unless you passed a null pointer) PQendcopy would *guarantee* that
> > the connection was no longer in COPY state on return --- by resetting
> > it, if necessary.  So the return code was mainly informative; the
> > application didn't have to do anything different if PQendcopy reported
> > failure.  But now, a nonblocking application does need to pay attention
> > to whether PQendcopy completed or not --- and you haven't provided a way
> > for it to tell.  If 1 is returned, the connection might still be in
> > COPY state, or it might not (PQendcopy might have reset it).  If the
> > application doesn't distinguish these cases then it will fail.
> > 
> > I also think that you want to take a hard look at the automatic "reset"
> > behavior upon COPY failure, since a PQreset call will block the
> > application until it finishes.  Really, what is needed to close down a
> > COPY safely in nonblock mode is a pair of entry points along the line of
> > "PQendcopyStart" and "PQendcopyPoll", with API conventions similar to
> > PQresetStart/PQresetPoll.  This gives you the ability to do the reset
> > (if one is necessary) without blocking the application.  PQendcopy
> > itself will only be useful to blocking applications.
> > 
> > > I'm sorry if they don't work for some situations other than COPY IN,
> > > but it's functionality that I needed and I expect to be expanded on
> > > by myself and others that take interest in nonblocking operation.
> > 
> > I don't think that the nonblock code is anywhere near production quality
> > at this point.  It may work for you, if you don't stress it too hard and
> > never have a communications failure; but I don't want to see us ship it
> > as part of Postgres unless these issues get addressed.
> > 
> >             regards, tom lane
> > 
> > ************
> > 
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BETWEEN patch
Следующее
От: "Ross J. Reedstrom"
Дата:
Сообщение: Re: [PATCHES] PgAccess schema-diagram cleanup