Thread: Error I don't understand, losing synch with server


Error I don't understand, losing synch with server

From
Scott Ribe
Date:
Every once in a while I log this error executing a query:

message contents do not agree with length in message type "D"
lost synchronization with server: got message type "O", length 1398030676

And from that point forward any use of the connection just returns a null
result.

I'm running 8.0.4 on OS X 10.4.4 Server. Does this look more like a possible
bug in PG, or me corrupting memory? For what it's worth, this is currently
the only real problem I'm having, no crashes or other weirdness that would
lead me to suspect memory corruption in my own code.

It's also rare enough that I can work around it by noticing the error,
dropping the connection from my pool, and replacing it with a new one. But
ugh, that's not exactly a long-term solution.
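The workaround described above (notice the error, drop the connection, replace it) could be sketched roughly as follows. This is only an illustration: `Conn`, its `broken` flag, and `recycle` are hypothetical names, not part of libpq or of the poster's code. In a real libpq program the health check would more likely be something like `PQstatus(conn) == CONNECTION_BAD` or a NULL result from `PQexec`, with `PQfinish` and `PQconnectdb` doing the teardown and reconnect.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a pooled connection; real code would wrap a PGconn*.
struct Conn {
    explicit Conn(int id_) : id(id_) {}
    int id;
    bool broken = false;  // set by the caller when a query reports lost synchronization
};

// Returns the connection that should go back into the pool:
// the same one if it is still healthy, otherwise a fresh one from make_conn.
std::unique_ptr<Conn> recycle(std::unique_ptr<Conn> c,
                              const std::function<std::unique_ptr<Conn>()>& make_conn) {
    if (c && !c->broken) {
        return c;
    }
    // The broken connection is destroyed when c goes out of scope.
    return make_conn();
}
```

This keeps the pool usable, but it only hides the symptom; the cause of the corrupted messages still has to be found.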

Also FWIW, the only reason I haven't moved to 8.1 is lack of time. (My
available time last month got chewed up by a server hardware failure.)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Error I don't understand, losing synch with server

From
Tom Lane
Date:
Scott Ribe <scott_ribe@killerbytes.com> writes:
> Every once in a while I log this error executing a query:
> message contents do not agree with length in message type "D"
> lost synchronization with server: got message type "O", length 1398030676

This means either that libpq got a corrupt message from the server, or
that libpq itself contains a bug in message parsing.  Given that no one
else has reported similar problems, the idea that your app is somehow
clobbering the libpq message buffer (and thus corrupting the message "in
transit") has to be taken seriously.

You mention pooling, so I suppose this is a multi-threaded application
... are you being careful not to let any two threads try to use the same
libpq PGconn at the same time?  libpq itself does not contain any
locking that would make that safe; you need to provide the locking
yourself.

            regards, tom lane

Re: Error I don't understand, losing synch with server

From
Scott Ribe
Date:
> This means either that libpq got a corrupt message from the server, or
> that libpq itself contains a bug in message parsing.  Given that no one
> else has reported similar problems, the idea that your app is somehow
> clobbering the libpq message buffer (and thus corrupting the message "in
> transit") has to be taken seriously.

Gee. My code corrupting memory. Like that's never happened before ;-) I just
had to ask though since I'm not seeing other signs right now.

> You mention pooling so I suppose this is a multi-threaded application
> ... are you being careful not to let any two threads try to use the same
> libpq PGconn at the same time?  libpq itself does not contain any
> locking that would make that safe, you need to provide the locking
> yourself.

I have a queue of pgconns. When a thread needs one it pops it off the queue,
and when it's done it pushes the pgconn back on. Each use is wrapped by a
stack-allocated class whose constructor and destructor take care of
acquiring and releasing the pgconn. The queue itself is a Mac OS facility,
not my code, so unfortunately the problem isn't two threads sharing a
pgconn. So I'll have to keep looking for memory-munging bugs.
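For readers following along, the pattern described here (pop a connection off a queue, push it back when done, with a stack-allocated wrapper doing both) might look something like the sketch below. It is not the poster's code: `Pool` and `Lease` are hypothetical names, and the queue is built on the C++ standard library rather than the Mac OS queue mentioned above. The template parameter `T` would be `PGconn*` in a libpq application.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Thread-safe pool of handles; each handle is held by at most one thread at a time.
template <typename T>
class Pool {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until a handle is available, then removes it from the pool.
    T pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return items_.size();
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

// Stack-allocated lease: acquires in the constructor, returns in the destructor.
template <typename T>
class Lease {
public:
    explicit Lease(Pool<T>& pool) : pool_(pool), item_(pool.pop()) {}
    ~Lease() { pool_.push(std::move(item_)); }
    Lease(const Lease&) = delete;
    Lease& operator=(const Lease&) = delete;
    T& get() { return item_; }

private:
    Pool<T>& pool_;
    T item_;
};
```

Because a handle is out of the pool for the whole lifetime of its `Lease`, no second thread can pick it up in the meantime, which is the exclusivity Tom asks about. It does not help if a raw handle escapes the lease and is used after the lease has been destroyed.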




Re: Error I don't understand, losing synch with server

From
Scott Ribe
Date:
>> Every once in a while I log this error executing a query:
>> message contents do not agree with length in message type "D"
>> lost synchronization with server: got message type "O", length 1398030676
>
> This means either that libpq got a corrupt message from the server, or
> that libpq itself contains a bug in message parsing.  Given that no one
> else has reported similar problems, the idea that your app is somehow
> clobbering the libpq message buffer (and thus corrupting the message "in
> transit") has to be taken seriously.
>
> You mention pooling so I suppose this is a multi-threaded application
> ... are you being careful not to let any two threads try to use the same
> libpq PGconn at the same time?  libpq itself does not contain any
> locking that would make that safe, you need to provide the locking
> yourself.

Uhhhmmm, I built without --enable-thread-safety??? I have a process I follow
when building, but pg_config is telling me that I didn't use my standard
options. I'm assuming this could cause all sorts of threading kinkiness...
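One way to catch this kind of build slip early is to check the configure options at startup or in a build script. The sketch below assumes the text printed by `pg_config --configure` has already been captured into a string; the function name `built_with_thread_safety` is made up for illustration, and the check is a plain substring search, not anything libpq provides.

```cpp
#include <cassert>
#include <string>

// True if the captured output of `pg_config --configure` mentions the
// --enable-thread-safety switch. The caller is responsible for running
// pg_config and passing its output in.
bool built_with_thread_safety(const std::string& configure_output) {
    return configure_output.find("--enable-thread-safety") != std::string::npos;
}
```

A multi-threaded application could refuse to start, or at least log a warning, when this returns false.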

