Thread: Strange Postgresql behavior solved

Strange Postgresql behavior solved

From
Owen Hartnett
Date:
I spent a day on this, and it's really not a PostgreSQL issue, but I
thought I'd post it in case someone else comes down with it.

Scenario:

I moved the physical location and networking environment of the
server.  It's on Mac OS X - XServe, but that isn't germane to the
story.  Originally, the server was the DHCP router for the network;
now it sits in a demilitarized zone off a DLink router that's
providing DHCP and NAT.

Symptoms:

Postgres was unable to resolve *some* simple queries, like "Select *
from salestable where thekey = 118", although it would work for
thekey values of 1 all the way to 117.  The connection would just
freeze and time out after a couple of minutes.

My application behaved this way, and so did pgAdmin, but Navicat LE didn't!

Solution:

I finally realized that my application and pgAdmin were both
accessing the server using the domain name, and Navicat was using the
IP number.  Indeed, replacing the connection data with the IP number
on the app and pgAdmin made the world safe again.
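
For anyone chasing a similar name-versus-IP difference, one quick check is whether the domain name resolves, from the client machine, to the same address that the working client uses. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, 5432 is assumed as the server port, and only the standard library resolver is used. It does not explain the router behavior; it only shows whether the two connection settings point at the same endpoint.

```python
import socket


def resolved_addresses(host, port=5432):
    """Return the set of IP addresses that `host` resolves to on this client."""
    # Restrict to TCP so each address is reported once per family.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def same_endpoint(hostname, ip, port=5432):
    """True if `ip` is among the addresses that `hostname` resolves to."""
    return ip in resolved_addresses(hostname, port)
```

If `same_endpoint` returns False for the name used by the failing clients and the IP used by the working one, the two are probably taking different paths through the router (for example, a public address versus a LAN address).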

Probably some funky stuff with the router (not one of their expensive
ones) caused all the consternation, but I originally suspected a
corrupt database (because I could get 117 records to come out fine,
but not the 118th).  Also, I had narrowed it down to failing only
when accessing the last three fields of that 118th record; the first
40 fields were fine.
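
The narrowing-down described above can be done by bisecting the column list rather than trying fields one at a time. This is only a sketch in Python: `first_failing_prefix` and the `query_works` callback are hypothetical names, the callback is assumed to run a query selecting just the given columns for the problem row and report whether it returned before a timeout, and the search assumes that once a prefix of the column list fails, every longer prefix fails too.

```python
def first_failing_prefix(columns, query_works):
    """Return the smallest n for which query_works(columns[:n]) is False.

    Returns None if selecting all columns succeeds. Assumes failures are
    monotonic: if a prefix fails, every longer prefix fails as well.
    """
    if query_works(columns):
        return None
    lo, hi = 1, len(columns)  # invariant: columns[:hi] is known to fail
    while lo < hi:
        mid = (lo + hi) // 2
        if query_works(columns[:mid]):
            lo = mid + 1  # the first failure lies beyond mid
        else:
            hi = mid      # columns[:mid] already fails
    return lo
```

With 43 columns this takes about half a dozen probe queries, which matters when each failing probe costs a couple of minutes of timeout.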

-Owen

Re: Strange Postgresql behavior solved

From
Tom Lane
Date:
Owen Hartnett <owen@clipboardinc.com> writes:
> I spent a day on this, and it's really not a PostgreSQL issue, but I
> thought I'd post it in case someone else comes down with it.

> Scenario:

> I moved the physical location and networking environment of the
> server.  It's on Mac OS X - XServe, but that isn't germane to the
> story.  Originally, the server was the DHCP router for the network,
> now it sits in a demilitarized zone off a DLink router that's
> providing DHCP and NAT.

> Symptoms:

> Postgres was unable to resolve *some* simple queries, like "Select *
> from salestable where thekey = 118", although it would work for
> thekey values of 1 all the way to 117.  The connection would just
> freeze, and timeout after a couple of minutes.

> My application worked this way, and so did pgAdmin, but Navicat LE didn't!

> Solution:

> I finally realized that my application and pgAdmin were both
> accessing the server using the domain name, and Navicat was using the
> IP number.  Indeed, replacing the connection data with the IP number
> on the app and pgAdmin made the world safe again.

What this sounds like to me is that you've got two postmasters running
on different ports, or something close to that.  The specific behavior
you describe is absolutely not sensible.

            regards, tom lane

Re: Strange Postgresql behavior solved

From
"Leif B. Kristensen"
Date:
On Saturday 26. July 2008, Owen Hartnett wrote:
>Probably some funky stuff with the router (not one of their expensive
>ones) that caused all the consternation, but I originally thought
>corrupt database (because I could get 117 records to come out fine,
>but not the 118th).  Also, I had narrowed it down to failing only
>when accessing the last three fields of that 118th record, the first
>40 fields were fine.

That sounds a lot like the "game mode" router bug:

http://www.azureuswiki.com/index.php/Torrents_stop_at_99_percent
--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/
My Jazz Jukebox: http://www.last.fm/user/leifbk/

Re: Strange Postgresql behavior solved

From
owen hartnett
Date:
On Jul 26, 2008, at 2:32 AM, "Leif B. Kristensen"
<leif@solumslekt.org> wrote:

> On Saturday 26. July 2008, Owen Hartnett wrote:
>> Probably some funky stuff with the router (not one of their expensive
>> ones) that caused all the consternation, but I originally thought
>> corrupt database (because I could get 117 records to come out fine,
>> but not the 118th).  Also, I had narrowed it down to failing only
>> when accessing the last three fields of that 118th record, the first
>> 40 fields were fine.
>
> That sounds a lot like the "game mode" router bug:
>
> http://www.azureuswiki.com/index.php/Torrents_stop_at_99_percent

Yes, that looks like exactly the behavior I saw.  The read failed at the
exact same record every time, even at the same column, and the server is
sitting in a DMZ.

-Owen