RE: [HACKERS] Major bug, possible, with Solaris 7?

From Daryl W. Dunbar
Subject RE: [HACKERS] Major bug, possible, with Solaris 7?
Date
Msg-id 002b01be5c84$1ebc06b0$1445e59b@ddunbar.eni.net
In reply to RE: [HACKERS] Major bug, possible, with Solaris 7?  (The Hermit Hacker <scrappy@hub.org>)
Responses RE: [HACKERS] Major bug, possible, with Solaris 7?  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers
At this point, I am willing to try anything.  I'm in production (live
site), but we have not announced the site.  What that means is that
I have the weekend to debug/fix/decide what to do.  I'll take
whatever version you suggest and load it.

DwD

> -----Original Message-----
> From: The Hermit Hacker [mailto:scrappy@hub.org]
> Sent: Friday, February 19, 1999 10:39 PM
> To: Daryl W. Dunbar
> Cc: pgsql-hackers@postgreSQL.org
> Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
>
>
> On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
>
> > Oh, sorry.  6.4.2 with a backend patch to prevent the parent death
> > in the event of MaxBackendID being reached.
> >
> > I know it is in semop() because I did a truss on the child
> > processes.  From a small sample, it looks like they may all be
> > trying to operate on the same semaphore.  I'm recompiling with
> > the -g flag to gain more insight...
>
> I'm just curious, but is this being used in production yet?  If not, would
> you be willing to try out the current snapshot, which is soon to become
> 6.5-BETA?  If this apparent bug still exists there, I think it's sufficient
> a bug to prevent v6.5 coming out until this is fixed.  Then again,
> something this reproducible will most likely hold up v6.4.3 from being
> released also, so if we are planning a v6.4.3 (I thought we were), we'll
> have to get this fixed in the 6.4 line also.
>
> Actually, with that in mind, I'm putting together a very quick tar ball of
> what v6.4.3 is looking like so far.  This is *not* a release, but I'd like
> to see if this problem exists in the most current STABLE tree or not...I
> know there have been quite a few fixes put into it...
>
> Check in about a half hour or so, under the 'test' directory of
> ftp.postgresql.org .. should be there then...
>
>
> > > -----Original Message-----
> > > From: owner-pgsql-hackers@postgreSQL.org
> > > [mailto:owner-pgsql-hackers@postgreSQL.org] On Behalf Of The Hermit Hacker
> > > Sent: Friday, February 19, 1999 12:46 PM
> > > To: pgsql-hackers@postgreSQL.org
> > > Cc: Daryl W. Dunbar
> > > Subject: [HACKERS] Major bug, possible, with Solaris 7?
> > >
> > >
> > >
> > > Can someone please take a minute to look at this?
> > >
> > > I've gzip'd and moved his errorlog to
> > > ftp.postgresql.org:/pub/debugging...one thing that appears to be
> > > lacking...what version of PostgreSQL are you using?
> > >
> > > Marc G. Fournier
> > > Systems Administrator @ hub.org
> > > primary: scrappy@hub.org           secondary:
> > > scrappy@{freebsd|postgresql}.org
> > >
> > > ---------- Forwarded message ----------
> > > Date: Thu, 18 Feb 1999 18:23:25 -0500
> > > From: Daryl W. Dunbar <daryl@www.com>
> > > To: The Hermit Hacker <scrappy@hub.org>
> > > Subject: RE: Interested?
> > >
> > > Thanks Marc,  We exchanged an e-mail or two last week, along with
> > > Tatsuo Ishii and Tom Lane.  You suggested I truss the process.
> > >
> > > Anyway, periodically, the backends spiral out of control with hung
> > > up children until I hit MaxBackendID (which I compiled in to be
> > > 128).  Initially, I was running out of semaphores on Solaris 7 and
> > > changed /etc/system to add these lines:
> > > set shmsys:shminfo_shmmax=16777216
> > > set shmsys:shminfo_shmmin=1
> > > set shmsys:shminfo_shmmni=128
> > > set shmsys:shminfo_shmseg=51
> > > *
> > > set semsys:seminfo_semmap=128
> > > set semsys:seminfo_semmni=128
> > > set semsys:seminfo_semmns=8192
> > > set semsys:seminfo_semmnu=8192
> > > set semsys:seminfo_semmsl=64
> > > set semsys:seminfo_semopm=32
> > > set semsys:seminfo_semume=32
> > >
> > > I increased shared memory so I could start more backends...
> > >
> > > OK, so now, everything is running fine and boom, the backends start
> > > to hang on semop, eventually reaching MaxBackendID and refusing
> > > connections.
> > > Attached is a log file from a hang up today.  Debug is set to 3.
> > > All times are PST.  I have carved out a bunch of normal operation
> > > from the beginning (about 21,000 lines) and redundant 'too many
> > > backends' (about 1,000 lines, while I was eating lunch :) signified
> > > by {SNIP SNIP}.  I pick the log back up with the birth of pid 2828
> > > and left several 'normal' cycles in until...
> > >
> > > You can see that process 2840 is the first child to hang.  It was
> > > started at 11:39:23 and did not die until sent a 15 by the parent at
> > > 14:12:16.  All of the hung processes fall between 2840 and 3454.
> > >
> > > Sorry the file is so big.  Here are some 'keys' you can use:
> > > Startup is the first line (obviously).
> > > You can find child startup by looking for [2840] (pid in brackets)
> > > You can find child exits by looking for '2840 exited'
> > > You can find where I send the kill signal by looking for 'pmdie 15'
> > >
> > > I think that's a good start. :)
> > >
> > > Don't hesitate to contact me if I can shed any more light.  I'm wide
> > > open to ideas at the moment.  I'm in EST, but tend to work until
> > > 10-11 at night, so e-mail anytime.
> > >
> > > Thanks,
> > >
> > > DwD
> > >
> > > > -----Original Message-----
> > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > Sent: Thursday, February 18, 1999 5:36 PM
> > > > To: Daryl W. Dunbar
> > > > Subject: Re: Interested?
> > > >
> > > >
> > > >
> > > > Hi Daryl...
> > > >
> > > >     I'm not the strongest at internal code, so may not be of any help
> > > > at all.  I just went through my -hackers email, and can't seem to find
> > > > anything from you in there.  Can you tell me what your problem is, as well
> > > > as version of PostgreSQL you are using, and we'll see what we can do?
> > > >
> > > > Marc
> > > >
> > > > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> > > >
> > > > > Marc,
> > > > >
> > > > > I know that you put considerable volunteer time into PostgreSQL.  If
> > > > > I am not too bold in asking, and you are comfortable with it, I am
> > > > > prepared to compensate you for your time if you can assist me in
> > > > > tracking down this rather nasty bug I have been e-mailing Hackers
> > > > > about.  Please let me know if you are interested and if so, at what
> > > > > rate.
> > > > >
> > > > > We are in the process of launching a pretty exciting site and a
> > > > > database is an integral part of it.  I really want to use PostgreSQL,
> > > > > but can not take it into production on Solaris with this problem
> > > > > going on.  I'm in the process of installing a test site on Linux to
> > > > > see if the problem exists there, but I expect it is limited to
> > > > > Solaris.
> > > > >
> > > > > I anxiously await your response.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > DwD
> > > > >
> > > > > --
> > > > > Daryl W. Dunbar
> > > > > VP of Engineering/Chief Technology Officer
> > > > > http://www.com, Where the Web Begins!
> > > > > mailto:daryl@www.com
> > > > >
> > > > >
> > > >
> > > > Marc G. Fournier
> > > > Systems Administrator @ hub.org
> > > > primary: scrappy@hub.org           secondary:
> > > > scrappy@{freebsd|postgresql}.org
> > > >
> > >
> > >
> >
>
> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary:
> scrappy@{freebsd|postgresql}.org
>
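
For anyone wanting to sanity-check the /etc/system semaphore settings quoted
in the forwarded message, here is a rough sketch that parses the "set" lines
and compares the limits against each other.  It assumes the usual System V
meanings (semmni = number of semaphore sets, semmsl = semaphores per set,
semmns = semaphores system-wide) and decimal values only.  The function
names are made up for illustration, and it says nothing about how many
semaphores PostgreSQL itself asks for.

```python
# Settings copied from the forwarded message; '*' starts a comment line.
ETC_SYSTEM = """\
set shmsys:shminfo_shmmax=16777216
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=128
set shmsys:shminfo_shmseg=51
*
set semsys:seminfo_semmap=128
set semsys:seminfo_semmni=128
set semsys:seminfo_semmns=8192
set semsys:seminfo_semmnu=8192
set semsys:seminfo_semmsl=64
set semsys:seminfo_semopm=32
set semsys:seminfo_semume=32
"""


def parse_etc_system(text):
    """Parse 'set module:name=value' lines into {name: int}.

    Hypothetical helper: skips blank lines, '*' comments, and anything
    that is not a 'set' directive. Assumes decimal values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*") or not line.startswith("set "):
            continue
        key, _, value = line[4:].partition("=")
        name = key.split(":", 1)[-1].strip()  # drop the 'semsys:' module prefix
        settings[name] = int(value.strip())
    return settings


def semaphore_ceiling(settings):
    """Usable semaphores: the smaller of semmns and semmni * semmsl."""
    return min(
        settings["seminfo_semmns"],
        settings["seminfo_semmni"] * settings["seminfo_semmsl"],
    )


if __name__ == "__main__":
    s = parse_etc_system(ETC_SYSTEM)
    print(semaphore_ceiling(s))  # prints 8192
```

With the values above, semmni * semmsl (128 * 64) equals semmns (8192), so
the three limits are at least consistent with one another.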

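
The log 'keys' in the forwarded message (pid in brackets for a child's
lines, '<pid> exited' for its exit) suggest a quick way to list the
children that never exited.  The sketch below assumes only those two
patterns; the real log format is not shown in this thread, so the regular
expressions, the function name, and the sample lines are all guesses made
for illustration.

```python
import re

# Assumed patterns, based only on the search keys given in the message.
PID_RE = re.compile(r"\[(\d+)\]")          # e.g. "[2840]" on a backend's lines
EXIT_RE = re.compile(r"\b(\d+) exited\b")  # e.g. "2840 exited"


def find_unexited_pids(lines):
    """Return pids seen in brackets that never have an '<pid> exited' line.

    Pids are returned in order of first appearance.
    """
    seen = []
    seen_set = set()
    exited = set()
    for line in lines:
        for m in PID_RE.finditer(line):
            pid = int(m.group(1))
            if pid not in seen_set:
                seen_set.add(pid)
                seen.append(pid)
        for m in EXIT_RE.finditer(line):
            exited.add(int(m.group(1)))
    return [pid for pid in seen if pid not in exited]


if __name__ == "__main__":
    # Made-up sample lines, not taken from the actual log.
    sample = [
        "[2828] started",
        "2828 exited with status 0",
        "[2840] started",
        "[2841] started",
        "2841 exited with status 0",
    ]
    print(find_unexited_pids(sample))  # prints [2840]
```

To run it over a real file, pass the open file object instead of the sample
list, e.g. find_unexited_pids(open("errorlog")), where "errorlog" is a
placeholder file name.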

