Re: [HACKERS] FreeBSD problem under heavy load

Поиск
Список
Период
Сортировка
От Tatsuo Ishii
Тема Re: [HACKERS] FreeBSD problem under heavy load
Дата
Msg-id 19991210145658J.t-ishii@sra.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] FreeBSD problem under heavy load  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] FreeBSD problem under heavy load  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] FreeBSD problem under heavy load  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I seem to run into a serious problem. With 6.5.x + FreeBSD 3.2, I get
> > a core under heavy load (16 or more concurrent users). FreeBSD 2.2.x
> > seems more stable but soon or later same thing happens. Examing a
> > core, I found it segfaulted in hash_search().
> 
> I've been looking into this without much success.  I cannot reproduce it
> here under HPUX --- I ran pgbench for several hours without seeing any
> problem.  I also made another pass over the dynahash.c code looking for
> portability bugs, but didn't find anything that looked promising.  (The
> code is ugly and fragile, but AFAICT it will work under existing usage
> patterns.)  It's quite possible the problem is elsewhere and dynahash is
> just on the receiving end of a memory clobber ... but if so, we have
> very little to go on in guessing where to look.
> 
> Can anyone else reproduce the problem?  Does anything show up in the
> postmaster log at or just before the crash?
> 
>             regards, tom lane

I think I got it. in storage/lmgr/lock.c:WaitOnLock:
char        old_status[64],            new_status[64];    :    :
strcpy(old_status, PS_STATUS);strcpy(new_status, PS_STATUS);strcat(new_status, " waiting");PS_SET_STATUS(new_status);
:    :PS_SET_STATUS(old_status);
 

The current status string is copied into old_status, then the pointer
to it is set to a gloable variable ps_status by PS_SET_STATUS
macro. Unfortunately old_status is on the stack, so once WaitOnLock
returns, ps_status would point to a garbage. In the subsequent call to
WaitOnLock,
strcpy(old_status, PS_STATUS);

will copy garbage string into old_status. So if the string is longer
than 63, the stack would be broken. Note that this would not happen on
Linux due to the difference of the definition of the macro. See
include/utils/ps_status.h for more details.

Also, I don't understand why:
strcpy(new_status, PS_STATUS);strcat(new_status, " waiting");PS_SET_STATUS(new_status);

is necessary. Just:
PS_SET_STATUS("waiting");

should be enough. After doing some tests on my FreeBSD and Linux box,
I will commit fixes to both current and 6.5 source tree.

> PS: pgbench's configure fails on HPUX, because HP's compiler doesn't
> like whitespace before #include.  I modified configure.in like this:
> 
> AC_TRY_LINK([#include <sys/time.h>
> #include <sys/resource.h>],
>         [struct rlimit rlim;

Thanks. I will incorporate your fix.

BTW, I think pgbench is usefull to detect this kind of problems. Can I
put it into contrib or somewhere?
--
Tatsuo Ishii


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: 6.6 release
Следующее
От: Tom Lane
Дата:
Сообщение: OK, what's this LDREL all about?