Discussion: PostgreSQL 7.2b1 crashes
I've had two overnight crashes with postgresql-7.2b1. Neither logged any useful information in the logfile created by pg_ctl, in syslog, or in messages. The server has one user database with 1148 records; it is, however, queried for each incoming email. It failed two regression tests: timetz, which was discussed, and geometry, which it always fails. Postgresql-7.1.x was running rather smoothly on this box.

The box is an i586 running Linux 2.4 with gcc-3.0.2, binutils-2.11.2, and glibc-2.2.x. If you need any more information or have any suggestions that might help debug this, feel free to mail me.
"Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> I've had two overnight crashes with postgresql-7.2b1. Neither logged any
> useful info in the logfile created by pg_ctl, syslog or messages.

Please define "crash". If it was a coredump, how about a stack backtrace? Can you determine what query it was executing?

While I'd like to help you, you have not provided one single bit of information that could possibly be used to identify the problem ...

> glibc-2.2.x.

... except perhaps that. If you compiled with --enable-locale, an update to glibc 2.2.3 is strongly advised. There's a nasty bug in strcoll() in 2.2.x.

regards, tom lane
On Thu, 1 Nov 2001, Tom Lane wrote:
> "Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> > I've had two overnight crashes with postgresql-7.2b1. Neither logged any
> > useful info in the logfile created by pg_ctl, syslog or messages.
>
> Please define "crash". If it was a coredump, how about a stack
> backtrace? Can you determine what query it was executing?

I don't have a core file. It died overnight both times, so I don't know exactly what it was running, but I can give you the general query it performs. By "crash" I mean the postmaster process is gone, along with its sub-processes or threads. It runs several hundred of these queries per day:

    select error from accessdb where lower(email)=lower('%s') limit 1;

%s is usually replaced with an email address, domain name, or IP address.

> While I'd like to help you, you have not provided one single bit of
> information that could possibly be used to identify the problem ...

Sorry, but I've been unable to gather all that much about the problem. I started postmaster with -B 512 -N 64 -i. I'm going to try raising the debugging level to see if it gives any more insight into why it crashed.

> > glibc-2.2.x.
>
> ... except perhaps that. If you compiled with --enable-locale, an
> update to glibc 2.2.3 is strongly advised. There's a nasty bug in
> strcoll() in 2.2.x.

I think I'm running 2.2.3, but I'm not 100% sure. From config.status:

    ./configure --enable-multibyte --with-maxbackends=128 --with-openssl --enable-odbc --with-CXX --with-gnu-ld --enable-syslog
"Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> I don't have a core file, it died overnight both times so i don't know
> exactly but I can give you the general query it performs. By crash i mean
> the postmaster process is gone along with it's sub-processes or threads.

Postmaster dies too? Wow. If you aren't seeing a core file, perhaps it's because you are starting the postmaster under "ulimit -c 0". You need the process context to be "ulimit -c unlimited" to allow cores to be dropped. Might be worth running with -d 2 to enable query logging as well.

>> ... except perhaps that. If you compiled with --enable-locale, an
>> update to glibc 2.2.3 is strongly advised. There's a nasty bug in
>> strcoll() in 2.2.x.
>>
> I think i'm running 2.2.3, but i'm not 100% sure.
> from config.status:
> ./configure --enable-multibyte --with-maxbackends=128 --with-openssl
> --enable-odbc --with-CXX --with-gnu-ld --enable-syslog

Since you didn't use --enable-locale, it's irrelevant; AFAIK we don't call strcoll() unless that option's been selected. The known forms of the strcoll problem wouldn't cause a postmaster crash anyway, only backend crashes. So you've got something new. Please keep us posted.

regards, tom lane
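The advice above can be sketched as a small shell check, assuming a bash-like shell whose `ulimit` builtin accepts `-c`. The helper name `core_dumps_enabled` is hypothetical. The postmaster flags are the ones quoted in the thread plus the suggested `-d 2`, and `$PGDATA` is assumed to be set in the environment. The launch commands are left as comments so the sketch has no side effects.

```shell
# Hypothetical helper: succeeds when the current shell's soft core-size
# limit is non-zero, i.e. processes started from it may drop core.
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}

# Intended use, in the same shell that launches the postmaster:
#   ulimit -c unlimited      # may fail if the hard limit is lower
#   core_dumps_enabled || echo "warning: core dumps are disabled" >&2
#   postmaster -B 512 -N 64 -i -d 2 >logfile 2>&1 &
```

The limit is inherited by child processes, so it has to be raised in the shell (or init script) that actually starts the postmaster, not in a separate login session.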
On Thu, 1 Nov 2001, Tom Lane wrote:
> "Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> > I don't have a core file, it died overnight both times so i don't know
> > exactly but I can give you the general query it performs. By crash i mean
> > the postmaster process is gone along with it's sub-processes or threads.
>
> Postmaster dies too? Wow. If you aren't seeing a core file, perhaps
> it's because you are starting the postmaster under "ulimit -c 0".
> You need the process context to be "ulimit -c unlimited" to allow cores
> to be dropped. Might be worth running with -d 2 to enable query logging
> as well.

I'm not sure about the ulimit. I restarted postmaster with -d 1 a few minutes ago; I'll check the ulimit and restart it again, and hopefully it dies with some useful info this time.

> Since you didn't use --enable-locale, it's irrelevant; AFAIK we don't
> call strcoll() unless that option's been selected. The known forms of
> the strcoll problem wouldn't cause a postmaster crash anyway, only
> backend crashes. So you've got something new. Please keep us posted.

I probably won't have any new info till tomorrow morning EST. It died once around 4am and the other time at 5:25am, so it's kinda hard to tell what made it go belly up at this point.
"Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> I probably won't have any new info till tommorow morning EST, it died once
> around 4am, the other at 5:25am so it's kinda hard to tell what made it go
> belly up at this point.

Okay. One thing to keep in mind is that the postmaster will drop core in whatever directory you are in when you start it, whereas individual backends drop core in the $PGDATA/base/dbnumber/ subdirectory of the database they are attached to.

regards, tom lane
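The two locations described above could be searched with something like the following sketch. The helper name `find_core_files` and its arguments are hypothetical, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU find does; they are not POSIX). The `core*` pattern is also an assumption, since some kernels append a PID to the core file name.

```shell
# Hypothetical helper: list core files in the directory the postmaster
# was started from and in the per-database directories under base/.
#   $1 = directory the postmaster was started from
#   $2 = data directory (defaults to $PGDATA)
find_core_files() {
    start_dir=$1
    data_dir=${2:-$PGDATA}
    # Postmaster core: look only in the start directory itself.
    find "$start_dir" -maxdepth 1 -type f -name 'core*' 2>/dev/null
    # Backend cores: exactly one level below base/, i.e. base/<dbnumber>/.
    find "$data_dir/base" -mindepth 2 -maxdepth 2 -type f -name 'core*' 2>/dev/null
}
```

For example, `find_core_files ~postgres` would check the postgres user's home directory plus every database directory under `$PGDATA/base`.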
On Thu, 1 Nov 2001, Mr. Shannon Aldinger wrote:
> > > glibc-2.2.x.
> >
> > ... except perhaps that. If you compiled with --enable-locale, an
> > update to glibc 2.2.3 is strongly advised. There's a nasty bug in
> > strcoll() in 2.2.x.
> I think i'm running 2.2.3, but i'm not 100% sure.

Try:

    $ echo /lib/libc-2.2*.so

Matthew.
Tom Lane wrote:
>
> "Mr. Shannon Aldinger" <god@yinyang.hjsoft.com> writes:
> > I don't have a core file, it died overnight both times so i don't know
> > exactly but I can give you the general query it performs. By crash i mean
> > the postmaster process is gone along with it's sub-processes or threads.
>
> Postmaster dies too? Wow. If you aren't seeing a core file, perhaps
> it's because you are starting the postmaster under "ulimit -c 0".
> You need the process context to be "ulimit -c unlimited" to allow cores
> to be dropped. Might be worth running with -d 2 to enable query logging
> as well.

I have seen the same thing, and I have been trying to reproduce it. I know for a fact that it was in the middle of the following (in a C application using libpq):

    declare temp_curs binary cursor for select scene_name_full, track from favorites where rating > 6 and track < 1000000000 order by scene_name_full

Performing a loop on:

    fetch 1000 from temp_curs

I have lots of memory and lots of disk, so I'm pretty sure it isn't a resource issue. (It could be a shared memory issue?) I have not been able to reproduce it. Hope this helps.

    cdinfo=# explain select scene_name_full, track from favorites where rating > 6 and track < 1000000000 order by scene_name_full;
    NOTICE:  QUERY PLAN:

    Sort  (cost=517675.50..517675.50 rows=2091003 width=18)
      ->  Seq Scan on favorites  (cost=0.00..135699.52 rows=2091003 width=18)

    EXPLAIN
On Thu, Nov 01, 2001 at 01:51:21PM -0500, Tom Lane wrote:
> > I think i'm running 2.2.3, but i'm not 100% sure.
> > from config.status:
> > ./configure --enable-multibyte --with-maxbackends=128 --with-openssl
> > --enable-odbc --with-CXX --with-gnu-ld --enable-syslog
>
> Since you didn't use --enable-locale, it's irrelevant; AFAIK we don't
> call strcoll() unless that option's been selected. The known forms of
> the strcoll problem wouldn't cause a postmaster crash anyway, only
> backend crashes. So you've got something new. Please keep us posted.

Maybe try compiling it with --enable-cassert.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On Thu, 1 Nov 2001, Tom Lane wrote:
> Postmaster dies too? Wow. If you aren't seeing a core file, perhaps
> it's because you are starting the postmaster under "ulimit -c 0".
> You need the process context to be "ulimit -c unlimited" to allow cores
> to be dropped. Might be worth running with -d 2 to enable query logging
> as well.

There were no core files dropped under the data directory. However, I did get a core file in ~postgres; presumably it's from the postmaster. I also put up the logfile, which doesn't really contain anything too interesting at the end. The core and logfile can be found at:

    http://yinyang.hjsoft.com/core.gz
    http://yinyang.hjsoft.com/logfile.gz
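With a core file in hand, the stack backtrace requested earlier in the thread could be pulled out roughly as follows. This is a sketch, not something posted by the participants: the helper name `core_backtrace` is hypothetical, and it assumes a gdb recent enough to support `-batch` together with `-ex` (older releases need an interactive session and the `bt` command).

```shell
# Hypothetical helper: print a stack backtrace from a core file.
#   $1 = path to the executable that crashed (e.g. the postmaster binary)
#   $2 = path to the core file
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -r "$core" ]; then
        echo "core_backtrace: cannot read core file: $core" >&2
        return 1
    fi
    # -batch exits once the commands have run; -ex bt prints the backtrace.
    gdb -batch -ex bt "$exe" "$core"
}
```

A call might look like `core_backtrace /usr/local/pgsql/bin/postmaster ~postgres/core`, where the binary path is an assumed default install location. The backtrace is only readable if the executable passed in is the same build that produced the core.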
Karel Zak <zakkr@zf.jcu.cz> writes:
> May be try compile it --enable-cassert.

Excellent recommendation.

(Actually, I'd recommend --enable-cassert for anyone working with beta code, whether you're currently chasing a problem or not. I'm not sure it's appropriate for production servers, because it turns what might be relatively harmless errors into database restarts; but for development and testing it's essential.)

regards, tom lane
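For the build discussed in this thread, the recommendation would amount to rerunning configure with the original options plus `--enable-cassert`. The sketch below only prints that command for review. The option list is copied from the config.status output quoted earlier in the thread, while the variable and function names are hypothetical.

```shell
# Options copied from the config.status output quoted in the thread.
CONFIGURE_OPTS="--enable-multibyte --with-maxbackends=128 --with-openssl --enable-odbc --with-CXX --with-gnu-ld --enable-syslog"

# Hypothetical helper: print the configure command with assertion
# checking added, so it can be reviewed before being run in the source tree.
cassert_configure_cmd() {
    echo "./configure $CONFIGURE_OPTS --enable-cassert"
}
```

Running the printed command in the source directory, followed by a rebuild and reinstall, would yield a server in which failed internal assertions abort loudly instead of passing silently.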