Re: [HACKERS] Shared memory corruption?
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] Shared memory corruption? |
Date | |
Msg-id | 199802122010.PAA03196@candle.pha.pa.us |
In reply to | Shared memory corruption? (Tom I Helbekkmo <tih@Hamartun.Priv.NO>) |
List | pgsql-hackers |
Vadim, I may need your help on this one.

I can reproduce it by running the regression test, and doing a shell
'while' loop that continuously creates databases:

	while :
	do
		sh -c 'createdb $$'
	done

I get the errors too. I have no idea on a cause. I would hope it is
not the new deadlock code, or locking fixes I did.

I think the message comes from smgrblindwrt. Is it possible our new
speedups are causing it?

>
> [similar report submitted previously, but this is more complete]
>
> There is something that looks like shared memory corruption going on,
> which I first noticed by accident the other day, in the 1998-02-09
> snapshot. It's still there today, with the 1998-02-12 one, and looks
> like the following on my Sun SS2 under NetBSD/sparc 1.3 (I've created
> a simple test case here, for easy testing elsewhere):
>
> First, I run initdb, start a postmaster, create a user 'tih', stop the
> postmaster, restart the postmaster with '-d', thus:
>
> barsoom:postgres> postmaster -i -d
> FindBackend: searching PATH ...
> FindBackend: found "/usr/local/pgsql/bin/postgres" using PATH
>
> Next, I create a database 'words', thus:
>
> barsoom:tih> createdb words
> barsoom:tih>
>
> The postmaster says:
>
> postmaster: BackendStartup: pid 6542 user tih db template1 socket 5
> postmaster: reaping dead processes...
> postmaster: CleanupProc: pid 6542 exited with status 0
>
> I fire up psql, thus:
>
> barsoom:tih> psql words
> words=>
>
> The postmaster goes:
>
> postmaster: BackendStartup: pid 6549 user tih db words socket 5
>
> In psql, I then do the following:
>
> words=> create table dictionary (entry char(64));
> CREATE
> words=> create unique index dict_by_entry on dictionary (entry);
> CREATE
> words=> copy dictionary from '/usr/share/dict/words';
>
> The postmaster generates no output at this, and the copy starts as it
> should. There is much disk activity. Next, while this is running, in
> another terminal window, as the same user 'tih', I do:
>
> barsoom:tih> createdb
> Connection to database 'template1' failed.
> PQexec() -- There is no connection to the backend.
> createdb: database creation failed on tih.
> barsoom:tih>
>
> When this happens, the postmaster generates the following output:
>
> postmaster: BackendStartup: pid 6560 user tih db template1 socket 5
> ERROR: cannot write block 171 of dict_by_entry [words] blind
> postmaster: reaping dead processes...
> postmaster: CleanupProc: pid 6560 exited with status 0
>
> Looking at processes running on the system at this time, I see:
>
> 6549 p6 R+ 2:01.88 /usr/local/pgsql/bin/postgres -p -Q -P5 -v 65536 words
>
> This is the backend doing the copy. It is spinning furiously, eating
> CPU like there was no tomorrow -- but there is no more disk activity.
> The terminal window where I initiated the copy operation looks as
> though it were proceeding normally. So now I attempt to perform the
> database creation again, thus (in the second terminal):
>
> barsoom:tih> createdb
>
> Nothing happens -- it just hangs there. The postmaster says:
>
> postmaster: BackendStartup: pid 6595 user tih db template1 socket 5
>
> Looking with ps again, I can see that this backend is now also running
> wild, sharing the CPU half and half with the one with PID 6549...
>
> Note that I'm trying to create a different database when it breaks;
> the only possible interaction is through the shared memory that I
> understand is maintained by the postmaster on behalf of the backends.
> As for seeing this on other platforms, I certainly hope it's
> repeatable elsewhere, but it's not unreasonable to assume that it
> could cause different symptoms on other platforms, including quiet
> data corruption...
>
> The whole thing is completely repeatable here -- any ideas can be
> verified quickly and easily -- and with enthusiasm. :-)
>
> -tih
> --
> Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
>
>

--
Bruce Momjian
maillist@candle.pha.pa.us
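
For anyone who wants to try the createdb loop without leaving an endless loop running, a bounded variant might look like the sketch below. It is not the test as it was actually run: the function name `stress_createdb` and the variables `CREATE_CMD` and `MAX_TRIES` are made up for illustration, and it assumes `createdb` is on the PATH and a postmaster is already running. It stops at the first failed attempt and reports which one it was.

```sh
#!/bin/sh
# Bounded variant of the createdb stress loop (a sketch under assumptions,
# not the original test). CREATE_CMD and MAX_TRIES are hypothetical knobs.
CREATE_CMD=${CREATE_CMD:-createdb}
MAX_TRIES=${MAX_TRIES:-100}

stress_createdb() {
    i=1
    while [ "$i" -le "$MAX_TRIES" ]
    do
        # Unique database name per attempt: this shell's PID plus a counter.
        name="stress_$$_$i"
        # CREATE_CMD is left unquoted on purpose so it may carry extra arguments.
        if ! $CREATE_CMD "$name"
        then
            echo "createdb failed on attempt $i ($name)" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $MAX_TRIES attempts succeeded"
}
```

Calling `stress_createdb` at the end of the script (or after sourcing it) starts the run. Each successful attempt leaves a database behind, so the databases named `stress_...` need to be dropped afterwards.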