Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
From | Fujii Masao |
---|---|
Subject | Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) |
Date | |
Msg-id | CAHGQGwGzUWvkWk7w0W3O79uaHrTXowZ6ou58E2df2K+5JqMRZg@mail.gmail.com |
In response to | Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
On Thu, Feb 16, 2012 at 6:15 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Feb 16, 2012 at 5:02 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 15.02.2012 18:52, Fujii Masao wrote:
>>>
>>> On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>>
>>>> Are you still seeing this failure with the latest patch I posted
>>>> (http://archives.postgresql.org/message-id/4F38F5E5.8050203@enterprisedb.com)?
>>>
>>> Yes. Just to be safe, I again applied the latest patch to HEAD,
>>> compiled that and tried the same test. Then unfortunately I got
>>> the same failure again.
>>
>> Ok.
>>
>>> I ran the configure with '--enable-debug' '--enable-cassert'
>>> 'CPPFLAGS=-DWAL_DEBUG', and make with -j 2 option.
>>>
>>> When I ran the test with wal_debug = on, I got the following assertion
>>> failure.
>>>
>>> LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap -
>>> insert: rel 1663/12277/16384; tid 0/197
>>> STATEMENT: create table t (i int); insert into t
>>> values(generate_series(1,10000)); delete from t
>>> LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap -
>>> insert: rel 1663/12277/16384; tid 0/198
>>> STATEMENT: create table t (i int); insert into t
>>> values(generate_series(1,10000)); delete from t
>>> TRAP: FailedAssertion("!(((bool) (((void*)(&(target->tid)) != ((void
>>> *)0))&& ((&(target->tid))->ip_posid != 0))))", File: "heapam.c",
>>> Line: 5578)
>>> LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
>>> LOG: xlog bg flush request 0/17B4000; write 0/17B0000; flush 0/17B0000
>>> LOG: server process (PID 16806) was terminated by signal 6: Abort trap
>>>
>>> This might be related to the original problem which Jeff and I saw.
>>
>> That's strange. I made a fresh checkout, too, and applied the patch, but
>> still can't reproduce. I used the attached script to test it.
>>
>> It's surprising that the crash happens when the records are inserted, not at
>> recovery. I don't see anything obviously wrong there, so could you please
>> take a look around in gdb and see if you can get a clue what's going on?
>> What's the stack trace?
>
> According to the above log messages, one strange thing is that the location
> of the WAL record (i.e., 0/17B3F90) is not the same as the previous location
> of the following WAL record (i.e., 0/17B3F50). Is this intentional?
>
> BTW, when I ran the test on my Ubuntu, I could not reproduce the problem.
> I could reproduce the problem only in MacOS.

+	nextslot = Insert->nextslot;
+	if (NextSlotNo(nextslot) == lastslot)
+	{
+		/*
+		 * Oops, we've "caught our tail" and the oldest slot is still in use.
+		 * Have to wait for it to become vacant.
+		 */
+		SpinLockRelease(&Insert->insertpos_lck);
+		WaitForXLogInsertionSlotToBecomeFree();
+		goto retry;
+	}
+	myslot = &XLogCtl->XLogInsertSlots[nextslot];
+	nextslot = NextSlotNo(nextslot);

nextslot can reach NumXLogInsertSlots, which would be a bug, I guess.
When I did the quick-fix and ran the test, I could not reproduce the problem
any more. I'm not sure if this is really the cause of the problem, though.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center