Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

From: Alvaro Herrera
Subject: Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"
Msg-id: 20140716044659.GD11811@eldon.alvh.no-ip.org
In reply to: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"  (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

I'm not saying there is no multixact bug here, but I wonder if this part
of your crasher patch might be the cause:

--- 754,771 ----
                   errmsg("could not seek to block %u in file \"%s\": %m",
                          blocknum, FilePathName(v->mdfd_vfd))));
  
!     if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress()) {
!         nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
!         ereport(FATAL,
!                 (errcode(ERRCODE_DISK_FULL),
!                  errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
!                         blocknum,
!                         relpath(reln->smgr_rnode, forknum),
!                         nbytes, BLCKSZ),
!                  errhint("JJ is screwing with the database.")));
!     } else {
!         nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
!     }

Wouldn't this BLCKSZ/3 business update the page's LSN but not the full
contents, meaning that the block wouldn't be rewritten the next time the
xlog is replayed?  That could leave the upper two thirds of the block
containing multixacts in xmax that had been removed by a vacuuming round
before the crash.
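
To make the scenario in that question concrete, here is a minimal sketch of
a torn write in plain C.  It is a toy model and not PostgreSQL code: ToyPage
and the toy_* helpers are hypothetical names, the only layout fact borrowed
from PostgreSQL is that the LSN sits at the very start of the page, and the
"skip if the page LSN is not older than the record" rule is an assumption
standing in for the replay behaviour being asked about.  Full-page images
are not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/* Toy page: the first 8 bytes hold the LSN, the rest is payload. */
typedef struct ToyPage
{
    unsigned char bytes[BLCKSZ];
} ToyPage;

/* Fill the page with one byte value, then stamp the LSN at the start. */
void
toy_page_init(ToyPage *page, uint64_t lsn, unsigned char fill)
{
    memset(page->bytes, fill, BLCKSZ);
    memcpy(page->bytes, &lsn, sizeof(lsn));
}

/* Read the LSN back from the start of the page. */
uint64_t
toy_page_lsn(const ToyPage *page)
{
    uint64_t    lsn;

    memcpy(&lsn, page->bytes, sizeof(lsn));
    return lsn;
}

/* Overwrite only the first nbytes on "disk", like FileWrite(..., BLCKSZ/3). */
void
toy_torn_write(ToyPage *disk, const ToyPage *buffer, size_t nbytes)
{
    assert(nbytes <= BLCKSZ);
    memcpy(disk->bytes, buffer->bytes, nbytes);
}

/* Assumed replay rule: skip a record if the page LSN is already that new. */
int
toy_redo_would_skip(const ToyPage *disk, uint64_t record_lsn)
{
    return toy_page_lsn(disk) >= record_lsn;
}
```

With an on-disk page initialised at LSN 100 and filled with 'A', and a
buffer at LSN 200 filled with 'B', calling toy_torn_write() with BLCKSZ/3
leaves the disk copy reporting LSN 200 while every byte from offset 2730
onward is still 'A'.  Under the assumed rule, toy_redo_would_skip() then
returns true for a record at LSN 200, which is exactly the situation the
question describes.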

(Maybe I'm just too tired and I'm failing to fully understand the torn
page protection.  I thought I understood how it worked, but now I'm not
sure -- I mean I don't see how it can possibly have any value at all.
Surely if the disk writes the first 512-byte sector of the page and then
forgets the updates to the next 15 sectors, the page will appear as not
needing the full page image to be restored ...)
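
The sector arithmetic in that parenthetical can be checked with a short
sketch.  It assumes the default 8192-byte BLCKSZ and the 512-byte sector
size used in the text; the helper names are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ      8192        /* default PostgreSQL block size */
#define SECTOR_SIZE 512         /* assumed sector size, as in the text */

/* Number of sectors that make up one page. */
size_t
sectors_per_page(void)
{
    return BLCKSZ / SECTOR_SIZE;
}

/* Sectors completely overwritten when only the first nbytes reach disk. */
size_t
sectors_fully_written(size_t nbytes)
{
    assert(nbytes <= BLCKSZ);
    return nbytes / SECTOR_SIZE;
}

/* Sectors that still contain at least some old data. */
size_t
sectors_with_stale_data(size_t nbytes)
{
    return sectors_per_page() - sectors_fully_written(nbytes);
}
```

A page is 16 sectors, so a write that stops after the first sector leaves
15 stale ones, matching the figure above.  The BLCKSZ/3 write in the
crasher patch covers 2730 bytes, i.e. five whole sectors plus part of a
sixth, leaving 11 sectors with old data in them.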

Is the page containing the borked multixact value the one that was
half-written by this code?

Is the problem reproducible if you cause this path to ereport(FATAL)
without writing 1/3rd of the page?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


