Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

From Justin Pryzby
Subject Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes
Date
Msg-id 20180830215711.GW23024@telsasoft.com
In reply to Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes (Tom Lane <tgl@sss.pgh.pa.us>)
Replies Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes
List pgsql-hackers
On Thu, Aug 30, 2018 at 05:30:30PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Wed, Aug 29, 2018 at 11:35:51AM -0400, Tom Lane wrote:
> >> As far as we can tell, that bug is a dozen years old, so it's not clear
> >> why you find that you can reproduce it only in 10.5.  But there might be
> >> some subtle timing change accounting for that.
> 
> > It seems to me there's one root problem occurring in (at least) two slightly
> > different ways.  The issue/symptom that I've been seeing occurs in 10.5 but not
> > 10.4, and specifically at commit 2ce64ca, but not before. 
> 
> Yeah, as you probably saw in the other thread, we later realized that
> 2ce64ca created an additional pathway for ScanPgRelation to recurse;
> a pathway that's evidently easier to hit than the pre-existing ones.
> I note that both of your stack traces display ScanPgRelation recursion,
> so I'm feeling pretty confident that what you're seeing is the same
> thing.
> 
> But, as Andres says, it'd be great if you could confirm whether the
> draft patches fix it for you.

I tested with relcache-rebuild.diff, which hasn't failed in 15 minutes, so I'm
confident that it doesn't hit the additional recursive pathway, but I have to
wait a while to see whether autovacuum survives, too.

I tried to apply fix-missed-inval-msg-accepts-1.patch on top of PG10.5, but
the patch didn't apply, so I can test HEAD after the first patch soaks for a while.

Just curious, is there really any difficulty in reproducing this?  Once I
realized this was a continuing issue and started to suspect pg10.5, it has taken
almost no effort to reproduce on every server I've tried.  I just tested 5 servers,
and only one took more than a handful of seconds to fail.  I gave up waiting
for a 6th server, because I found it was waiting on a pre-existing lock.

[pryzbyj@database ~]$ while :; do for a in pg_class_oid_index pg_class_relname_nsp_index pg_class_tblspc_relfilenode_index; do psql ts -qc "REINDEX INDEX $a"; done; done &
 
[pryzbyj@database ~]$ a=0; time while psql ts -qc ''; do a=$((1+a)); done ; echo "$a"
psql: FATAL:  could not read block 0 in file "base/16400/313581263": read only 0 of 8192 bytes

real    0m1.772s
user    0m0.076s
sys     0m0.116s
47
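A minimal bash sketch that bundles the two loops above into one sourceable file, assuming psql is on PATH, the database is named ts as in the commands above (overridable), and the role may REINDEX the pg_class indexes. The function names count_until_failure, reindex_pg_class_forever and reproduce are hypothetical helpers, not part of PostgreSQL. Unlike the one-liners, it stops the background REINDEX loop once the connection loop ends or is interrupted.

```shell
#!/bin/bash
# Hypothetical helpers bundling the reproduction from this message.
# Assumptions: bash (for the `time` keyword), psql on PATH, and a
# database (default "ts") whose pg_class indexes we are allowed to REINDEX.

# Run "$@" until it exits non-zero; print how many runs succeeded first.
count_until_failure() {
    local n=0
    while "$@"; do
        n=$((n + 1))
    done
    echo "$n"
}

# REINDEX the three pg_class indexes in an endless loop.
reindex_pg_class_forever() {
    local db=$1 idx
    while :; do
        for idx in pg_class_oid_index pg_class_relname_nsp_index \
                   pg_class_tblspc_relfilenode_index; do
            psql "$db" -qc "REINDEX INDEX $idx"
        done
    done
}

# Start the REINDEX loop in the background, time how many empty sessions
# connect before one fails, then stop the background loop.  The body runs
# in a subshell, so the traps do not leak into the caller's shell.
reproduce() (
    db=${1:-ts}
    reindex_pg_class_forever "$db" &
    bg=$!
    # Always stop the background loop when this subshell exits...
    trap 'kill "$bg" 2>/dev/null' EXIT
    # ...including on Ctrl-C or SIGTERM (the async loop ignores SIGINT).
    trap 'exit 130' INT TERM
    time count_until_failure psql "$db" -qc ''
)
```

Source the file, then run `reproduce ts`. On an affected 10.5 build it should print the FATAL error followed by the number of connections that succeeded first (47 in the run above); on a build with the fix, the connection loop should keep going until interrupted with Ctrl-C, which also stops the REINDEX loop.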

Justin

