Could not read block at end of the relation
From | Ronan Dunklau |
---|---|
Subject | Could not read block at end of the relation |
Date | |
Msg-id | 1878547.tdWV9SEqCh@aivenlaptop |
Responses |
FSM Corruption (was: Could not read block at end of the relation)
|
List | pgsql-bugs |
Hello,

I'm sorry, as this will be a very poor bug report. On PG16, I am experiencing random errors which share the same characteristics:

- they happen during heavy system load
- lots of concurrent writes are happening on a table
- often (but I haven't been able to confirm it is necessary), a vacuum is running on the table at the same time the error is triggered

Then, several backends get the same error at once, "ERROR: could not read block XXXX in file "base/XXXX/XXXX": read only 0 of 8192 bytes", with different block numbers. The relation is always a table (regular or toast). The blocks are past the end of the relation, and the different backends are all trying to read a different block. The offending queries are always an INSERT, UPDATE or COPY.

I've seen that several bugs have been fixed in 16.1 and 16.2 regarding the new relation extension infrastructure, involving partitioned tables in one case and temp tables in the other, so I suspect some other corner cases may still not be covered there.

I suspected the FSM could be corrupted in some way, but taking a look at it just after the errors have been triggered, the offending (non-existent) blocks are not present in the FSM either.

I'm desperately trying to reproduce the issue in a test environment, without any luck so far. I suspected a race condition with VACUUM trying to reclaim the space at the end of the relation, but running a custom build trying to reproduce that (by always trying to truncate the relation during VACUUM, regardless of the amount of possibly freeable space) hasn't led me anywhere.

It seems that for some reason, a backend is extending the relation for the other waiting ones, and the newly allocated blocks don't end up being pinned in shared_buffers. They could then be evicted, and the waiting backend is now trying to read a block which has to be read from disk but has never been marked dirty and never persisted. I don't have anything to back that hypothesis though.
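To make the "past the end of the relation" condition above concrete, here is a minimal sketch of the arithmetic involved. It is only an illustration, not something taken from the report: the function names are hypothetical, the 8192-byte block size is the one shown in the error message, and `total_bytes` is assumed to be the size of the relation's main fork (for example as returned by `pg_relation_size()`).

```python
# Hypothetical helpers; names are illustrative and not part of PostgreSQL.
BLOCK_SIZE = 8192  # block size in bytes, as shown in the error message


def relation_block_count(total_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the number of whole blocks in a relation fork of the given size."""
    return total_bytes // block_size


def is_past_end(block_number: int, total_bytes: int,
                block_size: int = BLOCK_SIZE) -> bool:
    """Return True if the 0-based block_number lies at or beyond the end of the fork."""
    return block_number >= relation_block_count(total_bytes, block_size)


if __name__ == "__main__":
    size = 10 * BLOCK_SIZE           # a fork holding blocks 0..9
    print(is_past_end(9, size))      # last existing block
    print(is_past_end(10, size))     # first block past the end
```

A read of a block for which `is_past_end` is true finds no data on disk, which matches the "read only 0 of 8192 bytes" part of the message.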
Once again, I'm sorry that this report is so vague. I'll update if I manage to reproduce the issue or gather some more information, but in the meantime, has anybody witnessed something similar? And more importantly, do you have any pointers on how to investigate and try to trigger the issue manually?

Best regards,

--
Ronan Dunklau