Re: Online checksums verification in the backend
From | Julien Rouhaud |
---|---|
Subject | Re: Online checksums verification in the backend |
Date | |
Msg-id | 20200318101055.GA36918@nol |
In response to | Re: Online checksums verification in the backend (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Online checksums verification in the backend
|
List | pgsql-hackers |
On Wed, Mar 18, 2020 at 07:06:19AM +0100, Julien Rouhaud wrote:
> On Wed, Mar 18, 2020 at 01:20:47PM +0900, Michael Paquier wrote:
> > On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> > > On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
> > >> With a large amount of
> > >> shared buffer eviction you actually increase the risk of torn page
> > >> reads. Instead of a logic relying on partition mapping locks, which
> > >> could be unwise on performance grounds, did you consider different
> > >> approaches? For example a kind of pre-emptive lock on the page in
> > >> storage to prevent any shared buffer operation to happen while the
> > >> block is read from storage, that would act like a barrier.
> > >
> > > Even with a workload having a large shared_buffers eviction pattern, I don't
> > > think that there's a high probability of hitting a torn page. Unless I'm
> > > mistaken it can only happen if all those steps happen concurrently to doing the
> > > block read just after releasing the LWLock:
> > >
> > > - postgres read the same block in shared_buffers (including all the locking)
> > > - dirties it
> > > - writes part of the page
> > >
> > > It's certainly possible, but it seems so unlikely that the optimistic lock-less
> > > approach seems like a very good tradeoff.
> >
> > Having false reports in this area could be very confusing for the
> > user. That's for example possible now with checksum verification and
> > base backups.
>
>
> I agree, however this shouldn't be the case here, as the block will be
> rechecked while holding proper lock the 2nd time in case of possible false
> positive before being reported as corrupted. So the only downside is to check
> twice a corrupted block that's not found in shared buffers (or concurrently
> loaded/modified/half flushed). As the number of corrupted or concurrently
> loaded/modified/half flushed blocks should usually be close to zero, it seems
> worthwhile to have a lockless check first for performance reason.

I just noticed some dumb mistakes while adding the new GUCs. v5 attached to fix
that, no other changes.
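
For readers following the thread, the two-pass check discussed above can be sketched roughly as follows. This is only an illustration in Python, not code from the patch (which is in C): `zlib.crc32` stands in for PostgreSQL's page checksum, a plain `threading.Lock` stands in for the lock taken for the recheck, and `make_page`, `page_is_valid`, `check_block` and `flaky_reader` are hypothetical names invented for this sketch.

```python
import threading
import zlib

PAGE_SIZE = 8192


def make_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 header followed by a zero-padded body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum over the body and compare it with the header."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])


def check_block(read_block, blkno, io_lock):
    """Return "ok", "ok after recheck" or "corrupted" for one block."""
    # First pass: optimistic read with no lock held; it may see a torn page.
    if page_is_valid(read_block(blkno)):
        return "ok"
    # Second pass: hold the lock that writers are assumed to take as well,
    # so the page cannot be half written while it is read again.
    with io_lock:
        if page_is_valid(read_block(blkno)):
            return "ok after recheck"
    return "corrupted"


def flaky_reader(pages):
    """Return a reader that yields the given pages in order, repeating the last."""
    it = iter(pages)
    last = [None]

    def read_block(blkno):
        last[0] = next(it, last[0])
        return last[0]

    return read_block


old = make_page(b"A" * (PAGE_SIZE - 4))
new = make_page(b"B" * (PAGE_SIZE - 4))
# Torn page: first half already rewritten, second half still the old contents.
torn = new[: PAGE_SIZE // 2] + old[PAGE_SIZE // 2 :]
# Genuinely corrupted page: a single flipped bit in the body.
bad = bytearray(new)
bad[100] ^= 0x01
bad = bytes(bad)

lock = threading.Lock()
print(check_block(flaky_reader([new]), 0, lock))
print(check_block(flaky_reader([torn, new]), 0, lock))
print(check_block(flaky_reader([bad]), 0, lock))
```

In this sketch a torn read costs only one extra read under the lock and is never reported, while a block that still fails the recheck is reported as corrupted, which is the tradeoff argued for in the quoted text.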