Re: Weird planner issue on a standby

Поиск
Список
Период
Сортировка
От Guillaume Lelarge
Тема Re: Weird planner issue on a standby
Дата
Msg-id CAECtzeWr=N7pO0wO+nio-zxebkTdnz6v9Apftz0Z2JboOv=xYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Weird planner issue on a standby  (Ron <ronljohnsonjr@gmail.com>)
Список pgsql-general
Le mer. 12 oct. 2022 à 06:08, Ron <ronljohnsonjr@gmail.com> a écrit :
On 10/11/22 22:35, Julien Rouhaud wrote:
> On Tue, Oct 11, 2022 at 07:42:55PM +0200, Guillaume Lelarge wrote:
>> Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera <alvherre@alvh.no-ip.org> a
>> écrit :
>>
>>> On 2022-Oct-11, Tom Lane wrote:
>>>
>>>> Are there any tables in this query where extremal values of the join
>>>> key are likely to be in recently-added or recently-dead rows?  Does
>>>> VACUUM'ing on the primary help?
>>> I remember having an hypothesis, upon getting a report of this exact
>>> problem on a customer system once, that it could be due to killtuple not
>>> propagating to standbys except by FPIs.  I do not remember if we proved
>>> that true or not.  I do not remember observing that tables were being
>>> read, however.
>>>
>>>
>> Thanks for your answers.
>>
>> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have no
>> idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
>> yesterday) is much probably recently-added. I can ask my customer if you
>> want but this looks like a pretty safe bet.
>>
>> On the VACUUM question, I didn't say, but we're kind of wondering if it was
>> lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on the
>> database (and especially on the 1.6TB table which is part of the query).
>> I'm kind of skeptical because if the VACUUM wasn't enough on the standby,
>> it should be the same on the primary.
>>
>> Actually, there are two things that really bug me:
>> * why the difference between primary and both standbys?
>> * why now? (it worked great before this weekend, and the only thing I know
>> happened before is a batch delete on sunday... which may be a good-enough
>> reason for things to get screwed, but once again, why only both standbys?)
>>
>> Julien Rouhaud also told me about killtuples, but I have no idea what they
>> are. I suppose this is different from dead tuples. Anyway, if you can
>> enlighten me, I'll be happy :)
> That's an optimisation where an index scan can mark an index entry as dead
> (LP_DEAD) if if tries to fetch some data from the heap that turns out to be all
> dead, so further scans won't have to check again (you can grep kill_prior_tuple
> in the source for more details).  As that's a hint bit, it may not be
> replicated unless you enable wal_log_hints or data_checksums (or write it as a
> FPI indeed), which could explain discrepancy between primary (after a first
> slow index scan) and standby nodes.
>
> But since your customer recreated their standbys from scratch *after* that
> delete, all the nodes should have those hint bits set (Guillaume confirmed
> off-list that they used a fresh BASE_BACKUP).  Note that Guillaume also
> confirmed off-list that the customer has checksums enabled, which means that
> MarkBufferDirtyHint() should be guaranteed to mark the buffers as dirty, so I'm
> out of ideas to explain the different behavior on standbys.

Would EXPLAIN (VERBOSE, COSTS, FORMAT JSON) run on both nodes help show any
differences?


No differences.


--
Guillaume.

В списке pgsql-general по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Weird planner issue on a standby
Следующее
От: Guillaume Lelarge
Дата:
Сообщение: Re: Weird planner issue on a standby