Re: Slow planning time for simple query

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Slow planning time for simple query
Дата
Msg-id 7067.1529246768@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Slow planning time for simple query  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Список pgsql-hackers
[ trimming cc's ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> See index_fetch_heap:
>     /*
>      * If we scanned a whole HOT chain and found only dead tuples, tell index
>      * AM to kill its entry for that TID (this will take effect in the next
>      * amgettuple call, in index_getnext_tid).  We do not do this when in
>      * recovery because it may violate MVCC to do so.  See comments in
>      * RelationGetIndexScan().
>      */
>     if (!scan->xactStartedInRecovery)
>         scan->kill_prior_tuple = all_dead;

So the referenced comment in RelationGetIndexScan says

     * During recovery we ignore killed tuples and don't bother to kill them
     * either. We do this because the xmin on the primary node could easily be
     * later than the xmin on the standby node, so that what the primary
     * thinks is killed is supposed to be visible on standby. So for correct
     * MVCC for queries during recovery we must ignore these hints and check
     * all tuples. Do *not* set ignore_killed_tuples to true when running in a
     * transaction that was started during recovery. xactStartedInRecovery
     * should not be altered by index AMs.

but it seems to me that this is not terribly carefully thought through.

1. If global xmin on the primary is later than on the standby, VACUUM could
remove tuples that should be visible on the standby, and that would shortly
propagate to the standby.  Simply ignoring index kill bits on the standby
won't prevent that.

2. Although _bt_killitems doesn't WAL-log its setting of kill bits, those
bits could propagate to the standby anyway, as a result of a subsequent
WAL action on the index page that gets a full-page image added.

I believe that in some replication modes, #1 isn't a problem because we
have mechanisms to hold back the primary's global xmin.  If that's true
always, then all of this logic is obsolete silliness that we could remove.
If it's not true always, we have bigger problems to fix.

Similarly, the possibility of #2 means the standby can't trust the kill
bits to reflect its own xmin anyway.  So if we aren't junking this entire
stack of logic, we could perhaps allow the standby to set kill bits per
its own understanding of global xmin, and then allow the kill bits to be
used in SnapshotNonVacuumable scans even though they're unsafe for regular
MVCC scans.

However, I'm still caffeine-short so maybe there's something I missed.

            regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: SCRAM with channel binding downgrade attack
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_config.h.win32 missing a set of flags from pg_config.h.in added in v11 development