Re: Build-farm - intermittent error in 031_column_list.pl

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Build-farm - intermittent error in 031_column_list.pl
Дата
Msg-id dc08add3-10a8-738b-983a-191c7406707b@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Build-farm - intermittent error in 031_column_list.pl  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Build-farm - intermittent error in 031_column_list.pl  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Build-farm - intermittent error in 031_column_list.pl  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On 5/25/22 13:26, Amit Kapila wrote:
> On Wed, May 25, 2022 at 8:16 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>>
>> It does "fix" the case of [1].  But AFAIS
>> RelationSyncEntry.replicate_valid is only used to inhibit repeated
>> loading in get_rel_sync_entry and the function doesn't seem to be
>> assumed to return a invalid entry. (Since the flag is not checked
>> nowhere else.)
>>
>> For example pgoutput_change does not check for the flag of the entry
>> returned from the function before uses it, which is not seemingly
>> safe. (I didn't check further, though)
>>
>> Don't we need to explicitly avoid using invalid entries outside the
>> function?
>>
> 
> We decide that based on pubactions in the callers, so even if entry is
> valid, it won't do anything.  Actually, we don't need to avoid setting
> replication_valid flag as some of the publications for the table may
> be already present. We can check if the publications_valid flag is set
> while trying to validate the entry. Now, even if we don't find any
> publications the replicate_valid flag will be set but none of the
> actions will be set, so it won't do anything in the caller. Is this
> better than the previous approach?
> 

For the record, I'm not convinced this is the right way to fix the
issue, as it may easily mask the real problem.

We do silently ignore missing objects in various places, but only when
either requested or when it's obvious it's expected and safe to ignore.
But I'm not sure that applies here, in a clean way.

Imagine you have a subscriber using two publications p1 and p2, and
someone comes around and drops p1 by mistake. With the proposed patch,
the subscription will notice this, but it'll continue sending data
ignoring the missing publication. Yes, it will continue working, but
it's quite possible this breaks the subscriber and it's be better to
fail and stop replicating.

The other aspect I dislike is that we just stop caching publication
info, forcing us to reload it for every replicated change/row. So even
if dropping the publication happens not to "break" the subscriber (i.e.
the data makes sense), this may easily cause performance issues, lag in
the replication, and so on. And the users will have no idea why and/or
how to fix it, because we just do this silently.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT
Следующее
От: Tom Lane
Дата:
Сообщение: Re: "ERROR: latch already owned" on gharial