Re: [Patch] ALTER SYSTEM READ ONLY

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: [Patch] ALTER SYSTEM READ ONLY
Дата
Msg-id A50E23AD-02B6-4950-BF12-0329B1816E66@enterprisedb.com
обсуждение исходный текст
Ответ на Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
Ответы Re: [Patch] ALTER SYSTEM READ ONLY  (Mark Dilger <mark.dilger@enterprisedb.com>)
Список pgsql-hackers

> On Sep 10, 2021, at 7:36 AM, Amul Sul <sulamul@gmail.com> wrote:
>
>> v33-0005
>>
>> This patch makes bool XLogInsertAllowed() more complicated than before.  The result used to depend mostly on the
valueof LocalXLogInsertAllowed except that when that value was negative, the result was determined by
RecoveryInProgress().There was an arcane rule that LocalXLogInsertAllowed must have the non-negative values binary
coercibleto boolean "true" and "false", with the basis for that rule being the coding of XLogInsertAllowed().  Now that
thefunction is more complicated, this rule seems even more arcane.  Can we change the logic to not depend on casting an
integerto bool? 
>>
>
> We can't use a boolean variable because LocalXLogInsertAllowed
> represents three states as, 1 means "wal is allowed'', 0 for "wal is
> disallowed", and -1 is for "need to check".

I'm complaining that we're using an integer rather than an enum.  I'm ok if we define it so that WAL_ALLOWABLE_UNKNOWN
=-1, WAL_DISALLOWED = 0, WAL_ALLOWED = 1 or such, but the logic of the function has gotten complicated enough that
havingto remember which number represents which logical condition has become a (small) mental burden.  Given how hard
theWAL code is to read and fully grok, I'd rather avoid any unnecessary burden, even small ones. 

>> The new function CheckWALPermitted() seems to test the current state of variables but not lock any of them, and the
newfunction comment says: 
>>
>
> CheckWALPermitted() calls XLogInsertAllowed() does check the
> LocalXLogInsertAllowed flag which is local to that process only, and
> nobody else reads that concurrently.
>
>> /*
>> * In opposite to the above assertion if a transaction doesn't have valid XID
>> * (e.g. VACUUM) then it won't be killed while changing the system state to WAL
>> * prohibited.  Therefore, we need to explicitly error out before entering into
>> * the critical section.
>> */
>>
>> This suggests to me that a vacuum process can check whether wal is prohibited, then begin a critical section which
needswal to be allowed, and concurrently somebody else might disable wal without killing the vacuum process.  I'm given
towonder what horrors await when the vacuum process does something that needs to be wal logged but cannot be.  Does it
triggera panic?  I don't like the idea that calling pg_prohibit_wal durning a vacuum might panic the cluster.  If there
issome reason this is not a problem, I think the comment should explain it.  In particular, why is it sufficient to
checkwhether wal is prohibited before entering the critical section and not necessary to be sure it remains allowed
throughthe lifetime of that critical section? 
>>
>
> Hm, interrupts absorption are disabled inside the critical section.
> The wal prohibited state for that process (here vacuum) will never get
> set until it sees the interrupts & the system will not be said wal
> prohibited until every process sees that interrupts. I am not sure we
> should explain the characteristics of the critical section at this
> place, if want, we can add a brief saying that inside the critical
> section we should not worry about the state change which never happens
> because interrupts are disabled there.

I think the fact that interrupts are disabled during critical sections is understood, so there is no need to mention
that. The problem is that the method for taking the system read-only is less generally known, and readers of other
sectionsof code need to jump to the definition of CheckWALPermitted to read the comments and understand what it does.
Takefor example a code stanza from heapam.c: 

    if (needwal)
        CheckWALPermitted();

    /* NO EREPORT(ERROR) from here till changes are logged */
    START_CRIT_SECTION();

Now, I know that interrupts won't be processed after starting the critical section, but I can see plain as day that an
interruptmight get processed *during* CheckWALPermitted, since that function isn't atomic.  It might happen after the
checkis meaningfully finished but before the function actually returns.  So I'm not inclined to believe that the way
thisall works is dependent on interrupts being blocked.  So I think, maybe this is all protected by some other scheme.
Butwhat?  It's not clear from the code comments for CheckWALPermitted, so I'm left having to reverse engineer the
systemto understand it. 

One interpretation is that the signal handler will exit() my backend if it receives a signal saying that the system is
goingread-only, so there is no race condition.  But then why the call to CheckWALPermitted()?  If this interpretation
werecorrect, we'd happily enter the critical section without checking, secure in the knowledge that as long as we
haven'texited yet, all is ok. 

Another interpretation is that the whole thing is just a performance trick.  Maybe we're ok with the idea that we will
occasionallymiss the fact that wal is prohibited, do whatever work we need in the critical section, and then fail
later. But if that is true, it had better not be a panic, because designing the system to panic 1% of the time (or
whateverpercent it works out to be) isn't project style.  So looking into the critical section in the heapam.c code, I
see:

        XLogBeginInsert();
        XLogRegisterData((char *) &xlrec, SizeOfHeapInplace);

        XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
        XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);

And jumping to the definition of XLogBeginInsert() I see

    /*
     * WAL permission must have checked before entering the critical section.
     * Otherwise, WAL prohibited error will force system panic.
     */

So now I'm flummoxed.  Is it that the code is broken, or is it that I don't know what the strategy behind all this is?
Ifthere were a code comment saying how this all works, I'd be in a better position to either know that it is truly safe
oralternately know that the strategy is wrong. 

Even if my analysis that this is all flawed is incorrect, I still think that a code comment would help.

>> v33-0007:
>>
>> I don't really like what the documentation has to say about pg_prohibit_wal.  Why should pg_prohibit_wal differ from
othersignal sending functions in whether it returns a boolean?  If you believe it must always succeed, you can still
defineit as returning a boolean and always return true.  That leaves the door open to future code changes which might
needto return false for some reason. 
>>
>
> Ok, I am fine to always return true.

Ok.

>> But I also don't like the idea that existing transactions with xids are immediately killed.  Shouldn't this function
takean optional timeout, perhaps defaulting to none, but otherwise allowing the user to put the system into
WALPROHIBIT_STATE_GOING_READ_ONLYfor a period of time before killing remaining transactions? 
>>
>
> Ok, will check.
>
>> Why is this function defined to take a boolean such that pg_prohibit_wal(true) means to prohibit wal and
pg_prohibit_wal(false)means to allow wal.  Wouldn't a different function named pg_allow_wal() make it more clear?  This
alsowould be a better interface if taking the system read-only had a timeout as I suggested above, as such a timeout
parameterwhen allowing wal is less clearly useful. 
>>
>
> Like Robert, I am too inclined to have a single function that is easy
> to remember.

For C language functions that take a bool argument, I can jump to the definition using ctags, and I assume most other
developerscan do so using whatever IDE they like.  For SQL functions, it's a bit harder to jump to the definition,
particularlyif you are logged into a production server where non-essential software is intentionally missing.  Then you
haveto wonder, what exactly is the boolean argument toggling here? 

I don't feel strongly about this, though, and you don't need to change it.

> Apart from this, recently while testing this patch with
> pgbench where I have exhausted the connection limit and want to change
> the system's prohibited state in between but I was unable to do that,
> I wish I could do that using the pg_clt option. How about having a
> pg_clt option to alter wal prohibited state?

I'd have to review the implementation, but sure, that sounds like a useful ability.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: parallelizing the archiver
Следующее
От: Zhihong Yu
Дата:
Сообщение: Re: a misbehavior of partition row movement (?)