Re: Support for N synchronous standby servers - take 2

From Kyotaro HORIGUCHI
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id 20160329.173644.22077026.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Support for N synchronous standby servers - take 2  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
I personally don't think it needs such a survival measure. The
syntax is very small and the parser reads very short text. If the
parser fails in such a mode, something more serious must have
occurred.

At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com>
> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello,
> >
> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q@mail.gmail.com>
> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > As mentioned in my comment, SQL parser converts yy_fatal_error
> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
> > #define'ing fprintf). So it is doable if you mind exit().
> 
> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
> flex fatal error occurs, postmaster just exits instead of jumping out of parser.

The ERROR could just as well be LOG or DEBUG2, if we think the
parser's fatal errors are recoverable. guc-file.l does so.
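
To make the two options concrete, here is a minimal standalone sketch of
the idea, not code from the patch or from guc-file.l. It assumes a
scanner whose fatal-error hook jumps back to the caller instead of
calling exit(), and lets the caller choose whether to treat the failure
as recoverable (LOG-like) or as a reason to terminate (FATAL-like). All
names here (scanner_fatal, toy_scan, parse_with_policy and the enums) are
hypothetical, and setjmp/longjmp only stands in for PG_TRY.

```c
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are PostgreSQL or flex APIs. */
typedef enum { POLICY_RECOVER, POLICY_FATAL } ScanErrorPolicy;
typedef enum { PARSE_OK, PARSE_RECOVERED, PARSE_WOULD_EXIT } ParseOutcome;

static jmp_buf scan_escape;         /* where a scanner failure lands */
static const char *scan_error_msg;  /* message recorded by the hook */

/* What a replaced fatal-error hook might do instead of calling exit(). */
static void scanner_fatal(const char *msg)
{
    scan_error_msg = msg;
    longjmp(scan_escape, 1);
}

/* Toy scanner: pretends that any '!' in the input is an internal failure. */
static void toy_scan(const char *input)
{
    if (strchr(input, '!') != NULL)
        scanner_fatal("input in scanner failed");
}

/* Parse the input and report what the chosen policy would do on failure. */
ParseOutcome parse_with_policy(const char *input, ScanErrorPolicy policy)
{
    scan_error_msg = NULL;

    if (setjmp(scan_escape) != 0)
    {
        /* Reached only via longjmp from scanner_fatal(). */
        if (policy == POLICY_RECOVER)
        {
            fprintf(stderr, "LOG: %s\n", scan_error_msg);
            return PARSE_RECOVERED;   /* caller keeps the old setting */
        }
        fprintf(stderr, "FATAL: %s\n", scan_error_msg);
        return PARSE_WOULD_EXIT;      /* a real process would exit here */
    }

    toy_scan(input);
    return PARSE_OK;
}
```

With POLICY_RECOVER the caller survives and keeps whatever value it had
before; with POLICY_FATAL the sketch only reports that the process would
exit, so that it stays testable.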

> ISTM that, when an internal flex fatal error occurs, it's
> better to elog(FATAL) and terminate the problematic
> process. This might lead to the server crash (e.g., if
> postmaster emits a FATAL error, it and its all child processes
> will exit soon). But probably we can live with this because the
> fatal error basically rarely happens.

I agree with this.

> OTOH, if we make the process keep running even after it gets an internal
> fatal error (like Sawada's patch or your idea do), this might cause more
> serious problem. Please imagine the case where one walsender gets the fatal
> error (e.g., because of OOM), abandon new setting value of
> synchronous_standby_names, and keep running with the previous setting value.
> OTOH, the other walsender processes successfully parse the setting and
> keep running with new setting. In this case, the inconsistency of the setting
> which each walsender is based on happens. This completely will mess up the
> synchronous replication.

On the other hand, guc-file.l seems to ignore parser errors under
normal operation, even though that may cause a similar inconsistency,
if any.

| LOG:  received SIGHUP, reloading configuration files
| LOG:  input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
| LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied

> Therefore, I think that it's better to make the problematic process exit
> with FATAL error rather than ignore the error and keep it running.

+1. Restarting the walsender would be far less harmful than keeping
it running in a doubtful state.
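
The difference between the two policies can be illustrated with a toy
model, which is only a sketch under assumptions and not walsender code.
It assumes several processes each hold their own copy of the setting,
that exactly one of them hits a transient parse failure on reload, and
that a restarted process reads the new value successfully. ToySender,
reload_all and settings_consistent are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SETTING_LEN 64

/* Hypothetical model of one process and its private copy of the setting. */
typedef struct
{
    char sync_names[SETTING_LEN];
    bool restarted;
} ToySender;

typedef enum { KEEP_RUNNING, EXIT_AND_RESTART } FailurePolicy;

static void set_names(ToySender *s, const char *value)
{
    strncpy(s->sync_names, value, SETTING_LEN - 1);
    s->sync_names[SETTING_LEN - 1] = '\0';
}

/*
 * Reload the setting in every sender. The sender at failing_index is
 * assumed to hit a transient parse failure on its first attempt.
 */
void reload_all(ToySender *senders, size_t n, const char *new_value,
                size_t failing_index, FailurePolicy policy)
{
    for (size_t i = 0; i < n; i++)
    {
        if (i == failing_index)
        {
            if (policy == KEEP_RUNNING)
                continue;               /* silently keeps the old value */
            /* Assumed: the replacement process parses the value afresh. */
            senders[i].restarted = true;
        }
        set_names(&senders[i], new_value);
    }
}

/* True when every sender is working from the same setting value. */
bool settings_consistent(const ToySender *senders, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcmp(senders[0].sync_names, senders[i].sync_names) != 0)
            return false;
    return true;
}
```

Under KEEP_RUNNING the failing sender stays on the old value while the
others move on, which is the inconsistency described above; under
EXIT_AND_RESTART all copies agree again at the cost of one restart.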

Should I wait for the next version, or have a look at the latest?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center