Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From Amit Kapila
Subject Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date
Msg-id 004901cdf86a$14a8a1b0$3df9e510$@kapila@huawei.com
In response to Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On Monday, January 21, 2013 6:22 PM Magnus Hagander wrote:
> On Fri, Jan 18, 2013 at 7:50 AM, Amit Kapila <amit.kapila@huawei.com>
> wrote:
> > On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote:
> >> On 07.01.2013 16:23, Boszormenyi Zoltan wrote:
> >> > Since my other patch against pg_basebackup is now committed,
> >> > this patch doesn't apply cleanly, patch rejects 2 hunks.
> >> > The fixed up patch is attached.
> >>
> >> Now that I look at this from a high-level perspective, why are we only
> >> worried about timeouts in the Copy-mode and when connecting? The
> >> initial
> >> checkpoint could take a long time too, and if the server turns into
> a
> >> black hole while the checkpoint is running, pg_basebackup will still
> >> hang. Then again, a short timeout on that phase would be a bad idea,
> >> because the checkpoint can indeed take a long time.
> >
> > True, but IMO, if somebody wants to take a base backup, they should do
> > that when the server is not loaded.
> 
> A lot of installations don't have such an option, because there is no
> time when the server is not loaded.

Good to know.
I have always heard that customers run background maintenance activities
(REINDEX, VACUUM FULL, etc.) when the server is less loaded.
For example:
a. Billing applications in telecom are relatively lightly loaded at night.
b. Databases used for Sensex transactions are relatively idle once the
market is closed.
c. Banking solutions, because transactions are mostly done during the day.

There must be many cases where the database server is loaded all the time;
if you can give some examples, it would be good learning for me.

> >> In streaming replication, the keep-alive messages carry additional
> >> information, the timestamps and WAL locations, so a keepalive makes
> >> sense at that level. But otherwise, aren't we just trying to
> >> reimplement
> >> TCP keepalives? TCP keepalives are not perfect, but if we want to
> have
> >> an application level timeout, it should be implemented in the FE/BE
> >> protocol.
> >>
> >> I don't think we need to do anything specific to pg_basebackup. The
> >> user
> >> can simply specify TCP keepalive settings in the connection string,
> >> like
> >> with any libpq program.
> >
> > I think the user currently has no way to specify TCP keepalive settings
> > from pg_basebackup. Please let me know if such a way already exists.
> 
> You can set it through environment variables. As was discussed
> elsewhere, it would be good to have the ability to do it natively to
> pg_basebackup as well.

Sure, I am already modifying the existing patch to support a connection
string in pg_basebackup and pg_receivexlog.

> 
> > I think specifying TCP settings is very cumbersome for most users;
> > that's the reason most standard interfaces (ODBC/JDBC) have such an
> > application-level timeout mechanism.
> >
> > Implementing it in the FE/BE protocol (do you mean making such
> > non-blocking behavior part of libpq, or something else?) might be
> > generic and usable by others as well, but it might need a few interface
> > changes.
> 
> If it's specifying them that is cumbersome, then that's the part we
> should fix, rather than modifying the protocol, no?

That can be done as part of point 2 of the initial proposal
(2. Support recv_timeout separately, to provide an option for users who are
not comfortable with TCP keepalives).

There are two ways to achieve this:
1. Change the FE/BE protocol. I am not sure exactly how this can be done,
but according to Heikki this is the better way to implement it.
2. Make the socket non-blocking in pg_basebackup.

The advantage of Approach 1 is that if it is addressed in the lower
layers (libpq), then all other applications (pg_basebackup, etc.) can
use it, with no need to handle it separately in each application.

However, since the changes for Approach 1 seem to be invasive, we decided
to do it later.
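
To make Approach 2 concrete, here is a minimal sketch of the general
technique: a non-blocking socket combined with select() and a receive
timeout. It is written in Python for brevity and is not the code from the
patch; the function name recv_with_timeout and its parameters are
assumptions for this example only.

```python
import select
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Receive up to nbytes; raise TimeoutError if nothing arrives in time."""
    # Non-blocking mode guarantees that recv() itself can never hang.
    sock.setblocking(False)
    # Wait until the socket is readable or the timeout expires.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        raise TimeoutError(f"no data from server within {timeout_s} s")
    data = sock.recv(nbytes)
    if data == b"":
        # An empty read means the peer closed the connection cleanly.
        raise ConnectionError("connection closed by peer")
    return data


if __name__ == "__main__":
    # A local socket pair stands in for the client/server connection.
    client, server = socket.socketpair()
    server.sendall(b"wal data")
    print(recv_with_timeout(client, 64, 1.0))
    try:
        recv_with_timeout(client, 64, 0.2)  # the "server" is now silent
    except TimeoutError as exc:
        print("timed out:", exc)
```

The application then decides what a timeout means (retry, reconnect, or
abort), which is exactly the part that would otherwise have to be repeated
in every client if it is not handled in libpq.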

With Regards,
Amit Kapila.



