Re: pg_basebackup -x stream from the standby gets stuck
From | Magnus Hagander |
---|---|
Subject | Re: pg_basebackup -x stream from the standby gets stuck |
Date | |
Msg-id | CABUevEyRy-6V6EBocFq0Mzb=73DmkLrNtxQyODP5tw3BC0H=bg@mail.gmail.com |
In response to | Re: pg_basebackup -x stream from the standby gets stuck (Magnus Hagander <magnus@hagander.net>) |
Responses | Re: pg_basebackup -x stream from the standby gets stuck (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On Fri, Mar 2, 2012 at 2:26 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>>>> xlog start point: 2/AC4E2600
>>>>> pg_basebackup: starting background WAL receiver
>>>>> 692447/692447 kB (100%), 1/1 tablespace
>>>>> xlog end point: 2/AC4E2600
>>>>> pg_basebackup: waiting for background process to finish streaming...
>>>>> pg_basebackup: base backup completed
>>>>>
>>>>> real 3m56.237s
>>>>> user 0m0.224s
>>>>> sys 0m0.936s
>>>>>
>>>>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>>>>
>>>> The above article points out the problem of pg_basebackup from the standby:
>>>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>>>> there is no traffic in the database.
>>>>
>>>> When "-x stream" is specified, pg_basebackup forks the background process
>>>> for receiving WAL records during backup, takes an online backup and waits for
>>>> the background process to end. The forked background process keeps receiving
>>>> WAL records, and whenever it reaches end of WAL file, it checks whether it has
>>>> already received all WAL files required for the backup, and exits if yes. Which
>>>> means that at least one WAL segment switch is required for pg_basebackup with
>>>> "-x stream" option to end.
>>>>
>>>> In the backup from the master, WAL file switch always occurs at both start and
>>>> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
>>>> above logic works fine even if there is no traffic. OTOH, in the backup from the
>>>> standby, while there is no traffic, WAL file switch is not performed at all. So
>>>> in that case, there is no chance that the background process reaches end of WAL
>>>> file, check whether all required WAL arrives and exit. At the end, pg_basebackup
>>>> gets stuck.
>>>>
>>>> To fix the problem, I'd propose to change the background process so that it
>>>> checks whether all required WAL has arrived, every time data is received, even
>>>> if end of WAL file is not reached. Patch attached. Comments?
>>>
>>> This seems like a good thing in general.
>>>
>>> Why does it need to modify pg_receivexlog, though? I thought only
>>> pg_basebackup had this issue?
>>>
>>> I guess it is because of the change of the API to
>>> stream_continue_callback only?
>>
>> Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.
>>
>> But the reason why I changed segment_callback() in pg_receivexlog.c is not the
>> same. I did that because previously segment_finish_callback is called only at the
>> end of WAL segment but in the patch it can be called at the middle of segment.
>> OTOH, segment_callback() must emit a verbose message only when current
>> WAL segment is complete. So I had to add the check of whether current WAL
>> segment is partial or complete into segment_callback().
>
> Yeah, I caught that.
>
>
>>> Looking at it after your patch,
>>> stream_continue_callback and segment_finish_callback are the same.
>>> Should we perhaps just fold them into a single
>>> stream_continue_callback? Since you had to move the "detect segment
>>> end" to the caller anyway?
>>
>> No. I think we cannot do that because in pg_receivexlog they are not the same.
>
> But couldn't they be made the same by making the same check as you put
> in for the verbose message above?
>

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I moved the "rename last segment file
even if it's not complete" to be a parameter into ReceiveXlogStream()
instead of trying to overload a third functionality on the callback
(which is what broke pg_receivexlog).

How does this look? Have I overlooked any cases?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
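
For readers following the thread without the attachment, the behaviour change can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual patch: `stream_loop`, `stop_check_fn`, `reached_end`, `count_segments`, `backup_end_pos` and the 16-byte `SEG_SIZE` are all hypothetical names and values made up for illustration, and the real code in pg_basebackup and pg_receivexlog may look quite different. The model invokes a single callback either only when a segment boundary is crossed (the old behaviour) or after every received message (the proposed behaviour), and passes a flag saying whether a segment was just completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy segment size (hypothetical); real WAL segments are far larger. */
#define SEG_SIZE 16u

/* Hypothetical single callback: return true to stop streaming. */
typedef bool (*stop_check_fn)(uint64_t pos, bool segment_finished);

/* Position up to which the backup needs WAL (set by the caller). */
static uint64_t backup_end_pos;

/* pg_basebackup-like check: stop once all required WAL has arrived. */
static bool reached_end(uint64_t pos, bool segment_finished)
{
    (void) segment_finished;
    return pos >= backup_end_pos;
}

/* pg_receivexlog-like check: never stop, only note completed segments. */
static int segments_completed;

static bool count_segments(uint64_t pos, bool segment_finished)
{
    (void) pos;
    if (segment_finished)
        segments_completed++;
    return false;
}

/*
 * Consume nchunks messages of the given sizes, starting at position start.
 * With check_every_message == false the callback runs only when a segment
 * boundary is crossed; with true it runs after every message.
 * Returns the number of messages consumed when the callback asked to stop,
 * or -1 if the input ran out first (the real process would keep waiting).
 */
static int stream_loop(uint64_t start, const unsigned *chunks, int nchunks,
                       bool check_every_message, stop_check_fn stop)
{
    uint64_t pos = start;

    assert(stop != NULL);
    for (int i = 0; i < nchunks; i++)
    {
        uint64_t old_seg = pos / SEG_SIZE;
        bool segment_finished;

        pos += chunks[i];
        segment_finished = (pos / SEG_SIZE) != old_seg;

        if ((check_every_message || segment_finished) &&
            stop(pos, segment_finished))
            return i + 1;
    }
    return -1;
}
```

With `backup_end_pos = 5` and two small messages of 3 and 4 bytes, the boundary-only mode never calls the callback and returns -1, which corresponds to the stuck backup on an idle standby, while the per-message mode stops after the second message. A caller in the style of `count_segments` can still tell complete segments from partial ones through the `segment_finished` flag, which is roughly the check discussed above for the verbose message.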