Re: Streaming replication - 11.5

From Adrian Klaver
Subject Re: Streaming replication - 11.5
Date
Msg-id e683dbd6-931c-86ab-2056-1e02f00a9273@aklaver.com
In response to Re: Streaming replication - 11.5  (Nicola Contu <nicola.contu@gmail.com>)
Responses Re: Streaming replication - 11.5
List pgsql-general
On 3/11/20 2:54 AM, Nicola Contu wrote:
> These are the lines before
> 
> 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [1-1] 
> db=cmdv3,user=zabbix_check ERROR:  recovery is in progress
> 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [2-1] 
> db=cmdv3,user=zabbix_check HINT:  WAL control functions cannot be 
> executed during recovery.
> 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [3-1] 
> db=cmdv3,user=zabbix_check STATEMENT:  select 
> greatest(0,pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) from 
> pg_stat_replication where client_addr ='10.150.20.22'
> 
> That query is made by Zabbix. So I stopped the zabbix agent and tested 
> again. But it is still failing, now because of this:
> 
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_51199"
> pg_basebackup: could not receive data from WAL stream: SSL SYSCALL 
> error: EOF detected
> ^C4699810/504983062 kB (70%), 0/1 tablespace 
> (...ql11/data/base/16401/231363544.2)

So you started over with a pg_basebackup?

Also from below:

2020-03-11 09:43:53 GMT [10.150.20.22(54906)] [51199]: [1-1] 
db=[unknown],user=replicator LOG:  terminating walsender process due to 
replication timeout
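
That is the walsender giving up after wal_sender_timeout (60 seconds by 
default). If the link between the two machines is slow or flaky, one thing 
worth trying (just a sketch, adjust to your setup) is to check the current 
value on the server you are taking the base backup from and temporarily 
raise it while the copy runs:

  -- on the upstream server feeding the standby/basebackup
  SHOW wal_sender_timeout;
  -- e.g. raise it for the duration of the copy, then reload the config
  ALTER SYSTEM SET wal_sender_timeout = '5min';
  SELECT pg_reload_conf();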

Where are the master and standby in relation to each other, network-wise?

Intervening firewalls, network latency issues?
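
As an aside on the zabbix_check errors above: pg_current_wal_lsn() is a 
primary-only function, which is why that lag query fails with "recovery is 
in progress" when it runs on a standby. An untested sketch of a 
standby-safe variant would use the receive position instead:

  -- on the cascaded standby that feeds 10.150.20.22 (sketch, not tested here)
  SELECT greatest(0, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn))
  FROM pg_stat_replication
  WHERE client_addr = '10.150.20.22';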



> 
> 
> here is the full log, starting right before the last try:
> 
> 2020-03-11 09:22:44 GMT [] [12598]: [4508-1] db=,user= LOG: 
>   restartpoint complete: wrote 19565 buffers (0.2%); 0 WAL file(s) 
> added, 0 removed, 7 recycled; write=270.014 s, sync=0.009 s, 
> total=270.036 s; sync files=804, longest=0.001 s, average=0.000 s; 
> distance=131239 kB, estimate=725998 kB
> 2020-03-11 09:22:44 GMT [] [12598]: [4509-1] db=,user= LOG:  recovery 
> restart point at 643A/D8C05F70
> 2020-03-11 09:22:44 GMT [] [12598]: [4510-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:22:44.050084+00.
> 2020-03-11 09:23:14 GMT [] [12598]: [4511-1] db=,user= LOG: 
>   restartpoint starting: time
> 2020-03-11 09:27:44 GMT [] [12598]: [4512-1] db=,user= LOG: 
>   restartpoint complete: wrote 17069 buffers (0.2%); 0 WAL file(s) 
> added, 0 removed, 17 recycled; write=269.879 s, sync=0.006 s, 
> total=269.902 s; sync files=811, longest=0.001 s, average=0.000 s; 
> distance=120469 kB, estimate=665445 kB
> 2020-03-11 09:27:44 GMT [] [12598]: [4513-1] db=,user= LOG:  recovery 
> restart point at 643A/E01AB438
> 2020-03-11 09:27:44 GMT [] [12598]: [4514-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:27:43.945485+00.
> 2020-03-11 09:27:44 GMT [] [12598]: [4515-1] db=,user= LOG: 
>   restartpoint starting: force wait
> 2020-03-11 09:29:24 GMT [10.222.8.2(47834)] [50961]: [1-1] 
> db=cmdv3,user=nis LOG:  duration: 1402.004 ms  statement: SELECT id, 
> name, parent_id, parent, short_name, sales_rep_id FROM mmx_clients;
> 2020-03-11 09:29:34 GMT [10.222.8.2(47834)] [50961]: [2-1] 
> db=cmdv3,user=nis LOG:  duration: 9493.259 ms  statement: SELECT slid, 
> gnid, sof_id, client_id, product FROM mmx_slids;
> 2020-03-11 09:32:14 GMT [] [12598]: [4516-1] db=,user= LOG: 
>   restartpoint complete: wrote 71260 buffers (0.8%); 0 WAL file(s) 
> added, 0 removed, 13 recycled; write=269.953 s, sync=0.012 s, 
> total=269.979 s; sync files=760, longest=0.002 s, average=0.000 s; 
> distance=123412 kB, estimate=611242 kB
> 2020-03-11 09:32:14 GMT [] [12598]: [4517-1] db=,user= LOG:  recovery 
> restart point at 643A/E7A30498
> 2020-03-11 09:32:14 GMT [] [12598]: [4518-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:32:13.916101+00.
> 2020-03-11 09:32:44 GMT [] [12598]: [4519-1] db=,user= LOG: 
>   restartpoint starting: time
> 2020-03-11 09:37:14 GMT [] [12598]: [4520-1] db=,user= LOG: 
>   restartpoint complete: wrote 27130 buffers (0.3%); 0 WAL file(s) 
> added, 0 removed, 12 recycled; write=270.026 s, sync=0.007 s, 
> total=270.052 s; sync files=814, longest=0.001 s, average=0.000 s; 
> distance=280595 kB, estimate=578177 kB
> 2020-03-11 09:37:14 GMT [] [12598]: [4521-1] db=,user= LOG:  recovery 
> restart point at 643A/F8C351C8
> 2020-03-11 09:37:14 GMT [] [12598]: [4522-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:37:14.067443+00.
> 2020-03-11 09:37:44 GMT [] [12598]: [4523-1] db=,user= LOG: 
>   restartpoint starting: time
> 2020-03-11 09:42:14 GMT [] [12598]: [4524-1] db=,user= LOG: 
>   restartpoint complete: wrote 26040 buffers (0.3%); 0 WAL file(s) 
> added, 0 removed, 9 recycled; write=269.850 s, sync=0.019 s, 
> total=269.886 s; sync files=834, longest=0.002 s, average=0.000 s; 
> distance=236392 kB, estimate=543999 kB
> 2020-03-11 09:42:14 GMT [] [12598]: [4525-1] db=,user= LOG:  recovery 
> restart point at 643B/730F3F8
> 2020-03-11 09:42:14 GMT [] [12598]: [4526-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:42:13.900088+00.
> 2020-03-11 09:42:44 GMT [] [12598]: [4527-1] db=,user= LOG: 
>   restartpoint starting: time
> 2020-03-11 09:43:53 GMT [10.150.20.22(54906)] [51199]: [1-1] 
> db=[unknown],user=replicator LOG:  terminating walsender process due to 
> replication timeout
> 2020-03-11 09:47:14 GMT [] [12598]: [4528-1] db=,user= LOG: 
>   restartpoint complete: wrote 20966 buffers (0.2%); 0 WAL file(s) 
> added, 0 removed, 9 recycled; write=270.048 s, sync=0.014 s, 
> total=270.085 s; sync files=852, longest=0.001 s, average=0.000 s; 
> distance=183749 kB, estimate=507974 kB
> 2020-03-11 09:47:14 GMT [] [12598]: [4529-1] db=,user= LOG:  recovery 
> restart point at 643B/12680A80
> 2020-03-11 09:47:14 GMT [] [12598]: [4530-1] db=,user= DETAIL:  Last 
> completed transaction was at log time 2020-03-11 09:47:14.069731+00.
> 2020-03-11 09:47:44 GMT [] [12598]: [4531-1] db=,user= LOG: 
>   restartpoint starting: time
> 
> 
> 
> On Wed, Mar 11, 2020 at 01:53 Adrian Klaver 
> <adrian.klaver@aklaver.com> wrote:
> 
>     On 3/10/20 8:17 AM, Nicola Contu wrote:
>     Please post to list also.
>     Ccing list.
> 
>     What came immediately before the temporary file error?
> 
>      >    2020-03-10 15:10:17 GMT [[local]] [28171]: [1-1]
>      > db=postgres,user=postgres LOG:  temporary file: path
>      > "base/pgsql_tmp/pgsql_tmp28171.0", size 382474936
>      > 2020-03-10 15:10:17 GMT [[local]] [28171]: [4-1]
>      > db=postgres,user=postgres LOG:  could not send data to client:
>      > Broken pipe
>      > 2020-03-10 15:10:17 GMT [[local]] [28171]: [5-1]
>      > db=postgres,user=postgres FATAL:  connection to client lost
>      > 2020-03-10 15:10:26 GMT [] [12598]: [3544-1] db=,user= LOG:
>      >   restartpoint complete: wrote 37315 buffers (0.4%); 0 WAL file(s)
>      > added, 0 removed, 16 recycled; write=269.943 s, sync=0.039 s,
>      > total=269.999 s; sync files=1010, longest=0.001 s, average=0.000 s;
>      > distance=175940 kB, estimate=416149 kB
>      > 2020-03-10 15:10:26 GMT [] [12598]: [3545-1] db=,user= LOG:
>      > recovery restart point at 6424/1D7DEDE8
>      >
>      > It is cascading replication
>      >
>      > On Tue, Mar 10, 2020 at 15:58 Adrian Klaver
>      > <adrian.klaver@aklaver.com> wrote:
>      >
>      >     On 3/10/20 2:26 AM, Nicola Contu wrote:
>      >      > Hello,
>      >      > I have two servers connected to the same switch running
>      >      > postgres 11.5
>      >      >
>      >      > I am trying to replicate one of those servers after planned
>      >      > work on the master, so the replica has been lost. It has
>      >      > always worked but now I get this:
>      >      >
>      >      > pg_basebackup: could not receive data from WAL stream: server
>      >      > closed the connection unexpectedly
>      >      >          This probably means the server terminated abnormally
>      >      >          before or while processing the request.
>      >      >
>      >      > I don't really understand what the issue is.
>      >
>      >     I would start with the logs from the Postgres server you are
>      >     taking the backup from.
>      >
>      >      > I had this issue last week as well in another DC and I had to
>      >      > reboot the slave to make it work (not sure why it helped)
>      >      >
>      >      > Do you know what can cause this?
>      >      >
>      >      > Thank you,
>      >      > Nicola
>      >
>      >
>      >     --
>      >     Adrian Klaver
>      > adrian.klaver@aklaver.com
>      >
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


