Обсуждение: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)

Поиск
Список
Период
Сортировка

PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)

От
"Chen, Yan-Jack (NSB - CN/Hangzhou)"
Дата:

Hi,

  We recently encounter one issue about PostgreSQL ODBC client doesn’t receive response from PostgreSQL server in time (client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours).  Both PostgreSQL and its client are deployed in VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via below configuration.

 

log_min_messages            = debug5

log_min_error_statement     = debug5

log_line_prefix             = '[%p-%c-%l] %m <%h %i> '

log_error_verbosity         = verbose

log_statement               = 'all'

logging_collector           = on

log_destination             = 'stderr'

log_directory               = '/tmp'

log_filename                = 'postgresql-%Y-%m-%d_%H%M%S.log'

log_duration                = on

log_autovacuum_min_duration = 2

 

Client log indicate it received response 10 seconds after sending the request:

 

Nov 13 22:25:58 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:25:58:050678|TEXT:WQTST before call db:1|DATA:

Nov 13 22:25:58 CFPU-0 raktor[2701622]: RU:OMU-0|PNAME:QNRAKTOR|LPID:2701622|PID:0x4EE 0x31 0x0|TIME:2021-11-14 00:26:03:054718|TEXT:RAV:set_wcel_state_r-db response timeout|DATA:

Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012336|TEXT:RAV Exit: set_wcel_state__r:= |DATA:

Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012501|TEXT:WQTST after call db:1|DATA: 

 

While PostgreSQL log recorded for the procedure indicate it executed quickly.

 

[2701833-618d1b70.293a09-173237] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> DEBUG:  00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173238] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173239] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> STATEMENT:  DEALLOCATE "_PLAN0x7f2ec021ab60"

[2701833-618d1b70.293a09-173240] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOG:  00000: statement: DEALLOCATE "_PLAN0x7f2ec021ab60"

[2701833-618d1b70.293a09-173241] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION:  exec_simple_query, postgres.c:1042

[2701833-618d1b70.293a09-173242] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> DEBUG:  00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173243] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173244] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> STATEMENT:  DEALLOCATE "_PLAN0x7f2ec021ab60"

[2701833-618d1b70.293a09-173245] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOG:  00000: duration: 0.098 ms

[2701833-618d1b70.293a09-173246] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION:  exec_simple_query, postgres.c:1287

[2701833-618d1b70.293a09-173247] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> DEBUG:  00000: parse _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402

[2701833-618d1b70.293a09-173248] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> LOCATION:  exec_parse_message, postgres.c:1342

[2701833-618d1b70.293a09-173249] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> STATEMENT:  UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402

[2701833-618d1b70.293a09-173250] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG:  00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173251] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173252] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> STATEMENT:  UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402

[2701833-618d1b70.293a09-173253] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOG:  00000: duration: 0.089 ms

[2701833-618d1b70.293a09-173254] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION:  exec_parse_message, postgres.c:1550

[2701833-618d1b70.293a09-173255] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG:  00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173256] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173257] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG:  00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173258] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173259] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG:  00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173260] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173261] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG:  00000: bind <unnamed> to _PLAN0x7f2ec021ab60

[2701833-618d1b70.293a09-173262] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION:  exec_bind_message, postgres.c:1600

[2701833-618d1b70.293a09-173263] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> DEBUG:  00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0

[2701833-618d1b70.293a09-173264] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION:  ShowTransactionStateRec, xact.c:5333

[2701833-618d1b70.293a09-173265] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> STATEMENT:  UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402

[2701833-618d1b70.293a09-173266] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOG:  00000: duration: 0.107 ms

[2701833-618d1b70.293a09-173267] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION:  exec_bind_message, postgres.c:1917

[2701833-618d1b70.293a09-173268] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG:  00000: execute _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402

[2701833-618d1b70.293a09-173269] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DETAIL:  parameters: $1 = '0'

[2701833-618d1b70.293a09-173270] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION:  exec_execute_message, postgres.c:2055

[2701833-618d1b70.293a09-173271] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG:  00000: duration: 0.080 ms

[2701833-618d1b70.293a09-173272] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION:  exec_execute_message, postgres.c:2138

[2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG:  00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1

[2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION:  ShowTransactionStateRec, xact.c:5333

 

Finally, the tcpdump indicate the delay happened at PostgreSQL server.

 

 

This mail is not to investigate why PostgreSQL server delay sending the response message. We suspect the backend I/O performance issue.

This mail is to ask why PostgreSQL debug log doesn’t really include the response message delay which may cause misleading why troubleshooting. It looks to me the debug log doesn’t record the whole procedure. If there are some developer options include the missing part but we didn’t enable?

 

  

 

 

Best Regards

It always takes longer than you expect,

even when you take into account

----------------------------------------------------------------------------------------------

Yan-Jack Chen (陈雁)

Tel: +8613957141340

Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053

 

Вложения
"Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes:
>   We recently encounter one issue about PostgreSQL ODBC client doesn¡¯t receive response from PostgreSQL server in
time(client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours).  Both PostgreSQL and its client are
deployedin VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via
belowconfiguration. 

We can see from this that the server spent 10 seconds in
CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign
keys on that table?  maybe an event trigger?), and then second about
delays in writing/fsyncing WAL (what's the underlying storage?  do you
have synchronous_commit replication turned on?).

> This mail is to ask why PostgreSQL debug log doesn¡¯t really include the response message delay which may cause
misleadingwhy troubleshooting. It looks to me the debug log doesn¡¯t record the whole procedure. If there are some
developeroptions include the missing part but we didn¡¯t enable? 

[ shrug... ] Sure, we could put an elog(DEBUG) after every line of code
in the server, and then high-level debugging logs would be even more
impossibly voluminous than they are now.  I'd say the existing logging
gave you plenty of clue where to look.

            regards, tom lane



RE: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)

От
"Chen, Yan-Jack (NSB - CN/Hangzhou)"
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us>  writes: 
> We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table?  maybe an event trigger?),
andthen second about delays in writing/fsyncing WAL (what's the underlying storage?  do you have synchronous_commit
replicationturned on?).
 

We have the same as your second suspicion. Delays in fsyncing WAL as the backend ceph cluster storage and we see there
wassignificant latency increased during that time. Yes. the synchronous_commit is 
 
on. 

fsync                                                   | on                                           | Forces
synchronizationof updates to disk.
 
synchronous_commit                     | on                                           | Sets the current transaction's
synchronizationlevel.
 


> [ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging
logswould be even more impossibly voluminous than they are now.  I'd say the existing logging gave you plenty of clue
whereto look.
 

Actually, do not need to write log for every line code. The last debug log we can see is for CommitTransaction which
stateis INPROGRESS. But we can't see when  CommitTransaction state is DONE/COMPLETE from the debug log which I think
thiskind of log is valuable which can help to identify where the delay comes, from server or client from the postgres
serverdebug log only shall be enough without to monitoring and analyzing the TCP message.
 

[2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG:  00000: CommitTransaction(1)
name:unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1
 
[2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION:  ShowTransactionStateRec,
xact.c:5333



Best Regards
It always takes longer than you expect, 
even when you take into account
----------------------------------------------------------------------------------------------
Yan-Jack Chen (陈雁)
Tel: +8613957141340
Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: 2021年11月15日 23:09
To: Chen, Yan-Jack (NSB - CN/Hangzhou) <yan-jack.chen@nokia-sbell.com>
Cc: pgsql-general@postgresql.org
Subject: Re: PostgreSQL debug log doesn't record whole procedure(from receiving request to sending response)

"Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes:
>   We recently encounter one issue about PostgreSQL ODBC client doesn¡¯t receive response from PostgreSQL server in
time(client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours).  Both PostgreSQL and its client are
deployedin VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via
belowconfiguration.
 

We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table?  maybe an event trigger?),
andthen second about delays in writing/fsyncing WAL (what's the underlying storage?  do you have synchronous_commit
replicationturned on?).
 

> This mail is to ask why PostgreSQL debug log doesn¡¯t really include the response message delay which may cause
misleadingwhy troubleshooting. It looks to me the debug log doesn¡¯t record the whole procedure. If there are some
developeroptions include the missing part but we didn¡¯t enable?
 

[ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging
logswould be even more impossibly voluminous than they are now.  I'd say the existing logging gave you plenty of clue
whereto look.
 

            regards, tom lane