Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity
От | Fujii Masao |
---|---|
Тема | Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity |
Дата | |
Msg-id | CAHGQGwEQWk6J+vqQnnVFFngtMdMuw-AwuEFnq=u3Wb2STXu3TA@mail.gmail.com обсуждение исходный текст |
Ответ на | BUG #10123: Weird entries in pg_stat_activity (maxim.boguk@gmail.com) |
Ответы |
Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity
|
Список | pgsql-bugs |
On Thu, Apr 24, 2014 at 7:19 PM <maxim.boguk@gmail.com> wrote: > > The following bug has been logged on the website: > > Bug reference: 10123 > Logged by: Maxim Boguk > Email address: maxim.boguk@gmail.com > PostgreSQL version: 9.2.8 > Operating system: Linux > Description: > > Hi everyone, > > During debug of the weird nagios database alerts about long transactions I > found that sometime pg_stat_activity contains very weird entries with > xact_start NOT NULL and state='idle'... what's more they happened during > backend idle state. > > To check what's going on I performed the following actions on production > database: > 1)set log_min_duration_statement=0 to see ALL database queries > 2)run single line bash script to check for weird entries: > while true; do psql -x -X -c "select now() as ts,now()-xact_start as > tx_age, now()-state_change as change_age,* from pg_stat_activity where > state='idle' and xact_start is not null" | grep -vE '^$|No rows'; sleep > 0.1; done > > And yes, once-twice-trice per minute script output some interesting results > such as: > > > -[ RECORD 1 > ]----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ts | 2014-04-24 13:49:03.498354+04 > tx_age | 00:04:01.283106 > change_age | 00:04:01.279373 > datid | 16385 > datname | ******* > pid | 685 > usesysid | 16384 > usename | ******* > application_name | > client_addr | 127.0.0.1 > client_hostname | > client_port | 37461 > backend_start | 2014-04-24 13:30:03.275711+04 > xact_start | 2014-04-24 13:45:02.215248+04 > query_start | 2014-04-24 13:45:02.215248+04 > state_change | 2014-04-24 13:45:02.218981+04 > waiting | f > state | idle > query | > | SELECT cache.save_response( > | ***** > | ); > > > State=idle > xact_start=4 min ago > > Now what I found in the full database log for this pid: > > ... > > 2014-04-24 13:45:02 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 3.703 ms statement: > SELECT cache.save_response(...) > > -- no queries between 13:45 and 13:50 at all! > > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.962 ms statement: select 1 > 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0] > [SELECT] LOG: duration: 0.415 ms statement: SELECT * FROM o_queries_queue > ... > > So it seems that sometime pg_stat_activity entry become alive with the old > information (it shows xact_start from the last executed query in backend, > even if that query finished long time ago). > I know about asynchronous behavior of pg_stat_activity but getting > xact_start from deep past seems wrong for me. > > What's more: i tested that problem on another databases - and found they > have same effect (so it doesn't look like a problem local to that single > exact database). Sorry for replying to a very old thread. My customer also observed weird entries in pg_stat_activity, with state="idle" and xact_start showing an old timestamp. I figured out this issue occurs if catchup interrupts arrive in "idle" state and ProcessCatchupInterrupt() is called. ProcessCatchupInterrupt() starts and completes the transaction, as follows, without changing the state (it remains "idle"). However, StartTransactionCommand() sets xact_start to the timestamp of the last query executed, not the current timestamp. If pg_stat_activity is read during this brief transaction, the weird entries (state="idle" and xact_start="old timestamp") can appear. else { elog(DEBUG4, "ProcessCatchupEvent outside transaction"); StartTransactionCommand(); CommitTransactionCommand(); } Similarly, RemoveTempRelationsCallback() can cause the same issue. When a backend exits and RemoveTempRelationsCallback() is called, it also starts and completes a transaction, as follows, without changing the state. This sets xact_start to the timestamp of the last query executed, too. if (OidIsValid(myTempNamespace)) /* should always be true */ { /* Need to ensure we have a usable transaction. */ AbortOutOfAnyTransaction(); StartTransactionCommand(); PushActiveSnapshot(GetTransactionSnapshot()); RemoveTempRelations(myTempNamespace); PopActiveSnapshot(); CommitTransactionCommand(); } Additionally, exec_describe_statement_message() and exec_describe_portal_message() can cause pg_stat_activity records with state="idle" but xact_start!=NULL. However, these cases call SetCurrentStatementStartTimestamp(), setting xact_start to the current timestamp instead of an old one. I'm not sure if these are bugs. However, to prevent confusion from these odd pg_stat_activity entries, I suggest the following changes: 1. Modify ProcessCatchupInterrupt() and RemoveTempRelationsCallback() to reset stmtStartTimestamp (timestamp of the last query executed) to 0 before starting the transaction. This way, during the transaction, records with state="idle" and xact_start=NULL would be observed. 2. Update exec_describe_statement_message() and exec_describe_portal_message() to change the state to "active," similar to exec_simple_query(). This would result in records with state="active" and xact_start="current timestamp" during those operations. Thoughts? Regards, -- Fujii Masao
В списке pgsql-bugs по дате отправления:
Предыдущее
От: "Ware, Christopher M. (LARC-D318)[RSES]"Дата:
Сообщение: RE: [EXTERNAL] Re: BUG #18528: Installer displays error when installing
Следующее
От: Alvaro HerreraДата:
Сообщение: Re: BUG #18500: Detaching a partition with an index manually attached to the parent's index triggers Assert