Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity
Дата
Msg-id CAHGQGwEQWk6J+vqQnnVFFngtMdMuw-AwuEFnq=u3Wb2STXu3TA@mail.gmail.com
обсуждение исходный текст
Ответ на BUG #10123: Weird entries in pg_stat_activity  (maxim.boguk@gmail.com)
Ответы Re: [BUGS] BUG #10123: Weird entries in pg_stat_activity
Список pgsql-bugs
On Thu, Apr 24, 2014 at 7:19 PM <maxim.boguk@gmail.com> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      10123
> Logged by:          Maxim Boguk
> Email address:      maxim.boguk@gmail.com
> PostgreSQL version: 9.2.8
> Operating system:   Linux
> Description:
>
> Hi everyone,
>
> During debug of the weird nagios database alerts about long transactions I
> found that sometime pg_stat_activity contains very weird entries with
> xact_start NOT NULL and state='idle'... what's more they happened during
> backend idle state.
>
> To check what's going on I performed the following actions on production
> database:
> 1)set log_min_duration_statement=0 to see ALL database queries
> 2)run single line bash script to check for weird entries:
> while true; do  psql -x -X -c "select now() as ts,now()-xact_start as
> tx_age, now()-state_change as change_age,* from pg_stat_activity where
> state='idle' and xact_start is not null" |  grep -vE '^$|No rows'; sleep
> 0.1; done
>
> And yes, once-twice-trice per minute script output some interesting results
> such as:
>
>
> -[ RECORD 1
>
]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ts               | 2014-04-24 13:49:03.498354+04
> tx_age           | 00:04:01.283106
> change_age       | 00:04:01.279373
> datid            | 16385
> datname          | *******
> pid              | 685
> usesysid         | 16384
> usename          | *******
> application_name |
> client_addr      | 127.0.0.1
> client_hostname  |
> client_port      | 37461
> backend_start    | 2014-04-24 13:30:03.275711+04
> xact_start       | 2014-04-24 13:45:02.215248+04
> query_start      | 2014-04-24 13:45:02.215248+04
> state_change     | 2014-04-24 13:45:02.218981+04
> waiting          | f
> state            | idle
> query            |
>                  |                     SELECT cache.save_response(
>                  |                      *****
>                  |                     );
>
>
> State=idle
> xact_start=4 min ago
>
> Now what I found in the full database log for this pid:
>
> ...
>
> 2014-04-24 13:45:02 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0]
> [SELECT] LOG:  duration: 3.703 ms  statement:
>                             SELECT cache.save_response(...)
>
> -- no queries between 13:45 and 13:50 at all!
>
> 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0]
> [SELECT] LOG:  duration: 0.962 ms  statement: select 1
> 2014-04-24 13:50:01 MSK 685 ****@**** from 127.0.0.1 [vxid:24/0 txid:0]
> [SELECT] LOG:  duration: 0.415 ms  statement: SELECT * FROM o_queries_queue
> ...
>
> So it seems that sometime pg_stat_activity entry become alive with the old
> information (it shows xact_start from the last executed query in backend,
> even if that query finished long time ago).
> I know about asynchronous behavior of pg_stat_activity but getting
> xact_start from deep past seems wrong for me.
>
> What's more: i tested that problem on another databases - and found they
> have same effect (so it doesn't look like a problem local to that single
> exact database).

Sorry for replying to a very old thread.

My customer also observed weird entries in pg_stat_activity, with state="idle"
and xact_start showing an old timestamp. I figured out this issue occurs
if catchup interrupts arrive in "idle" state and
ProcessCatchupInterrupt() is called.

ProcessCatchupInterrupt() starts and completes the transaction, as follows,
without changing the state (it remains "idle"). However,
StartTransactionCommand() sets xact_start to the timestamp of the last query
executed, not the current timestamp. If pg_stat_activity is read during
this brief transaction, the weird entries (state="idle" and
xact_start="old timestamp")
can appear.

        else
        {
                elog(DEBUG4, "ProcessCatchupEvent outside transaction");
                StartTransactionCommand();
                CommitTransactionCommand();
        }

Similarly, RemoveTempRelationsCallback() can cause the same issue.
When a backend exits and RemoveTempRelationsCallback() is called,
it also starts and completes a transaction, as follows, without changing
the state. This sets xact_start to the timestamp of the last query
executed, too.

        if (OidIsValid(myTempNamespace)) /* should always be true */
        {
                /* Need to ensure we have a usable transaction. */
                AbortOutOfAnyTransaction();
                StartTransactionCommand();
                PushActiveSnapshot(GetTransactionSnapshot());

                RemoveTempRelations(myTempNamespace);

                PopActiveSnapshot();
                CommitTransactionCommand();
        }

Additionally, exec_describe_statement_message() and
exec_describe_portal_message() can cause pg_stat_activity records
with state="idle" but xact_start!=NULL. However, these cases call
SetCurrentStatementStartTimestamp(), setting xact_start to
the current timestamp instead of an old one.

I'm not sure if these are bugs. However, to prevent confusion from
these odd pg_stat_activity entries, I suggest the following changes:

        1. Modify ProcessCatchupInterrupt() and RemoveTempRelationsCallback()
            to reset stmtStartTimestamp (timestamp of the last query
executed) to 0
            before starting the transaction. This way, during the transaction,
            records with state="idle" and xact_start=NULL would be observed.

        2. Update exec_describe_statement_message() and
exec_describe_portal_message()
            to change the state to "active," similar to exec_simple_query().
            This would result in records with state="active" and
xact_start="current timestamp"
            during those operations.

Thoughts?

Regards,

--
Fujii Masao



В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Ware, Christopher M. (LARC-D318)[RSES]"
Дата:
Сообщение: RE: [EXTERNAL] Re: BUG #18528: Installer displays error when installing
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #18500: Detaching a partition with an index manually attached to the parent's index triggers Assert