Обсуждение: Momentary Delay

Поиск
Список
Период
Сортировка

Momentary Delay

От
Anderson Valadares
Дата:
Hi
 We are experiencing an intermittent problem in a GIS database from a client. Some times of the day is as if the PostgreSQL executed the slowest operations see below an example. The query was performed three times, twice I canceled and the third time I left it to end. The consultation took 10 seconds to finish, but the measured execution time is 20 ms. As we have several queries that run every 2 seconds when this momentary delay occurs queries accumulate and burst the limit of 203 connections allowed. The interval of "momentary stop" are approximately 2 seconds occurring at random times and during these stoppages occur no locks and no increased load on the server is identified. There is a job 2/2 seconds locks collection of information, running queries, etc., nmon also collects information every 2 seconds. My client asks what causes these momentary stops? because it affects all operations of the database? etc. How do I identify what is causing these delays in executions of operations in the database?

The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit

Evolution of the number of connections for a period

snap_datetime                        |  #connections
----------------------------------------------+-------------
2015-06-05 09:25:00.954731-03 | 74
2015-06-05 09:26:00.249187-03 | 205
2015-06-05 09:27:00.826874-03 | 207
2015-06-05 09:28:00.374666-03 | 73
2015-06-05 09:29:00.690696-03 | 75


Occurrence of the problem

citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2555.557 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2192.572 ms
citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 10164.015 ms
citgis=# select now();
              now
-------------------------------
 2015-06-05 09:27:22.006072-03
(1 row)

Time: 0.152 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
 dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
            1059964443 |        7989813
            1060072723 |        7990876
(2 rows)

Time: 5565.175 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
 dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
            1059964445 |        7989813
            1060072725 |        7990876
(2 rows)

Time: 27.944 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
            dvi_sub.via_cod_viagem
             from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem via_sub, gis_iti_itinerario iti_sub
             where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
               and via_sub.via_status_viagem = 'A'
               and via_sub.via_dt_hora_ini > now() - interval '9 hours'
               and iti_sub.lin_cod_linha = 389
               and iti_sub.iti_sentido_itinerario = 'I'
               and iti_sub.iti_cod_itinerario_linha = via_sub.iti_cod_itinerario_linha
               and dvi_sub.vlo_cod_localizacao is not null
             group by dvi_sub.via_cod_viagem, iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
 dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
            1059964445 |        7989813
            1060072727 |        7990876
(2 rows)

Time: 24.428 ms

Greetings
  
  Anderson

Re: Momentary Delay

От
Adrian Klaver
Дата:
On 06/08/2015 07:59 AM, Anderson Valadares wrote:
> Hi
>   We are experiencing an intermittent problem in a GIS database from a
> client. Some times of the day is as if the PostgreSQL executed the
> slowest operations see below an example. The query was performed three
> times, twice I canceled and the third time I left it to end. The
> consultation took 10 seconds to finish, but the measured execution time
> is 20 ms. As we have several queries that run every 2 seconds when this
> momentary delay occurs queries accumulate and burst the limit of 203
> connections allowed. The interval of "momentary stop" are approximately
> 2 seconds occurring at random times and during these stoppages occur no
> locks and no increased load on the server is identified. There is a job
> 2/2 seconds locks collection of information, running queries, etc., nmon
> also collects information every 2 seconds.

I do not have an answer, but I do have a question. What does:

"There is a job
2/2 seconds locks collection of information, running queries, etc"

mean?


My client asks what causes
> these momentary stops? because it affects all operations of the
> database? etc. How do I identify what is causing these delays in
> executions of operations in the database?
>
> The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
>
> Evolution of the number of connections for a period
>
> snap_datetime                       |  #connections
> ----------------------------------------------+-------------
> 2015-06-05 09:25:00.954731-03 |74
> 2015-06-05 09:26:00.249187-03 |205
> 2015-06-05 09:27:00.826874-03 |207
> 2015-06-05 09:28:00.374666-03 |73
> 2015-06-05 09:29:00.690696-03 |75
>
>
> Occurrence of the problem
>
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 2555.557 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 2192.572 ms
> citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 10164.015 ms
> citgis=# select now();
>                now
> -------------------------------
>   2015-06-05 09:27:22.006072-03
> (1 row)
>
> Time: 0.152 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>   dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>              1059964443 |        7989813
>              1060072723 |        7990876
> (2 rows)
>
> Time: 5565.175 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>   dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>              1059964445 |        7989813
>              1060072725 |        7990876
> (2 rows)
>
> Time: 27.944 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>              dvi_sub.via_cod_viagem
>               from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem via_sub, gis_iti_itinerario iti_sub
>               where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                 and via_sub.via_status_viagem = 'A'
>                 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                 and iti_sub.lin_cod_linha = 389
>                 and iti_sub.iti_sentido_itinerario = 'I'
>                 and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                 and dvi_sub.vlo_cod_localizacao is not null
>               group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>   dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>              1059964445 |        7989813
>              1060072727 |        7990876
> (2 rows)
>
> Time: 24.428 ms
>
> Greetings
>    Anderson


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Momentary Delay

От
Anderson Valadares
Дата:
2015-06-08 12:30 GMT-03:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 06/08/2015 07:59 AM, Anderson Valadares wrote:
Hi
  We are experiencing an intermittent problem in a GIS database from a
client. Some times of the day is as if the PostgreSQL executed the
slowest operations see below an example. The query was performed three
times, twice I canceled and the third time I left it to end. The
consultation took 10 seconds to finish, but the measured execution time
is 20 ms. As we have several queries that run every 2 seconds when this
momentary delay occurs queries accumulate and burst the limit of 203
connections allowed. The interval of "momentary stop" are approximately
2 seconds occurring at random times and during these stoppages occur no
locks and no increased load on the server is identified. There is a job
2/2 seconds locks collection of information, running queries, etc., nmon
also collects information every 2 seconds.

I do not have an answer, but I do have a question. What does:

"There is a job
2/2 seconds locks collection of information, running queries, etc"

mean?



My client asks what causes
these momentary stops? because it affects all operations of the
database? etc. How do I identify what is causing these delays in
executions of operations in the database?

The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit

Evolution of the number of connections for a period

snap_datetime                       |  #connections
----------------------------------------------+-------------
2015-06-05 09:25:00.954731-03 |74
2015-06-05 09:26:00.249187-03 |205
2015-06-05 09:27:00.826874-03 |207
2015-06-05 09:28:00.374666-03 |73
2015-06-05 09:29:00.690696-03 |75


Occurrence of the problem

citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2555.557 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2192.572 ms
citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 10164.015 ms
citgis=# select now();
               now
-------------------------------
  2015-06-05 09:27:22.006072-03
(1 row)

Time: 0.152 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
  dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
             1059964443 |        7989813
             1060072723 |        7990876
(2 rows)

Time: 5565.175 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
  dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
             1059964445 |        7989813
             1060072725 |        7990876
(2 rows)

Time: 27.944 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
             dvi_sub.via_cod_viagem
              from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
                and via_sub.via_status_viagem = 'A'
                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
                and iti_sub.lin_cod_linha = 389
                and iti_sub.iti_sentido_itinerario = 'I'
                and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
                and dvi_sub.vlo_cod_localizacao is not null
              group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
  dvi_cod_dados_via_iti | via_cod_viagem
-----------------------+----------------
             1059964445 |        7989813
             1060072727 |        7990876
(2 rows)

Time: 24.428 ms

Greetings
   Anderson


--
Adrian Klaver
adrian.klaver@aklaver.com


Adrian
 forgiveness for my english the meant "Exists a job run every 2 seconds capturing information of locks, long running queries, etc."


Anderson

Re: Momentary Delay

От
Adrian Klaver
Дата:
On 06/08/2015 09:25 AM, Anderson Valadares wrote:
> 2015-06-08 12:30 GMT-03:00 Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>>:
>

>
>
>
> Adrian
>   forgiveness for my english the meant "Exists a job run every 2 seconds
> capturing information of locks, long running queries, etc."

So have you looked at the Postgres logs during these time periods to see
if there is anything relevant?

>
>
> Anderson


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Momentary Delay

От
Anderson Valadares
Дата:

2015-06-08 13:58 GMT-03:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 06/08/2015 09:25 AM, Anderson Valadares wrote:
2015-06-08 12:30 GMT-03:00 Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:





Adrian
  forgiveness for my english the meant "Exists a job run every 2 seconds
capturing information of locks, long running queries, etc."

So have you looked at the Postgres logs during these time periods to see if there is anything relevant?




Anderson


--
Adrian Klaver
adrian.klaver@aklaver.com

I saw the logs and there is nothing wrong.


Re: Momentary Delay

От
Bill Moran
Дата:
On Mon, 8 Jun 2015 11:59:31 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> Hi
>  We are experiencing an intermittent problem in a GIS database from a
> client. Some times of the day is as if the PostgreSQL executed the slowest
> operations see below an example. The query was performed three times, twice
> I canceled and the third time I left it to end. The consultation took 10
> seconds to finish, but the measured execution time is 20 ms. As we have
> several queries that run every 2 seconds when this momentary delay occurs
> queries accumulate and burst the limit of 203 connections allowed. The
> interval of "momentary stop" are approximately 2 seconds occurring at
> random times and during these stoppages occur no locks and no increased
> load on the server is identified. There is a job 2/2 seconds locks
> collection of information, running queries, etc., nmon also collects
> information every 2 seconds. My client asks what causes these momentary
> stops? because it affects all operations of the database? etc. How do I
> identify what is causing these delays in executions of operations in the
> database?

Number of points to consider:
* Check the contents of pg_locks and see if something is waiting on a
  lock when the query is slow.
* Also, ensure log_lock_waits is turned on for the server and check
  PostgreSQL's logs for messages about long lock waits.
* Make sure shared_buffers isn't larger than 16G. I've seen issues with
  PG 9.X and shared_buffers settings larger than that causing
  intermittant stalls.
* Based on the connection behavior you describe, I'm guessing it's a
  Tomcat app using some sort of conection pool. Whatever it is, tune
  your connection pool settings so that the max size of the pool doesn't
  exceed the available PostgreSQL connections. At least that will prevent
  errors from happening when the problem occurs.
* The query you you keep cancelling below, run judy EXPLAIN on it (not
  EXPLAIN ANALYZE) which will certainly finish and give you a plan
  that can be reviewed to help determine what the problem is.
* On the OS, monitor iostat or something else to see if you're
  saturating disk capacity when the problem happens.

> The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
>
> Evolution of the number of connections for a period
>
> snap_datetime                        |  #connections
> ----------------------------------------------+-------------
> 2015-06-05 09:25:00.954731-03 | 74
> 2015-06-05 09:26:00.249187-03 | 205
> 2015-06-05 09:27:00.826874-03 | 207
> 2015-06-05 09:28:00.374666-03 | 73
> 2015-06-05 09:29:00.690696-03 | 75
>
>
> Occurrence of the problem
>
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 2555.557 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 2192.572 ms
> citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
> Cancel request sent
> ERROR:  canceling statement due to user request
> Time: 10164.015 ms
> citgis=# select now();
>               now
> -------------------------------
>  2015-06-05 09:27:22.006072-03
> (1 row)
>
> Time: 0.152 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>  dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>             1059964443 |        7989813
>             1060072723 |        7990876
> (2 rows)
>
> Time: 5565.175 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>  dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>             1059964445 |        7989813
>             1060072725 |        7990876
> (2 rows)
>
> Time: 27.944 ms
> citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
>             dvi_sub.via_cod_viagem
>              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> via_sub, gis_iti_itinerario iti_sub
>              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
>                and via_sub.via_status_viagem = 'A'
>                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
>                and iti_sub.lin_cod_linha = 389
>                and iti_sub.iti_sentido_itinerario = 'I'
>                and iti_sub.iti_cod_itinerario_linha =
> via_sub.iti_cod_itinerario_linha
>                and dvi_sub.vlo_cod_localizacao is not null
>              group by dvi_sub.via_cod_viagem,
> iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> ;
>  dvi_cod_dados_via_iti | via_cod_viagem
> -----------------------+----------------
>             1059964445 |        7989813
>             1060072727 |        7990876
> (2 rows)
>
> Time: 24.428 ms
>
> Greetings
>
>   Anderson


--
Bill Moran <wmoran@potentialtech.com>


Re: Momentary Delay

От
Bill Moran
Дата:
Please do not remove the mailing list from replies. See below.

On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
>
> > On Mon, 8 Jun 2015 11:59:31 -0300
> > Anderson Valadares <andervalbh@gmail.com> wrote:
> >
> > > Hi
> > >  We are experiencing an intermittent problem in a GIS database from a
> > > client. Some times of the day is as if the PostgreSQL executed the
> > slowest
> > > operations see below an example. The query was performed three times,
> > twice
> > > I canceled and the third time I left it to end. The consultation took 10
> > > seconds to finish, but the measured execution time is 20 ms. As we have
> > > several queries that run every 2 seconds when this momentary delay occurs
> > > queries accumulate and burst the limit of 203 connections allowed. The
> > > interval of "momentary stop" are approximately 2 seconds occurring at
> > > random times and during these stoppages occur no locks and no increased
> > > load on the server is identified. There is a job 2/2 seconds locks
> > > collection of information, running queries, etc., nmon also collects
> > > information every 2 seconds. My client asks what causes these momentary
> > > stops? because it affects all operations of the database? etc. How do I
> > > identify what is causing these delays in executions of operations in the
> > > database?
> >
> > Number of points to consider:
> > * Check the contents of pg_locks and see if something is waiting on a
> >   lock when the query is slow.
>
>  There is nothing waiting when the query is slow, see:
>
>          snap_datetime         | waiting | count
> -------------------------------+---------+-------
>  2015-06-05 09:25:00.954731-03 | f       |    74
>  2015-06-05 09:26:00.249187-03 | f       |   205
>  2015-06-05 09:27:00.826874-03 | f       |   207

I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.

> * Also, ensure log_lock_waits is turned on for the server and check
> >   PostgreSQL's logs for messages about long lock waits.
>
> There is also nothing

This is even worse than the previous statement. What did you _do_ to
determine this?

Since you're unable to diagnose and fix the problem on your own, the
possibility exists that the problem is with your diagnostic steps. If
you don't share the details of those steps, I have no way to help you
with them.

> > * Based on the connection behavior you describe, I'm guessing it's a
> >   Tomcat app using some sort of conection pool. Whatever it is, tune
> >   your connection pool settings so that the max size of the pool doesn't
> >   exceed the available PostgreSQL connections. At least that will prevent
> >   errors from happening when the problem occurs.
>
> Yes it is a tomcat application using connection pooling and will be set to
> not exceed the limit of PostgreSQL connections, but the central point is
> that the number of connections has increased due to what I am calling
> "momentary stoppages"

I understand that. I made the suggestion as a temporary fix to reduce
the impact until you are able to fix the actual cause.

> > * The query you you keep cancelling below, run just EXPLAIN on it (not
> >   EXPLAIN ANALYZE) which will certainly finish and give you a plan
> >   that can be reviewed to help determine what the problem is.
>
> Explain plan in http://explain.depesz.com/s/bWw

I'm confused. The query at that URL only took 10ms to complete, which
doesn't seem at all unreasonable.

> * On the OS, monitor iostat or something else to see if you're
> >   saturating disk capacity when the problem happens.
>
> The OS is also being monitored and there occurs no saturation.

Again, without any details, it's possible that there is a problem
in your monitoring.

>
> > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > >
> > > Evolution of the number of connections for a period
> > >
> > > snap_datetime                        |  #connections
> > > ----------------------------------------------+-------------
> > > 2015-06-05 09:25:00.954731-03 | 74
> > > 2015-06-05 09:26:00.249187-03 | 205
> > > 2015-06-05 09:27:00.826874-03 | 207
> > > 2015-06-05 09:28:00.374666-03 | 73
> > > 2015-06-05 09:29:00.690696-03 | 75
> > >
> > >
> > > Occurrence of the problem
> > >
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 2555.557 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 2192.572 ms
> > > citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 10164.015 ms
> > > citgis=# select now();
> > >               now
> > > -------------------------------
> > >  2015-06-05 09:27:22.006072-03
> > > (1 row)
> > >
> > > Time: 0.152 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964443 |        7989813
> > >             1060072723 |        7990876
> > > (2 rows)
> > >
> > > Time: 5565.175 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964445 |        7989813
> > >             1060072725 |        7990876
> > > (2 rows)
> > >
> > > Time: 27.944 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964445 |        7989813
> > >             1060072727 |        7990876
> > > (2 rows)
> > >
> > > Time: 24.428 ms
> > >
> > > Greetings
> > >
> > >   Anderson
> >
> >
> > --
> > Bill Moran <wmoran@potentialtech.com>
> >


--
Bill Moran <wmoran@potentialtech.com>


Re: Momentary Delay

От
Anderson Valadares
Дата:

2015-06-12 19:56 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:

Please do not remove the mailing list from replies. See below.

On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
>
> > On Mon, 8 Jun 2015 11:59:31 -0300
> > Anderson Valadares <andervalbh@gmail.com> wrote:
> >
> > > Hi
> > >  We are experiencing an intermittent problem in a GIS database from a
> > > client. Some times of the day is as if the PostgreSQL executed the
> > slowest
> > > operations see below an example. The query was performed three times,
> > twice
> > > I canceled and the third time I left it to end. The consultation took 10
> > > seconds to finish, but the measured execution time is 20 ms. As we have
> > > several queries that run every 2 seconds when this momentary delay occurs
> > > queries accumulate and burst the limit of 203 connections allowed. The
> > > interval of "momentary stop" are approximately 2 seconds occurring at
> > > random times and during these stoppages occur no locks and no increased
> > > load on the server is identified. There is a job 2/2 seconds locks
> > > collection of information, running queries, etc., nmon also collects
> > > information every 2 seconds. My client asks what causes these momentary
> > > stops? because it affects all operations of the database? etc. How do I
> > > identify what is causing these delays in executions of operations in the
> > > database?
> >
> > Number of points to consider:
> > * Check the contents of pg_locks and see if something is waiting on a
> >   lock when the query is slow.
>
>  There is nothing waiting when the query is slow, see:
>
>          snap_datetime         | waiting | count
> -------------------------------+---------+-------
>  2015-06-05 09:25:00.954731-03 | f       |    74
>  2015-06-05 09:26:00.249187-03 | f       |   205
>  2015-06-05 09:27:00.826874-03 | f       |   207

I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.

Sorry the information was incomplete. Below is the query that was
performed to extract information from tables that used to monitor the database.
The snap_datetime column indicates the time that occurred monitoring,
the waiting column tells you whether any connection was on hold and the column
count tells how many connections existed at the time.

select
     s.snap_datetime
     ,a.waiting
     , count(*)
from stat_snapshot s, stat_activity a
where s.snap_id = a.snap_id
  and s.snap_datetime >= '2015-06-05 09:25:00'
  and s.snap_datetime <= '2015-06-05 09:28:00'
group by
          s.snap_datetime
          ,a.waiting
order by s.snap_datetime
;
         snap_datetime         | waiting | count
-------------------------------+---------+-------
 2015-06-05 09:25:00.954731-03 | f       |    74
 2015-06-05 09:26:00.249187-03 | f       |   205
 2015-06-05 09:27:00.826874-03 | f       |   207
(3 rows) 


> * Also, ensure log_lock_waits is turned on for the server and check
> >   PostgreSQL's logs for messages about long lock waits.
>
> There is also nothing

This is even worse than the previous statement. What did you _do_ to
determine this?

Since you're unable to diagnose and fix the problem on your own, the
possibility exists that the problem is with your diagnostic steps. If
you don't share the details of those steps, I have no way to help you
with them.
 


> > * Based on the connection behavior you describe, I'm guessing it's a
> >   Tomcat app using some sort of conection pool. Whatever it is, tune
> >   your connection pool settings so that the max size of the pool doesn't
> >   exceed the available PostgreSQL connections. At least that will prevent
> >   errors from happening when the problem occurs.
>
> Yes it is a tomcat application using connection pooling and will be set to
> not exceed the limit of PostgreSQL connections, but the central point is
> that the number of connections has increased due to what I am calling
> "momentary stoppages"

I understand that. I made the suggestion as a temporary fix to reduce
the impact until you are able to fix the actual cause.

> > * The query you you keep cancelling below, run just EXPLAIN on it (not
> >   EXPLAIN ANALYZE) which will certainly finish and give you a plan
> >   that can be reviewed to help determine what the problem is.
>
> Explain plan in http://explain.depesz.com/s/bWw

I'm confused. The query at that URL only took 10ms to complete, which
doesn't seem at all unreasonable.


I agree with you and that's why I ask why the delay in the execution at certain times
as demonstrated above.

 
> * On the OS, monitor iostat or something else to see if you're
> >   saturating disk capacity when the problem happens.
>
> The OS is also being monitored and there occurs no saturation.

Again, without any details, it's possible that there is a problem
in your monitoring.


 

>
> > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > >
> > > Evolution of the number of connections for a period
> > >
> > > snap_datetime                        |  #connections
> > > ----------------------------------------------+-------------
> > > 2015-06-05 09:25:00.954731-03 | 74
> > > 2015-06-05 09:26:00.249187-03 | 205
> > > 2015-06-05 09:27:00.826874-03 | 207
> > > 2015-06-05 09:28:00.374666-03 | 73
> > > 2015-06-05 09:29:00.690696-03 | 75
> > >
> > >
> > > Occurrence of the problem
> > >
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 2555.557 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 2192.572 ms
> > > citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > > Cancel request sent
> > > ERROR:  canceling statement due to user request
> > > Time: 10164.015 ms
> > > citgis=# select now();
> > >               now
> > > -------------------------------
> > >  2015-06-05 09:27:22.006072-03
> > > (1 row)
> > >
> > > Time: 0.152 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964443 |        7989813
> > >             1060072723 |        7990876
> > > (2 rows)
> > >
> > > Time: 5565.175 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964445 |        7989813
> > >             1060072725 |        7990876
> > > (2 rows)
> > >
> > > Time: 27.944 ms
> > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > dvi_cod_dados_via_iti,
> > >             dvi_sub.via_cod_viagem
> > >              from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
> > > via_sub, gis_iti_itinerario iti_sub
> > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > >                and via_sub.via_status_viagem = 'A'
> > >                and via_sub.via_dt_hora_ini > now() - interval '9 hours'
> > >                and iti_sub.lin_cod_linha = 389
> > >                and iti_sub.iti_sentido_itinerario = 'I'
> > >                and iti_sub.iti_cod_itinerario_linha =
> > > via_sub.iti_cod_itinerario_linha
> > >                and dvi_sub.vlo_cod_localizacao is not null
> > >              group by dvi_sub.via_cod_viagem,
> > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > ;
> > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > -----------------------+----------------
> > >             1059964445 |        7989813
> > >             1060072727 |        7990876
> > > (2 rows)
> > >
> > > Time: 24.428 ms
> > >
> > > Greetings
> > >
> > >   Anderson
> >
> >
> > --
> > Bill Moran <wmoran@potentialtech.com>
> >


--
Bill Moran <wmoran@potentialtech.com>

Re: Momentary Delay

От
Bill Moran
Дата:
On Mon, 15 Jun 2015 17:48:54 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> 2015-06-12 19:56 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
>
> > Please do not remove the mailing list from replies. See below.
> >
> > On Fri, 12 Jun 2015 09:21:19 -0300
> > Anderson Valadares <andervalbh@gmail.com> wrote:
> >
> > > 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
> > >
> > > > On Mon, 8 Jun 2015 11:59:31 -0300
> > > > Anderson Valadares <andervalbh@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >  We are experiencing an intermittent problem in a GIS database from a
> > > > > client. Some times of the day is as if the PostgreSQL executed the
> > > > slowest
> > > > > operations see below an example. The query was performed three times,
> > > > twice
> > > > > I canceled and the third time I left it to end. The consultation
> > took 10
> > > > > seconds to finish, but the measured execution time is 20 ms. As we
> > have
> > > > > several queries that run every 2 seconds when this momentary delay
> > occurs
> > > > > queries accumulate and burst the limit of 203 connections allowed.
> > The
> > > > > interval of "momentary stop" are approximately 2 seconds occurring at
> > > > > random times and during these stoppages occur no locks and no
> > increased
> > > > > load on the server is identified. There is a job 2/2 seconds locks
> > > > > collection of information, running queries, etc., nmon also collects
> > > > > information every 2 seconds. My client asks what causes these
> > momentary
> > > > > stops? because it affects all operations of the database? etc. How
> > do I
> > > > > identify what is causing these delays in executions of operations in
> > the
> > > > > database?
> > > >
> > > > Number of points to consider:
> > > > * Check the contents of pg_locks and see if something is waiting on a
> > > >   lock when the query is slow.
> > >
> > >  There is nothing waiting when the query is slow, see:
> > >
> > >          snap_datetime         | waiting | count
> > > -------------------------------+---------+-------
> > >  2015-06-05 09:25:00.954731-03 | f       |    74
> > >  2015-06-05 09:26:00.249187-03 | f       |   205
> > >  2015-06-05 09:27:00.826874-03 | f       |   207
> >
> > I don't know what that means. Since you don't show the query that
> > generated that output, I have no idea if your statement is valid, or
> > if you're running a query that will inherently produce incorrect
> > results.
>
> Sorry the information was incomplete. Below is the query that was
> performed to extract information from tables that used to monitor the
> database.
> The snap_datetime column indicates the time that occurred monitoring,
> the waiting column tells you whether any connection was on hold and the
> column
> count tells how many connections existed at the time.
>
> select
>      s.snap_datetime
>      ,a.waiting
>      , count(*)
> from stat_snapshot s, stat_activity a
> where s.snap_id = a.snap_id
>   and s.snap_datetime >= '2015-06-05 09:25:00'
>   and s.snap_datetime <= '2015-06-05 09:28:00'
> group by
>           s.snap_datetime
>           ,a.waiting
> order by s.snap_datetime
> ;
>          snap_datetime         | waiting | count
> -------------------------------+---------+-------
>  2015-06-05 09:25:00.954731-03 | f       |    74
>  2015-06-05 09:26:00.249187-03 | f       |   205
>  2015-06-05 09:27:00.826874-03 | f       |   207
> (3 rows)

Given what I'm seeing above, there's still a lot of information
missing. Those tables are not standard PostgreSQL tables, and I'm
not aware of any tool that creates them, thus I have to assume
it's a cron job or something similar created in-house. Given the
query you ran and the resultant data, my first guess is that the
data in stat_snapshot and stat_activity is worthless: i.e. it's
captured once per minute, and therefore there are 59+ seconds
worth of detail that aren't captured, thus the actual liklihood
that the those tables will contain any indication of the problem
is very low.

> > > * Also, ensure log_lock_waits is turned on for the server and check
> > > >   PostgreSQL's logs for messages about long lock waits.
> > >
> > > There is also nothing
> >
> > This is even worse than the previous statement. What did you _do_ to
> > determine this?
> >
> > Since you're unable to diagnose and fix the problem on your own, the
> > possibility exists that the problem is with your diagnostic steps. If
> > you don't share the details of those steps, I have no way to help you
> > with them.
>
> See the log in
> https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_000000.log

Apologies, my above comment was intended to be in regards to the IO load.

However, that log shows that you have massive lock contention. In the early
part of the log a process waits over 5 minutes for a share lock:
2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1] db=citgis,user=opegis LOG:  process 44236822 acquired
ShareLockon transaction 3090377758 after 374146.817 ms 

Without seeing the source of the PL/pgSQL functions that seem to be
causing the problem, I can only suggest that you review those functions
for excessive locking and inefficient access. I'm going to guess, with
the large number of AccessExclusiveLocks in the log, that the functions
are doing explicit locking, and not doing so in an efficient manner.

> > > > * Based on the connection behavior you describe, I'm guessing it's a
> > > >   Tomcat app using some sort of conection pool. Whatever it is, tune
> > > >   your connection pool settings so that the max size of the pool
> > doesn't
> > > >   exceed the available PostgreSQL connections. At least that will
> > prevent
> > > >   errors from happening when the problem occurs.
> > >
> > > Yes it is a tomcat application using connection pooling and will be set
> > to
> > > not exceed the limit of PostgreSQL connections, but the central point is
> > > that the number of connections has increased due to what I am calling
> > > "momentary stoppages"
> >
> > I understand that. I made the suggestion as a temporary fix to reduce
> > the impact until you are able to fix the actual cause.
> >
> > > > * The query you you keep cancelling below, run just EXPLAIN on it (not
> > > >   EXPLAIN ANALYZE) which will certainly finish and give you a plan
> > > >   that can be reviewed to help determine what the problem is.
> > >
> > > Explain plan in http://explain.depesz.com/s/bWw
> >
> > I'm confused. The query at that URL only took 10ms to complete, which
> > doesn't seem at all unreasonable.
>
> I agree with you and that's why I ask why the delay in the execution at
> certain times
> as demonstrated above.
>
> > > * On the OS, monitor iostat or something else to see if you're
> > > >   saturating disk capacity when the problem happens.
> > >
> > > The OS is also being monitored and there occurs no saturation.
> >
> > Again, without any details, it's possible that there is a problem
> > in your monitoring.
> >
> >
> See the monitoring spreadsheet generated from nmon in
> https://github.com/andervalbh/serv-gis01/blob/master/transfacil07_150605_0000.nmon.rar
>
>
> >
> > > > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > > > >
> > > > > Evolution of the number of connections for a period
> > > > >
> > > > > snap_datetime                        |  #connections
> > > > > ----------------------------------------------+-------------
> > > > > 2015-06-05 09:25:00.954731-03 | 74
> > > > > 2015-06-05 09:26:00.249187-03 | 205
> > > > > 2015-06-05 09:27:00.826874-03 | 207
> > > > > 2015-06-05 09:28:00.374666-03 | 73
> > > > > 2015-06-05 09:29:00.690696-03 | 75
> > > > >
> > > > >
> > > > > Occurrence of the problem
> > > > >
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 2555.557 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 2192.572 ms
> > > > > citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 10164.015 ms
> > > > > citgis=# select now();
> > > > >               now
> > > > > -------------------------------
> > > > >  2015-06-05 09:27:22.006072-03
> > > > > (1 row)
> > > > >
> > > > > Time: 0.152 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964443 |        7989813
> > > > >             1060072723 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 5565.175 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964445 |        7989813
> > > > >             1060072725 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 27.944 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964445 |        7989813
> > > > >             1060072727 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 24.428 ms
> > > > >
> > > > > Greetings
> > > > >
> > > > >   Anderson
> > > >
> > > >
> > > > --
> > > > Bill Moran <wmoran@potentialtech.com>
> > > >
> >
> >
> > --
> > Bill Moran <wmoran@potentialtech.com>
> >


--
Bill Moran <wmoran@potentialtech.com>


Re: Momentary Delay

От
Anderson Valadares
Дата:

2015-06-15 18:19 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
On Mon, 15 Jun 2015 17:48:54 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> 2015-06-12 19:56 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
>
> > Please do not remove the mailing list from replies. See below.
> >
> > On Fri, 12 Jun 2015 09:21:19 -0300
> > Anderson Valadares <andervalbh@gmail.com> wrote:
> >
> > > 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
> > >
> > > > On Mon, 8 Jun 2015 11:59:31 -0300
> > > > Anderson Valadares <andervalbh@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >  We are experiencing an intermittent problem in a GIS database from a
> > > > > client. Some times of the day is as if the PostgreSQL executed the
> > > > slowest
> > > > > operations see below an example. The query was performed three times,
> > > > twice
> > > > > I canceled and the third time I left it to end. The consultation
> > took 10
> > > > > seconds to finish, but the measured execution time is 20 ms. As we
> > have
> > > > > several queries that run every 2 seconds when this momentary delay
> > occurs
> > > > > queries accumulate and burst the limit of 203 connections allowed.
> > The
> > > > > interval of "momentary stop" are approximately 2 seconds occurring at
> > > > > random times and during these stoppages occur no locks and no
> > increased
> > > > > load on the server is identified. There is a job 2/2 seconds locks
> > > > > collection of information, running queries, etc., nmon also collects
> > > > > information every 2 seconds. My client asks what causes these
> > momentary
> > > > > stops? because it affects all operations of the database? etc. How
> > do I
> > > > > identify what is causing these delays in executions of operations in
> > the
> > > > > database?
> > > >
> > > > Number of points to consider:
> > > > * Check the contents of pg_locks and see if something is waiting on a
> > > >   lock when the query is slow.
> > >
> > >  There is nothing waiting when the query is slow, see:
> > >
> > >          snap_datetime         | waiting | count
> > > -------------------------------+---------+-------
> > >  2015-06-05 09:25:00.954731-03 | f       |    74
> > >  2015-06-05 09:26:00.249187-03 | f       |   205
> > >  2015-06-05 09:27:00.826874-03 | f       |   207
> >
> > I don't know what that means. Since you don't show the query that
> > generated that output, I have no idea if your statement is valid, or
> > if you're running a query that will inherently produce incorrect
> > results.
>
> Sorry the information was incomplete. Below is the query that was
> performed to extract information from tables that used to monitor the
> database.
> The snap_datetime column indicates the time that occurred monitoring,
> the waiting column tells you whether any connection was on hold and the
> column
> count tells how many connections existed at the time.
>
> select
>      s.snap_datetime
>      ,a.waiting
>      , count(*)
> from stat_snapshot s, stat_activity a
> where s.snap_id = a.snap_id
>   and s.snap_datetime >= '2015-06-05 09:25:00'
>   and s.snap_datetime <= '2015-06-05 09:28:00'
> group by
>           s.snap_datetime
>           ,a.waiting
> order by s.snap_datetime
> ;
>          snap_datetime         | waiting | count
> -------------------------------+---------+-------
>  2015-06-05 09:25:00.954731-03 | f       |    74
>  2015-06-05 09:26:00.249187-03 | f       |   205
>  2015-06-05 09:27:00.826874-03 | f       |   207
> (3 rows)

Given what I'm seeing above, there's still a lot of information
missing. Those tables are not standard PostgreSQL tables, and I'm
not aware of any tool that creates them, thus I have to assume
it's a cron job or something similar created in-house. Given the
query you ran and the resultant data, my first guess is that the
data in stat_snapshot and stat_activity is worthless: i.e. it's
captured once per minute, and therefore there are 59+ seconds
worth of detail that aren't captured, thus the actual liklihood
that the those tables will contain any indication of the problem
is very low.

Yes it is a cron, created at home, running every 1 hour gathering information from PostgreSQL tables (pg_stat_activity, pg_locks, etc).
How to improve the collection of information PostgreSQL?

 

> > > * Also, ensure log_lock_waits is turned on for the server and check
> > > >   PostgreSQL's logs for messages about long lock waits.
> > >
> > > There is also nothing
> >
> > This is even worse than the previous statement. What did you _do_ to
> > determine this?
> >
> > Since you're unable to diagnose and fix the problem on your own, the
> > possibility exists that the problem is with your diagnostic steps. If
> > you don't share the details of those steps, I have no way to help you
> > with them.
>
> See the log in
> https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_000000.log

Apologies, my above comment was intended to be in regards to the IO load.

However, that log shows that you have massive lock contention. In the early
part of the log a process waits over 5 minutes for a share lock:
2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1] db=citgis,user=opegis LOG:  process 44236822 acquired ShareLock on transaction 3090377758 after 374146.817 ms

Without seeing the source of the PL/pgSQL functions that seem to be
causing the problem, I can only suggest that you review those functions
for excessive locking and inefficient access. I'm going to guess, with
the large number of AccessExclusiveLocks in the log, that the functions
are doing explicit locking, and not doing so in an efficient manner.

 
This function PL / pgSQL has been fixed, but see that during the period in which the SQL presenting slowness problem (between 9:26:00 and 9:27:00) there is no block occurrences.


> > > > * Based on the connection behavior you describe, I'm guessing it's a
> > > >   Tomcat app using some sort of conection pool. Whatever it is, tune
> > > >   your connection pool settings so that the max size of the pool
> > doesn't
> > > >   exceed the available PostgreSQL connections. At least that will
> > prevent
> > > >   errors from happening when the problem occurs.
> > >
> > > Yes it is a tomcat application using connection pooling and will be set
> > to
> > > not exceed the limit of PostgreSQL connections, but the central point is
> > > that the number of connections has increased due to what I am calling
> > > "momentary stoppages"
> >
> > I understand that. I made the suggestion as a temporary fix to reduce
> > the impact until you are able to fix the actual cause.
> >
> > > > * The query you you keep cancelling below, run just EXPLAIN on it (not
> > > >   EXPLAIN ANALYZE) which will certainly finish and give you a plan
> > > >   that can be reviewed to help determine what the problem is.
> > >
> > > Explain plan in http://explain.depesz.com/s/bWw
> >
> > I'm confused. The query at that URL only took 10ms to complete, which
> > doesn't seem at all unreasonable.
>
> I agree with you and that's why I ask why the delay in the execution at
> certain times
> as demonstrated above.
>
> > > * On the OS, monitor iostat or something else to see if you're
> > > >   saturating disk capacity when the problem happens.
> > >
> > > The OS is also being monitored and there occurs no saturation.
> >
> > Again, without any details, it's possible that there is a problem
> > in your monitoring.
> >
> >
> See the monitoring spreadsheet generated from nmon in
> https://github.com/andervalbh/serv-gis01/blob/master/transfacil07_150605_0000.nmon.rar
>
>
> >
> > > > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > > > >
> > > > > Evolution of the number of connections for a period
> > > > >
> > > > > snap_datetime                        |  #connections
> > > > > ----------------------------------------------+-------------
> > > > > 2015-06-05 09:25:00.954731-03 | 74
> > > > > 2015-06-05 09:26:00.249187-03 | 205
> > > > > 2015-06-05 09:27:00.826874-03 | 207
> > > > > 2015-06-05 09:28:00.374666-03 | 73
> > > > > 2015-06-05 09:29:00.690696-03 | 75
> > > > >
> > > > >
> > > > > Occurrence of the problem
> > > > >
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 2555.557 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 2192.572 ms
> > > > > citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > > Cancel request sent
> > > > > ERROR:  canceling statement due to user request
> > > > > Time: 10164.015 ms
> > > > > citgis=# select now();
> > > > >               now
> > > > > -------------------------------
> > > > >  2015-06-05 09:27:22.006072-03
> > > > > (1 row)
> > > > >
> > > > > Time: 0.152 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964443 |        7989813
> > > > >             1060072723 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 5565.175 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964445 |        7989813
> > > > >             1060072725 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 27.944 ms
> > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > >             dvi_sub.via_cod_viagem
> > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > gis_via_viagem
> > > > > via_sub, gis_iti_itinerario iti_sub
> > > > >              where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > >                and via_sub.via_status_viagem = 'A'
> > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > hours'
> > > > >                and iti_sub.lin_cod_linha = 389
> > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > via_sub.iti_cod_itinerario_linha
> > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > >              group by dvi_sub.via_cod_viagem,
> > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > ;
> > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > -----------------------+----------------
> > > > >             1059964445 |        7989813
> > > > >             1060072727 |        7990876
> > > > > (2 rows)
> > > > >
> > > > > Time: 24.428 ms
> > > > >
> > > > > Greetings
> > > > >
> > > > >   Anderson
> > > >
> > > >
> > > > --
> > > > Bill Moran <wmoran@potentialtech.com>
> > > >
> >
> >
> > --
> > Bill Moran <wmoran@potentialtech.com>
> >


--
Bill Moran <wmoran@potentialtech.com>

Re: Momentary Delay

От
Bill Moran
Дата:
On Wed, 17 Jun 2015 10:33:37 -0300
Anderson Valadares <andervalbh@gmail.com> wrote:

> 2015-06-15 18:19 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
>
> > On Mon, 15 Jun 2015 17:48:54 -0300
> > Anderson Valadares <andervalbh@gmail.com> wrote:
> >
> > > 2015-06-12 19:56 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
> > >
> > > > Please do not remove the mailing list from replies. See below.
> > > >
> > > > On Fri, 12 Jun 2015 09:21:19 -0300
> > > > Anderson Valadares <andervalbh@gmail.com> wrote:
> > > >
> > > > > 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran@potentialtech.com>:
> > > > >
> > > > > > On Mon, 8 Jun 2015 11:59:31 -0300
> > > > > > Anderson Valadares <andervalbh@gmail.com> wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >  We are experiencing an intermittent problem in a GIS database
> > from a
> > > > > > > client. Some times of the day is as if the PostgreSQL executed
> > the
> > > > > > slowest
> > > > > > > operations see below an example. The query was performed three
> > times,
> > > > > > twice
> > > > > > > I canceled and the third time I left it to end. The consultation
> > > > took 10
> > > > > > > seconds to finish, but the measured execution time is 20 ms. As
> > we
> > > > have
> > > > > > > several queries that run every 2 seconds when this momentary
> > delay
> > > > occurs
> > > > > > > queries accumulate and burst the limit of 203 connections
> > allowed.
> > > > The
> > > > > > > interval of "momentary stop" are approximately 2 seconds
> > occurring at
> > > > > > > random times and during these stoppages occur no locks and no
> > > > increased
> > > > > > > load on the server is identified. There is a job 2/2 seconds
> > locks
> > > > > > > collection of information, running queries, etc., nmon also
> > collects
> > > > > > > information every 2 seconds. My client asks what causes these
> > > > momentary
> > > > > > > stops? because it affects all operations of the database? etc.
> > How
> > > > do I
> > > > > > > identify what is causing these delays in executions of
> > operations in
> > > > the
> > > > > > > database?
> > > > > >
> > > > > > Number of points to consider:
> > > > > > * Check the contents of pg_locks and see if something is waiting
> > on a
> > > > > >   lock when the query is slow.
> > > > >
> > > > >  There is nothing waiting when the query is slow, see:
> > > > >
> > > > >          snap_datetime         | waiting | count
> > > > > -------------------------------+---------+-------
> > > > >  2015-06-05 09:25:00.954731-03 | f       |    74
> > > > >  2015-06-05 09:26:00.249187-03 | f       |   205
> > > > >  2015-06-05 09:27:00.826874-03 | f       |   207
> > > >
> > > > I don't know what that means. Since you don't show the query that
> > > > generated that output, I have no idea if your statement is valid, or
> > > > if you're running a query that will inherently produce incorrect
> > > > results.
> > >
> > > Sorry the information was incomplete. Below is the query that was
> > > performed to extract information from tables that used to monitor the
> > > database.
> > > The snap_datetime column indicates the time that occurred monitoring,
> > > the waiting column tells you whether any connection was on hold and the
> > > column
> > > count tells how many connections existed at the time.
> > >
> > > select
> > >      s.snap_datetime
> > >      ,a.waiting
> > >      , count(*)
> > > from stat_snapshot s, stat_activity a
> > > where s.snap_id = a.snap_id
> > >   and s.snap_datetime >= '2015-06-05 09:25:00'
> > >   and s.snap_datetime <= '2015-06-05 09:28:00'
> > > group by
> > >           s.snap_datetime
> > >           ,a.waiting
> > > order by s.snap_datetime
> > > ;
> > >          snap_datetime         | waiting | count
> > > -------------------------------+---------+-------
> > >  2015-06-05 09:25:00.954731-03 | f       |    74
> > >  2015-06-05 09:26:00.249187-03 | f       |   205
> > >  2015-06-05 09:27:00.826874-03 | f       |   207
> > > (3 rows)
> >
> > Given what I'm seeing above, there's still a lot of information
> > missing. Those tables are not standard PostgreSQL tables, and I'm
> > not aware of any tool that creates them, thus I have to assume
> > it's a cron job or something similar created in-house. Given the
> > query you ran and the resultant data, my first guess is that the
> > data in stat_snapshot and stat_activity is worthless: i.e. it's
> > captured once per minute, and therefore there are 59+ seconds
> > worth of detail that aren't captured, thus the actual liklihood
> > that the those tables will contain any indication of the problem
> > is very low.
>
> Yes it is a cron, created at home, running every 1 hour gathering
> information from PostgreSQL tables (pg_stat_activity, pg_locks, etc).
> How to improve the collection of information PostgreSQL?

Look at the log file, details of what is going on are being recorded
as they happen.

However, given your statement below (claiming that nothing happened
during the observed slowdown between 9:26 and 9:27) you may want to
find someone who is actually capable of interpreting the log output,
as there are dozens of indicators of what's wrong.  Immediately after
the problem disappears, a number of queries finish after spending a
lot of time executing, in addition to a number of complaints of long
lock waits. So all those queries were fighting each other and when
they finally untangled themselves and got their work done, everything
started running smoothly again.

I would look at fct_sinotico_simplificado(), as it seems to be putting
an undue load on the system. FCT_PAINEL_CCO_VEICULOS_GMAPS() might
also be a problem, but it seems less prevalent. I don't know what
language those function names are in, so I can't really guess at
what they do to make any more detailed recommendations.

Additionally, there is a checkpoint in the middle of the problem
period that took 44 seconds to write 600M to disk. That seems to
contradict your earlier statement that IO isn't being maxed out and
makes me doubt that your IO monitoring is effective.

My overall guess is that you have some poorly written code, both on
the Java side and on the database trigger/function side; but that
under normal usage the hardware is sufficient to keep up. However,
when you have a brief spike in activity, the inefficiencies create
a cascading performance problem. I've seen this before at a previous
job, and the high-level problem was that most developers are good
enough to throw something together that will work, but not talented
and experienced enough to write something truely efficient.

Lock down your connection pool to keep it smaller. Look at where you
can do caching on the Java side to avoid rapid-fire duplicate queries.
Look at limiting inserts to busy tables to a single worker thread that
has a queueing structure so that it can absorb spike loads and spread
the load on the DB out more evenly. Implement SOA so that frequently
accessed data can be kept in Java's memory on the services instead of
_always_ having to query the database. These are just a few of the
techniques available to fix problems like this.

> > > > > * Also, ensure log_lock_waits is turned on for the server and check
> > > > > >   PostgreSQL's logs for messages about long lock waits.
> > > > >
> > > > > There is also nothing
> > > >
> > > > This is even worse than the previous statement. What did you _do_ to
> > > > determine this?
> > > >
> > > > Since you're unable to diagnose and fix the problem on your own, the
> > > > possibility exists that the problem is with your diagnostic steps. If
> > > > you don't share the details of those steps, I have no way to help you
> > > > with them.
> > >
> > > See the log in
> > >
> > https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_000000.log
> >
> > Apologies, my above comment was intended to be in regards to the IO load.
> >
> > However, that log shows that you have massive lock contention. In the early
> > part of the log a process waits over 5 minutes for a share lock:
> > 2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1]
> > db=citgis,user=opegis LOG:  process 44236822 acquired ShareLock on
> > transaction 3090377758 after 374146.817 ms
> >
> > Without seeing the source of the PL/pgSQL functions that seem to be
> > causing the problem, I can only suggest that you review those functions
> > for excessive locking and inefficient access. I'm going to guess, with
> > the large number of AccessExclusiveLocks in the log, that the functions
> > are doing explicit locking, and not doing so in an efficient manner.
>
> This function PL / pgSQL has been fixed, but see that during the period in
> which the SQL presenting slowness problem (between 9:26:00 and 9:27:00)
> there is no block occurrences.
>
> > > > > > * Based on the connection behavior you describe, I'm guessing it's
> > a
> > > > > >   Tomcat app using some sort of conection pool. Whatever it is,
> > tune
> > > > > >   your connection pool settings so that the max size of the pool
> > > > doesn't
> > > > > >   exceed the available PostgreSQL connections. At least that will
> > > > prevent
> > > > > >   errors from happening when the problem occurs.
> > > > >
> > > > > Yes it is a tomcat application using connection pooling and will be
> > set
> > > > to
> > > > > not exceed the limit of PostgreSQL connections, but the central
> > point is
> > > > > that the number of connections has increased due to what I am calling
> > > > > "momentary stoppages"
> > > >
> > > > I understand that. I made the suggestion as a temporary fix to reduce
> > > > the impact until you are able to fix the actual cause.
> > > >
> > > > > > * The query you you keep cancelling below, run just EXPLAIN on it
> > (not
> > > > > >   EXPLAIN ANALYZE) which will certainly finish and give you a plan
> > > > > >   that can be reviewed to help determine what the problem is.
> > > > >
> > > > > Explain plan in http://explain.depesz.com/s/bWw
> > > >
> > > > I'm confused. The query at that URL only took 10ms to complete, which
> > > > doesn't seem at all unreasonable.
> > >
> > > I agree with you and that's why I ask why the delay in the execution at
> > > certain times
> > > as demonstrated above.
> > >
> > > > > * On the OS, monitor iostat or something else to see if you're
> > > > > >   saturating disk capacity when the problem happens.
> > > > >
> > > > > The OS is also being monitored and there occurs no saturation.
> > > >
> > > > Again, without any details, it's possible that there is a problem
> > > > in your monitoring.
> > > >
> > > >
> > > See the monitoring spreadsheet generated from nmon in
> > >
> > https://github.com/andervalbh/serv-gis01/blob/master/transfacil07_150605_0000.nmon.rar
> > >
> > >
> > > >
> > > > > > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > > > > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > > > > > >
> > > > > > > Evolution of the number of connections for a period
> > > > > > >
> > > > > > > snap_datetime                        |  #connections
> > > > > > > ----------------------------------------------+-------------
> > > > > > > 2015-06-05 09:25:00.954731-03 | 74
> > > > > > > 2015-06-05 09:26:00.249187-03 | 205
> > > > > > > 2015-06-05 09:27:00.826874-03 | 207
> > > > > > > 2015-06-05 09:28:00.374666-03 | 73
> > > > > > > 2015-06-05 09:29:00.690696-03 | 75
> > > > > > >
> > > > > > >
> > > > > > > Occurrence of the problem
> > > > > > >
> > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > > Cancel request sent
> > > > > > > ERROR:  canceling statement due to user request
> > > > > > > Time: 2555.557 ms
> > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > > Cancel request sent
> > > > > > > ERROR:  canceling statement due to user request
> > > > > > > Time: 2192.572 ms
> > > > > > > citgis=# explain analyze select
> > max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > > Cancel request sent
> > > > > > > ERROR:  canceling statement due to user request
> > > > > > > Time: 10164.015 ms
> > > > > > > citgis=# select now();
> > > > > > >               now
> > > > > > > -------------------------------
> > > > > > >  2015-06-05 09:27:22.006072-03
> > > > > > > (1 row)
> > > > > > >
> > > > > > > Time: 0.152 ms
> > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > > > -----------------------+----------------
> > > > > > >             1059964443 |        7989813
> > > > > > >             1060072723 |        7990876
> > > > > > > (2 rows)
> > > > > > >
> > > > > > > Time: 5565.175 ms
> > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > > > -----------------------+----------------
> > > > > > >             1059964445 |        7989813
> > > > > > >             1060072725 |        7990876
> > > > > > > (2 rows)
> > > > > > >
> > > > > > > Time: 27.944 ms
> > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > > > dvi_cod_dados_via_iti,
> > > > > > >             dvi_sub.via_cod_viagem
> > > > > > >              from gis_dvi_dados_viagem_itinerario dvi_sub,
> > > > gis_via_viagem
> > > > > > > via_sub, gis_iti_itinerario iti_sub
> > > > > > >              where dvi_sub.via_cod_viagem =
> > via_sub.via_cod_viagem
> > > > > > >                and via_sub.via_status_viagem = 'A'
> > > > > > >                and via_sub.via_dt_hora_ini > now() - interval '9
> > > > hours'
> > > > > > >                and iti_sub.lin_cod_linha = 389
> > > > > > >                and iti_sub.iti_sentido_itinerario = 'I'
> > > > > > >                and iti_sub.iti_cod_itinerario_linha =
> > > > > > > via_sub.iti_cod_itinerario_linha
> > > > > > >                and dvi_sub.vlo_cod_localizacao is not null
> > > > > > >              group by dvi_sub.via_cod_viagem,
> > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > > > > ;
> > > > > > >  dvi_cod_dados_via_iti | via_cod_viagem
> > > > > > > -----------------------+----------------
> > > > > > >             1059964445 |        7989813
> > > > > > >             1060072727 |        7990876
> > > > > > > (2 rows)
> > > > > > >
> > > > > > > Time: 24.428 ms
> > > > > > >
> > > > > > > Greetings
> > > > > > >
> > > > > > >   Anderson
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Bill Moran <wmoran@potentialtech.com>
> > > > > >
> > > >
> > > >
> > > > --
> > > > Bill Moran <wmoran@potentialtech.com>
> > > >
> >
> >
> > --
> > Bill Moran <wmoran@potentialtech.com>
> >


--
Bill Moran