Re:Re:Re: BUG #15187: When use huge page, there may be a lot ofhanged connections with status startup or authentication

Поиск
Список
Период
Сортировка
От chenhj
Тема Re:Re:Re: BUG #15187: When use huge page, there may be a lot ofhanged connections with status startup or authentication
Дата
Msg-id 4a20a92a.868c.1633620edc3.Coremail.chjischj@163.com
обсуждение исходный текст
Ответ на Re:Re: BUG #15187: When use huge page, there may be a lot of hangedconnections with status startup or authentication  (chenhj <chjischj@163.com>)
Ответы Re: BUG #15187: When use huge page, there may be a lot of hangedconnections with status startup or authentication  (Andres Freund <andres@anarazel.de>)
Список pgsql-bugs
>>
>>Chen, have you disabled transparent hugepages and zone reclaim?
>>
>>Greetings,
>>
>>Andres Freund
>
>c) Depend on huge page >huge_page=on, happen(no matter transparent_hugepage is [always] or [never]) >huge_page=off, not happen
>
>When disable transparent hugepages ,this problem also occurs.
>Aboud zone reclaim,I will see it later.
>What I doubt is that this problem does not occurs at PostgreSQL 9.6.2 (I tested 10.2 and 9.6.2 on the same machine)
>d) Depend on PostgreSQL Version
>PostgreSQL 10.2 happen
>PostgreSQL 9.6 not happen
>Chen Huajun
The problem occurs whether vm.zone_reclaim_mode is set to 0 or 1.

In addition, what needs to be corrected is that even huge_pages=off is problematic.

Huge_pages = on SQL execution is a very slow , and with hangd connections in startup and auth state.

This is the stack when SQL is executed

$ pstack 31175
#0  0x00000036d76e15c3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000862cca in pg_usleep ()
#2  0x0000000000715b19 in perform_spin_delay ()
#3  0x0000000000715c18 in s_lock ()
#4  0x000000000085b157 in GetOldSnapshotThresholdTimestamp ()
#5  0x00000000006f577e in TestForOldSnapshot_impl ()
#6  0x00000000004b4b2d in heap_getnext ()
#7  0x0000000000616e71 in SeqNext ()
#8  0x00000000005fc677 in ExecScan ()
#9  0x000000000060454e in fetch_input_tuple ()
#10 0x00000000006062e9 in ExecAgg ()
#11 0x00000000005f9a9f in standard_ExecutorRun ()
#12 0x00007f8fcff5d6f8 in pgss_ExecutorRun () from /usr/pgsql-10/lib/pg_stat_statements.so
#13 0x000000000072673b in PortalRunSelect ()
#14 0x00000000007278d1 in PortalRun ()
#15 0x0000000000723f51 in exec_simple_query ()
#16 0x0000000000724f09 in PostgresMain ()
#17 0x00000000006b8c8a in PostmasterMain ()
#18 0x00000000006395e0 in main ()

This is the execution time of the SQL

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+----------------------------------
userid              | 16388
dbid                | 13451
queryid             | 1902655528
query               | select count(*) from test20180503
calls               | 608
total_time          | 217341245.157586992
min_time            | 291100.481972999987
max_time            | 378225.054139000014
mean_time           | 357469.153219715168
stddev_time         | 24984.333161207207
rows                | 608
shared_blks_hit     | 2689946
shared_blks_read    | 2278
shared_blks_dirtied | 4425
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 33.5047190000000015
blk_write_time      | 0


While huge_pages=off thres are no connections hanged in startup or auth states, but SQL execution is also very slow for several hundred seconds, during which time the CPU utilization rate is 100%.

This is the SQL execution time when huge_pages=off

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+----------------------------------
userid              | 16388
dbid                | 13451
queryid             | 1902655528
query               | select count(*) from test20180503
calls               | 1001
total_time          | 650132764.661145091
min_time            | 600196.529144000029
max_time            | 672296.767421000055
mean_time           | 649483.281379765598
stddev_time         | 19423.3311969555834
rows                | 1001
shared_blks_hit     | 4432428
shared_blks_read    | 0
shared_blks_dirtied | 4425
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0


While On PostgreSQL9.6@CentOS6.5 and PostgreSQL10.2@CentOS7.3. The same way to execute SQL, only takes 2 seconds.

postgres=# select * from pg_stat_statements where query ~ 'test20180503';
-[ RECORD 1 ]-------+-----------------------------------
userid              | 16389
dbid                | 13269
queryid             | 3703798644
query               | select count(*) from test20180503;
calls               | 1001
total_time          | 2751692.53700000001
min_time            | 1108.32999999999993
max_time            | 4031.02199999999993
mean_time           | 2748.94359340659184
stddev_time         | 642.984323244542566
rows                | 1001
shared_blks_hit     | 4428003
shared_blks_read    | 4425
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 90.8639999999999759
blk_write_time      | 0
-------
Regards,Chen Huajun

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #15188: wrong count "tuples removed" in autovacuum log file
Следующее
От: Andres Freund
Дата:
Сообщение: Re: BUG #15187: When use huge page, there may be a lot of hangedconnections with status startup or authentication