[BUGS] BUG #14575: Standby recovery process call close() very slow,when drop (many small files) database on Primary.

Поиск
Список
Период
Сортировка
От digoal@126.com
Тема [BUGS] BUG #14575: Standby recovery process call close() very slow,when drop (many small files) database on Primary.
Дата
Msg-id 20170303023246.25054.66379@wrigleys.postgresql.org
обсуждение исходный текст
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14575
Logged by:          Zhou Digoal
Email address:      digoal@126.com
PostgreSQL version: 9.6.2
Operating system:   CentOS 6.x x64
Description:

HI,
  I have a user case, which will create many table (and index) in
PostgreSQL, about 3000*150 tables, and 2.25 million small file in
database.
  I also use stream replication create a physical standby.
  And in this case, user will create and drop database/schema daily, so
postgresql must create and drop about 3000*150 tables, and 2.25 million
small file in database every day.
  in this case, i found postgresql standby will delay, when checkpoint, when
drop database, when drop schema.
  I use perf, pstack see standby's startup, checkpoint process, it do
close() syscall very slow.
  but why checkpoint and drop database|schema so fast in primary?
  
Best regards, digoal.

test case:

1\.  
  
```  
create database tmp1 with template template0;  
```  
  
2\. 
  
```  
\c tmp1  
  
create table public.test(  
c0 serial  unique  check(c0>0),   
c1 serial  unique  check(c1>0),   
c2 serial  unique  check(c2>0),  
c3 serial  unique  check(c3>0),   
c4 serial  ,  
c5 serial  ,   
c6 serial  ,  
c7 serial  ,   
c8 serial  ,  
c9 serial  ,   
c10 serial  ,   
c11 serial  ,   
c12 serial  ,  
c13 serial  ,   
c14 serial  ,  
c15 serial  ,   
c16 serial  ,  
c17 serial  ,   
c18 serial  ,  
c19 serial  ,   
c20 serial  ,   
c21 serial  ,   
c22 serial  ,  
c23 serial  ,   
c24 serial  ,  
c25 serial  ,   
c26 serial  ,  
c27 serial  ,   
c28 serial  ,  
c29 serial  ,   
c30 serial  ,   
c31 serial  ,   
c32 serial  ,  
c33 serial  ,   
c34 serial  ,  
c35 serial  ,   
c36 serial  ,  
c37 serial  ,   
c38 serial  ,  
c39 serial    
);  
```  

3\. 
  
```  
for ((i=1;i<=3000;i++)) ; do psql tmp1 -c "create schema s$i" ; done  
```  
  
this step , standby no delay.
  
```  
select
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_insert_location(),sent_location)),

pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_insert_location(),replay_location))
from pg_stat_replication ;  
  
\watch 1  
```  
  
4\. create table now
  
```  
vi test.sh  
  
for ((x=1;x<=150;x++)) ; do (for ((i=1;i<=3000;i++)) ; do psql tmp1 -c
"create table if not exists s$i.test$x (like public.test including all)" ;
done) & done  
  
chmod 500 test.sh  
  
nohup ./test.sh >/dev/null 2>&1 &  
```  
  
this step, standby also no delay.
  
```  
select count(*) from pg_class;  
  
2.25 million record.  
```  
  

  
```  
\l+  
52 GB   
```  
  
5\. now checkpoint in primary
  
```  
checkpoint; 

very quick end;
```  
  
on standby :   

checkpoint process CPU 100%,run about 7 hours.
  
strace -p checkpoint, many close() syscall, and very slow.  
  
perf 
  
```
perf record -avg -p checkpoint PID

perf report --stdio

    99.74%  postgres  postgres           [.] remove_from_unowned_list
            |
            --- remove_from_unowned_list
                smgrclose
                smgrcloseall
                CheckpointerMain
                AuxiliaryProcessMain
                StartChildProcess
                sigusr1_handler
                __restore_rt
                PostmasterMain
                startup_hacks
                __libc_start_main
```
  
and bgwriter perf
  
```
    99.78%  postgres  postgres           [.] remove_from_unowned_list
            |
            --- remove_from_unowned_list
                smgrclose
                smgrcloseall
                BackgroundWriterMain
                AuxiliaryProcessMain
                StartChildProcess
                sigusr1_handler
                __restore_rt
                PostmasterMain
                startup_hacks
                __libc_start_main
```
  
but checkpoint don't conflict with startup, so postgresql standby don't
delay now.

```
close(149466)                           = 0
close(137674)                           = 0
close(895987)                           = 0
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x7facea633e3c)            = 217400424
close(438148)                           = 0
close(541088)                           = 0
```
  
this step , i found when i use very large
max_files_per_process=1024000,checkpointer process in standby can quick stop
( restart standby ). 
  
6\. create another big database (many small files)
  
```  
create database test with template tmp1;  
```  
  
this time, standby also no delay . 

this is standby's startup process strace result.
  
```  
  
lstat("base/16384/12976402", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0  
open("base/16384/12976402", O_RDONLY)   = 12  
open("base/24795111/12976402", O_RDWR|O_CREAT|O_EXCL, 0600) = 13  
read(12, "", 65536)                     = 0  
close(13)                               = 0  
close(12)                               = 0  
lstat("base/16384/305700", {st_mode=S_IFREG|0600, st_size=8192, ...}) = 0

open("base/16384/305700", O_RDONLY)     = 12  
open("base/24795111/305700", O_RDWR|O_CREAT|O_EXCL, 0600) = 13  
read(12, "\0\0\0\0\30p\234+\0\0\0\0000\0\360\37\360\37\4
\0\0\0\0b1\5\0\2\0\0\0"..., 65536) = 8192  
write(13, "\0\0\0\0\30p\234+\0\0\0\0000\0\360\37\360\37\4
\0\0\0\0b1\5\0\2\0\0\0"..., 8192) = 8192  
sync_file_range(0xd, 0, 0x2000, 0x2)    = 0  
read(12, "", 65536)                     = 0  
close(13)                               = 0  
close(12)                               = 0  
lstat("base/16384/639437", {st_mode=S_IFREG|0600, st_size=8192, ...}) = 0

```  
  
7\. drop one schema  
  
```  
\c test  
  
drop schema s1 cascade;  
```  
  
standby's startup process very slow, delay .
  
8\. drop one database
  
```  
create database test1 with template tmp1;
drop database  test;  
drop database test1;

quick on primary
```  
  
standby will delay now, startup process very slow, and CPU  100%,

```
    99.76%  postgres  postgres           [.] remove_from_unowned_list
            |
            --- remove_from_unowned_list
                smgrclose
                smgrcloseall
                XLogDropDatabase
                dbase_redo
                StartupXLOG
                StartupProcessMain
                AuxiliaryProcessMain
                StartChildProcess
                PostmasterMain
                startup_hacks
                __libc_start_main
```




--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Denise Wiedl
Дата:
Сообщение: Re: [BUGS] BUG #14573: lateral joins, ambuiguity
Следующее
От: from-postgresql@xenomachina.com
Дата:
Сообщение: [BUGS] BUG #14576: JDBC's PreparedStatement.getParameterMetaData fails with'could not determine polymorphic type'