Discussion: insert performance
The database is PostgreSQL 9.3, running on Debian 7, with 8 CPU cores and 8096MB of physical memory.
There is a big table with more than 70 columns, receiving a constant stream of inserts at about 700 rows/sec. Using COPY is not feasible, because the data is not predefined or provisioned in advance; it is generated on demand by clients.
To make a clean test environment, I cloned the table, removing the indexes (keeping only the primary key) and triggers, and used pgbench to test the INSERT statement in isolation.
Here are some key items from postgresql.conf:
--------------
shared_buffers = 1024MB
work_mem = 32MB
maintenance_work_mem = 128MB
bgwriter_delay = 20ms
synchronous_commit = off
checkpoint_segments = 64
checkpoint_completion_target = 0.9
effective_cache_size = 4096MB
log_min_duration_statement = 1000
--------------
On 1/9/16 11:57 PM, Jinhua Luo wrote:
> But I do not understand why the process does so much IO with async
> commit. And it does not even happen at shared buffer flushing or
> lock waiting. Where's the code path doing this IO?

I assume you're asking about all the IO to the heap table. That is most likely occurring as part of ReadBuffer(). As soon as you fill up shared buffers, BufferAlloc() is likely to end up with a dirty buffer, resulting in it calling FlushBuffer() (see src/backend/storage/buffer/bufmgr.c#1084).

Note that that call is tracked by TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(), so I'd expect you to see it in the relevant systemtap stats.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
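A simplified sketch of the eviction path Jim describes, modeled on BufferAlloc() in src/backend/storage/buffer/bufmgr.c around 9.3. Pins, spinlocks, the buffer mapping table and the clock-sweep usage counts are omitted and the surrounding logic is abridged, so this illustrates the control flow only, not the verbatim source:
-----------
/*
 * Sketch of the eviction path inside BufferAlloc() (bufmgr.c, ~9.3).
 * Abridged: not verbatim source.
 */
static volatile BufferDesc *
BufferAlloc_sketch(SMgrRelation smgr, BlockNumber blockNum)
{
    volatile BufferDesc *buf;
    bool        lock_held;

    /* Run the clock sweep to pick a victim buffer to reuse. */
    buf = StrategyGetBuffer(NULL, &lock_held);

    if (buf->flags & BM_DIRTY)
    {
        /*
         * The victim still holds a dirty page: the backend itself must
         * write it to disk before the buffer can be reused.  This write
         * happens regardless of synchronous_commit, and it is the call
         * wrapped by TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START().
         */
        FlushBuffer(buf, NULL);
    }

    /* The buffer is now clean; retag it for (smgr, blockNum) and return. */
    return buf;
}
-----------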
On Sat, Jan 9, 2016 at 9:57 PM, Jinhua Luo <luajit.io@gmail.com> wrote:
>
> To make a clean test environment, I cloned the table, removing the
> indexes (keeping only the primary key) and triggers, and used pgbench
> to test the INSERT statement in isolation.

Can you share the pgbench command line, and the sql file you feed to it (and whatever is needed to set up the schema)?

Thanks,

Jeff
Hi,

I found the insert performance is not related to the table schema. In fact, I could reproduce the issue using a simple table:

-----------
create table test(k bigserial primary key, a int, b int, c text, d text);
-----------

test.sql:
-----------
insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

pgbench -r -N -n -c 4 -j 1 -T 600 -f test.sql

I also compiled and ran it on the latest 9.4 version; same issue.

Regards,
Jinhua Luo

2016-01-12 3:20 GMT+08:00 Jeff Janes <jeff.janes@gmail.com>:
> Can you share the pgbench command line, and the sql file you feed to
> it (and whatever is needed to set up the schema)?
Hi All,

I found it's not related to file I/O. I used systemtap to diagnose the postgres backend process.

The systemtap script is a modified version of sample-bt-off-cpu:
https://gist.github.com/kingluo/15b656998035cef193bc

Test process:

1) Create a simple table:
-----------
create table test(k bigserial primary key, a int, b int, c text, d text);
-----------

2) test.sql:
-----------
insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

3) Run pgbench, using 4 connections:
-----------
$ time pgbench -r -N -n -c 4 -j 1 -T 3600 -f test.sql
-----------

4) Run the systemtap script for 3 minutes (here 987 is the pid of one postgres backend process). Note that the sum, max, avg and min times are in microseconds.
-----------
# ./sample-bt-off-cpu -a -v -p 987 -t 180 > /tmp/postgres.bt
Pass 1: parsed user script and 110 library script(s) using 26916virt/22300res/4276shr/18740data kb, in 380usr/10sys/386real ms.
Pass 2: analyzed script: 18 probe(s), 13 function(s), 5 embed(s), 12 global(s) using 54708virt/51956res/5920shr/46532data kb, in 1870usr/360sys/2669real ms.
Pass 3: translated to C into "/tmp/stapteVG3Q/stap_18d161acb3024931de917335496977c3_12813_src.c" using 54708virt/52156res/6120shr/46532data kb, in 8680usr/250sys/24647real ms.
Pass 4: compiled C into "stap_18d161acb3024931de917335496977c3_12813.ko" in 20450usr/610sys/43898real ms.
Pass 5: starting run.
WARNING: Tracing 987 (/opt/postgresql/bin/postgres)...
WARNING: Too many CFI instuctions
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: query time, count=646253, sum=102991078, max=2474344, avg=159, min=83
WARNING: lock time, count=142, sum=3306500, max=1158862, avg=23285, min=16
WARNING: lwlock time, count=141272, sum=7260098, max=1383180, avg=51, min=1
WARNING: Number of errors: 0, skipped probes: 24
Pass 5: run completed in 10usr/110sys/180744real ms.
-----------

During those 3 minutes, postgres printed the logs below:
-----------
2016-01-13 23:27:21 CST [987-157] postgres@postgres LOG: duration: 2474.304 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:27:48 CST [987-158] postgres@postgres LOG: duration: 1383.359 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-159] postgres@postgres LOG: process 987 still waiting for ExclusiveLock on extension of relation 16386 of database 12141 after 1000.212 ms
2016-01-13 23:28:33 CST [987-160] postgres@postgres DETAIL: Process holding the lock: 990. Wait queue: 988, 987, 989.
2016-01-13 23:28:33 CST [987-161] postgres@postgres STATEMENT: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-162] postgres@postgres LOG: process 987 acquired ExclusiveLock on extension of relation 16386 of database 12141 after 1158.818 ms
2016-01-13 23:28:33 CST [987-163] postgres@postgres STATEMENT: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-164] postgres@postgres LOG: duration: 1159.512 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:45 CST [987-165] postgres@postgres LOG: duration: 1111.664 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

The final backtrace result is converted into a flame graph, see:
http://luajit.io/systemtap/pgsql/postgres_1.svg

Any advice?

Regards,
Jinhua Luo

2016-01-11 5:05 GMT+08:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> I assume you're asking about all the IO to the heap table. That is most
> likely occurring as part of ReadBuffer(). As soon as you fill up shared
> buffers, BufferAlloc() is likely to end up with a dirty buffer, resulting
> in it calling FlushBuffer() (see src/backend/storage/buffer/bufmgr.c#1084).
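The "ExclusiveLock on extension of relation" messages in the logs above come from the relation-extension path in RelationGetBufferForTuple() (src/backend/access/heap/hio.c). A simplified sketch of that path, abridged from the real control flow (not verbatim source):
-----------
/*
 * Sketch of the relation-extension path in RelationGetBufferForTuple()
 * (hio.c), abridged.  While the extension lock is held, the backend
 * writes a new zero-filled page to disk, so one slow write stalls
 * every backend queued behind the lock -- which is exactly what the
 * "still waiting for ExclusiveLock on extension of relation" log
 * entries show.
 */
if (needLock)
    LockRelationForExtension(relation, ExclusiveLock);

/* P_NEW tells the buffer manager to extend the relation by one page. */
buffer = ReadBufferBI(relation, P_NEW, bistate);

/* Lock the new page before releasing the extension lock. */
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

if (needLock)
    UnlockRelationForExtension(relation, ExclusiveLock);
-----------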
Hi,

I thought that with async commit enabled, the backend process would rarely do file IO. But in fact, it still does a lot of file IO.

After inspecting the vfs probes using systemtap, and reading the source code of postgresql, I found that a tight loop of inserts or updates causes heavy file IO on the data files (table, indexes) directly by the backend process! And that IO has nothing to do with shared buffer dirty writes.

heap_insert() or heap_update() invokes RelationGetBufferForTuple(), which in turn finally invokes ReadBuffer_common():

1) Look up or allocate the buffer in shared buffers, which may cause a dirty write (but in my case, it's rare. Maybe the shared buffer is big enough and the checkpointer or bgwriter always cleans it in time). If the target buffer is found, the following steps are skipped.

2) If the relation needs to be extended (an insert or update on a new table would normally fall into this case), a zero-filled page is written to disk (used to occupy the file space? But most file systems support file holes or space reservation, so maybe this part could be optimized?). This procedure holds the exclusive lock on the relation, so if the write is slow, it slows down all pending queries of the lock waiters.

3) Otherwise, it reads the page from disk.

The target buffer is locked exclusively until the insert or update finishes. Note that the insert or update also involves xlog insert; with async commit enabled the backend process does not flush the xlog, but chances are that xlog buffer dirty writes happen (although that's also rare in my case).

So I guess the real reason is the file IO while holding locks. If an IO spike happens, it causes long query durations.

Am I correct? Looking forward to any advice.

Thanks.

Regards,
Jinhua Luo
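A condensed sketch of the branch in ReadBuffer_common() (src/backend/storage/buffer/bufmgr.c) that steps 1-3 above describe; the surrounding logic is abridged, so this is an illustration rather than the verbatim source:
-----------
/* Condensed sketch of the tail of ReadBuffer_common() (bufmgr.c, ~9.3). */

/* Step 1: find or allocate a shared buffer.  BufferAlloc() may have to
 * flush a dirty victim page first (see the earlier sketch). */
bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                     strategy, &found);
if (found)
    return BufferDescriptorGetBuffer(bufHdr);   /* cache hit: no file IO */

bufBlock = BufHdrGetBlock(bufHdr);

if (isExtend)
{
    /* Step 2: extending the relation -- a zero-filled page is written
     * out immediately to occupy the space on disk, while the relation
     * extension lock is held. */
    MemSet((char *) bufBlock, 0, BLCKSZ);
    smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
}
else
{
    /* Step 3: an existing block -- a synchronous read from disk. */
    smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
}
-----------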
Hi,

There is another problem.

When autovacuum recycles old pages, ReadBuffer_common() does mdread() instead of mdextend(). The read is synchronous, while the write can be mostly asynchronous, so the frequent reads are much worse than the writes.

Any help? Please.

Regards,
Jinhua Luo

2016-01-19 12:50 GMT+08:00 Jinhua Luo <luajit.io@gmail.com>:
> So I guess the real reason is the file IO while holding locks. If an IO
> spike happens, it causes long query durations.
>
> Am I correct? Looking forward to any advice.
On 1/21/16 11:54 PM, Jinhua Luo wrote:
> There is another problem.
> When autovacuum recycles old pages, ReadBuffer_common() does mdread()
> instead of mdextend(). The read is synchronous, while the write can be
> mostly asynchronous, so the frequent reads are much worse than the writes.

Please don't top post.

AFAICT your analysis is correct, though I don't see what autovacuum has to do with anything. When we need a new block to put data on, we'll either find one with free space in the free space map and use it, or we'll extend the relation (a rough sketch of that decision follows below).

> Any help? Please.

There's been some discussion on ways to improve the performance of relation extension (IIRC one of those was to not zero the new page), but ultimately you're at the mercy of the underlying OS and hardware. If you have ideas for improving this you should speak up on -hackers, but before doing so you should read the archives about what's been proposed in the past.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
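A rough sketch of the "reuse or extend" decision referenced above, modeled on RelationGetBufferForTuple() (src/backend/access/heap/hio.c) and the free space map API (GetPageWithFreeSpace() in src/backend/storage/freespace/freespace.c). The identifiers follow the real code but the surrounding logic is abridged; not verbatim source:
-----------
/* Sketch of the "reuse or extend" decision in
 * RelationGetBufferForTuple() (hio.c), abridged. */
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

if (targetBlock != InvalidBlockNumber)
{
    /*
     * The free space map found a page with room (for example one that
     * vacuum cleaned out): read it back in.  If it has already left
     * shared buffers and the OS cache, this is a synchronous mdread().
     */
    buffer = ReadBuffer(relation, targetBlock);
}
else
{
    /*
     * No page with enough free space: extend the relation, which
     * writes a zero-filled page via mdextend().  That write is usually
     * absorbed by the OS page cache, so on average it is cheaper than
     * the synchronous read above.
     */
    buffer = ReadBufferBI(relation, P_NEW, bistate);
}
-----------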