Thread: insert performance


insert performance

From: Jinhua Luo
Date:
Hi All,

The database is PostgreSQL 9.3, running on Debian 7, with 8 CPU cores and 8096 MB of physical memory.


There is a big table with more than 70 columns. Inserts arrive constantly at about 700 rows/sec. It is not feasible to use COPY, because the data is not predefined or provisioned; it is generated on demand by clients.


To make a clean test environment, I cloned the table, removing the indexes (keeping the primary key) and the triggers, and used pgbench to test the INSERT statement alone.


Here are some key settings in postgresql.conf:

--------------

shared_buffers = 1024MB

work_mem = 32MB

maintenance_work_mem = 128MB

bgwriter_delay = 20ms

synchronous_commit = off

checkpoint_segments = 64

checkpoint_completion_target = 0.9

effective_cache_size = 4096MB

log_min_duration_statement = 1000

--------------


The problem:

The log shows that the duration of some inserts exceeds 1 second.

I used iotop to view the I/O activity of the pg backend worker process, and it shows that the process writes several MB per second. That is the most confusing part: the commit is async, so the backend should not perform and wait for the WAL write, nor for the shared buffer flush. I suspected the flush might happen at shared buffer allocation, so I modified the code to print pgBufferUsage.blk_write_time along with the long-duration log line. But I found it is only a small fraction of the total duration. I also added code to record the total time spent waiting on locks within statement execution and printed it, and it is also a small fraction of the duration. I could not explain the result.

Then I used systemtap to check which files the process frequently writes to:
-----
Tracing 20991 (/usr/lib/postgresql/9.3/bin/postgres)...
Please wait for 30 seconds.

=== Top 10 file writes ===
#1: 6004 times, 49184768 bytes writes in file 21986.44.
#2: 400 times, 3276800 bytes writes in file 21988.3.
#3: 12 times, 98304 bytes writes in file 57ED.
#4: 10 times, 81920 bytes writes in file 57F0.
#5: 10 times, 81920 bytes writes in file 57EE.
#6: 9 times, 73728 bytes writes in file 57F1.
#7: 9 times, 73728 bytes writes in file 57F3.
#8: 8 times, 65536 bytes writes in file 57EB.
#9: 8 times, 65536 bytes writes in file 57F2.
#10: 4 times, 32768 bytes writes in file 57EF.
-----

The "21986.44" is the table data file, and the "21988.3" is the primary key index, and the others are subtrans files.

Obviously, the process does a lot of I/O (at the VFS level), and I suspect the duration spikes are due to occasional I/O contention.

But I do not understand why the process does so much I/O with async commit. It does not even happen during shared buffer flushing or lock waiting. Which code path is doing this I/O?


Regards,
Jinhua Luo

Re: insert performance

From: Jim Nasby
Date:
On 1/9/16 11:57 PM, Jinhua Luo wrote:
> But I do not understand why the process does so much I/O with async
> commit. It does not even happen during shared buffer flushing or lock
> waiting. Which code path is doing this I/O?

I assume you're asking about all the IO to the heap table. That is most
likely occurring as part of ReadBuffer(). As soon as you fill up shared
buffers, BufferAlloc() is likely to end up with a dirty buffer,
resulting in it calling FlushBuffer() (see
src/backend/storage/buffer/bufmgr.c#1084).

Note that that call is tracked by
TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(), so I'd expect you to see it
in the relevant systemtap stats.
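Jim's point can be illustrated with a toy model. The sketch below is a simplified stand-in, not the real bufmgr.c logic: the class, method names, and the first-in eviction policy are invented to mimic what BufferAlloc()'s clock sweep and FlushBuffer() do when a dirty victim is chosen.

```python
# Toy model (not actual PostgreSQL code) of the effect described above: once
# shared buffers are full, allocating a buffer for a new page may evict a
# dirty victim, and the backend itself must write that victim out first.

class SharedBuffers:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}          # page_id -> dirty flag
        self.backend_writes = 0  # stands in for FlushBuffer() calls

    def read_buffer(self, page_id):
        if page_id in self.pages:                # hit: no I/O at all
            return
        if len(self.pages) >= self.capacity:     # full: pick a victim
            victim = next(iter(self.pages))      # stand-in for the clock sweep
            if self.pages.pop(victim):           # dirty victim: backend writes it
                self.backend_writes += 1
        self.pages[page_id] = False              # read/zero the new page in

pool = SharedBuffers(capacity=2)
for p in range(4):                # steady inserts touching new pages
    pool.read_buffer(p)
    pool.pages[p] = True          # each insert dirties its page
assert pool.backend_writes == 2   # evicting dirty pages means backend I/O
```

In other words, even a pure-read entry point ends up writing as soon as the working set of dirtied pages outgrows shared buffers and the background writer cannot keep up.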
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: insert performance

From: Jeff Janes
Date:
On Sat, Jan 9, 2016 at 9:57 PM, Jinhua Luo <luajit.io@gmail.com> wrote:
>
> To make a clean test environment, I cloned the table, removing the indexes
> (keeping the primary key) and the triggers, and used pgbench to test the
> INSERT statement alone.

Can you share the pgbench command line, and the sql file you feed to
it (and whatever is needed to set up the schema)?


Thanks,

Jeff


Re: insert performance

From: Jinhua Luo
Date:
Hi,

I found that the insert performance is not related to the table schema.

In fact, I can reproduce the issue using a simple table:

create table test(k bigserial primary key, a int, b int, c text, d text);

test.sql:
insert into test(a, b, c, d) values(3438, 1231,
'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');


pgbench -r -N -n -c 4 -j 1 -T 600 -f test.sql

I also compiled and ran it on the latest 9.4 version; the same issue occurs.

Regards,
Jinhua Luo




2016-01-12 3:20 GMT+08:00 Jeff Janes <jeff.janes@gmail.com>:
> On Sat, Jan 9, 2016 at 9:57 PM, Jinhua Luo <luajit.io@gmail.com> wrote:
>>
>> To make a clean test environment, I cloned the table, removing the indexes
>> (keeping the primary key) and the triggers, and used pgbench to test the
>> INSERT statement alone.
>
> Can you share the pgbench command line, and the sql file you feed to
> it (and whatever is needed to set up the schema)?
>
>
> Thanks,
>
> Jeff


Re: insert performance

From: Jinhua Luo
Date:
Hi All,

I found it's not related to file I/O.

I use systemtap to diagnose the postgres backend process.

The systemtap script is a modified version of sample-bt-off-cpu:

https://gist.github.com/kingluo/15b656998035cef193bc


Test process:

1) create a simple table:

-----------
create table test(k bigserial primary key, a int, b int, c text, d text);
-----------

2) test.sql:

-----------
insert into test(a, b, c, d) values(3438, 1231,
'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

3) I run the pgbench, using 4 connections:

-----------
$ time pgbench -r -N -n -c 4 -j 1 -T 3600 -f test.sql
-----------

4) I run the systemtap script for 3 minutes (here 987 is the pid of
one postgres backend process):

Note that the sum, max, avg, and min times are in microseconds.

-----------
# ./sample-bt-off-cpu -a -v -p 987 -t 180 > /tmp/postgres.bt
Pass 1: parsed user script and 110 library script(s) using
26916virt/22300res/4276shr/18740data kb, in 380usr/10sys/386real ms.
Pass 2: analyzed script: 18 probe(s), 13 function(s), 5 embed(s), 12
global(s) using 54708virt/51956res/5920shr/46532data kb, in
1870usr/360sys/2669real ms.
Pass 3: translated to C into
"/tmp/stapteVG3Q/stap_18d161acb3024931de917335496977c3_12813_src.c"
using 54708virt/52156res/6120shr/46532data kb, in
8680usr/250sys/24647real ms.
Pass 4: compiled C into
"stap_18d161acb3024931de917335496977c3_12813.ko" in
20450usr/610sys/43898real ms.
Pass 5: starting run.
WARNING: Tracing 987 (/opt/postgresql/bin/postgres)...
WARNING: Too many CFI instuctions
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: query time, count=646253, sum=102991078, max=2474344, avg=159, min=83
WARNING: lock time, count=142, sum=3306500, max=1158862, avg=23285, min=16
WARNING: lwlock time, count=141272, sum=7260098, max=1383180, avg=51, min=1
WARNING: Number of errors: 0, skipped probes: 24
Pass 5: run completed in 10usr/110sys/180744real ms.
-----------
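As a quick sanity check, the "query time" summary line can be cross-checked against the server log with a bit of arithmetic (values from the systemtap output above, in microseconds):

```python
# Cross-check the "query time" summary line above (values in microseconds).
count, total_us, max_us = 646253, 102991078, 2474344

# Average per-query time is about 159 us, as systemtap reports.
assert total_us // count == 159

# The single worst query took about 2.47 seconds, which lines up with
# the "duration: 2474.304 ms" INSERT entry in the server log below.
assert round(max_us / 1000) == 2474
```

So the workload is fast on average, but a tiny tail of queries is roughly four orders of magnitude slower than the mean.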

During those 3 minutes, postgres printed the following logs:

-----------
2016-01-13 23:27:21 CST [987-157] postgres@postgres LOG:  duration:
2474.304 ms  statement: insert into test(a, b, c, d) values(3438,
1231, 'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:27:48 CST [987-158] postgres@postgres LOG:  duration:
1383.359 ms  statement: insert into test(a, b, c, d) values(3438,
1231, 'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-159] postgres@postgres LOG:  process 987
still waiting for ExclusiveLock on extension of relation 16386 of
database 12141 after 1000.212 ms
2016-01-13 23:28:33 CST [987-160] postgres@postgres DETAIL:  Process
holding the lock: 990. Wait queue: 988, 987, 989.
2016-01-13 23:28:33 CST [987-161] postgres@postgres STATEMENT:  insert
into test(a, b, c, d) values(3438, 1231,
'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-162] postgres@postgres LOG:  process 987
acquired ExclusiveLock on extension of relation 16386 of database
12141 after 1158.818 ms
2016-01-13 23:28:33 CST [987-163] postgres@postgres STATEMENT:  insert
into test(a, b, c, d) values(3438, 1231,
'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-164] postgres@postgres LOG:  duration:
1159.512 ms  statement: insert into test(a, b, c, d) values(3438,
1231, 'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:45 CST [987-165] postgres@postgres LOG:  duration:
1111.664 ms  statement: insert into test(a, b, c, d) values(3438,
1231, 'ooooooooooooooooooooooooooooooooooooo',
'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

The final backtrace results are converted into a flame graph; see:
http://luajit.io/systemtap/pgsql/postgres_1.svg

Any advice?


Regards,
Jinhua Luo

2016-01-11 5:05 GMT+08:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/9/16 11:57 PM, Jinhua Luo wrote:
>>
>> But I do not understand why the process does so much I/O with async
>> commit. It does not even happen during shared buffer flushing or lock
>> waiting. Which code path is doing this I/O?
>
>
> I assume you're asking about all the IO to the heap table. That is most
> likely occurring as part of ReadBuffer(). As soon as you fill up shared
> buffers, BufferAlloc() is likely to end up with a dirty buffer, resulting in
> it calling FlushBuffer() (see src/backend/storage/buffer/bufmgr.c#1084).
>
> Note that that call is tracked by
> TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(), so I'd expect you to see it in
> the relevant systemtap stats.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: insert performance

From: Jinhua Luo
Date:
Hi,

I thought that with async commit enabled, the backend process would
rarely do file I/O. But in fact, it still performs a lot of file I/O.

After inspecting the vfs probes with systemtap, and reading the
PostgreSQL source code, I found that a tight loop of inserts or updates
causes heavy file I/O on the data files (table and indexes) directly
from the backend process! And that I/O has nothing to do with shared
buffer dirty writes.

heap_insert() or heap_update() invokes
RelationGetBufferForTuple(), which in turn finally invokes
ReadBuffer_common():

1) Look up or allocate the buffer in shared buffers, which may cause a
dirty write (but in my case that is rare; maybe the shared buffers are
big enough and the checkpointer or bgwriter always cleans them in
time). If the target buffer is found, the following steps are skipped.

2) If the relation needs to be extended (inserts or updates on a new
table normally fall into this case), a zero-filled page is written to
disk (to occupy the file space? But most file systems support file
holes or space reservation, so maybe this part could be optimized?).
This procedure holds the exclusive extension lock on the relation, so
if the write is slow, it slows down all pending queries of the lock
waiters.

3) Otherwise, the page is read from disk.
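The three cases can be condensed into a short sketch. This is a simplified Python model with invented names, not the actual ReadBuffer_common() (which works on 8 kB pages through the smgr/md.c layer), but it shows the shape of the control flow:

```python
# Condensed sketch (hypothetical names, heavily simplified) of the three
# cases above: buffer hit, relation extension with a zero-filled page,
# or a synchronous read from disk.

PAGE_SIZE = 8192  # BLCKSZ

def read_buffer_common(shared_buffers, disk_pages, page_id, extend):
    if page_id in shared_buffers:               # case 1: hit, no file I/O
        return shared_buffers[page_id]
    if extend:                                  # case 2: extend the relation,
        disk_pages.append(b"\x00" * PAGE_SIZE)  #   writing a zero-filled page
        page = bytearray(PAGE_SIZE)             #   while holding the extension lock
    else:                                       # case 3: synchronous read
        page = bytearray(disk_pages[page_id])
    shared_buffers[page_id] = page
    return page

cache, disk = {}, []
read_buffer_common(cache, disk, 0, extend=True)   # new table: extension path
assert disk[0] == b"\x00" * PAGE_SIZE             # the zero page hit the disk
assert read_buffer_common(cache, disk, 0, False) is cache[0]  # now a cache hit
```

The key point is that cases 2 and 3 both do file I/O in the backend itself, inside the statement, regardless of the commit mode.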

The target buffer is locked exclusively until the insert or update
finishes. Note that the insert or update also involves an xlog insert;
with async commit enabled the backend process does not flush the xlog,
but chances are that xlog buffer dirty writes happen (although that is
also rare in my case).

So I guess the real reason is file I/O performed while holding locks.
If an I/O spike happens, it causes long query durations.

Am I correct? I look forward to any advice.

Thanks.

Regards,
Jinhua Luo


Re: insert performance

From: Jinhua Luo
Date:
Hi,

There is another problem.
When autovacuum recycles old pages, ReadBuffer_common() does
mdread() instead of mdextend(). The read is synchronous, while the
write can be mostly asynchronous, so frequent reads are much worse
than the writes.
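The asymmetry can be made concrete with a toy latency model. The numbers below are illustrative assumptions, not measurements, and the function is a hypothetical helper, not PostgreSQL code:

```python
# Toy latency model (made-up numbers) of why a page acquired via mdread()
# hurts a single INSERT more than one acquired via mdextend(): the read
# blocks the backend, while the write is usually absorbed by the OS cache.
def insert_latency_us(page_source,
                      buffer_work_us=100,     # in-memory tuple + xlog buffer work
                      sync_read_us=8000,      # mdread(): backend blocks on disk
                      buffered_write_us=50):  # mdextend(): buffered write()
    if page_source == "cached":                  # hit in shared buffers
        return buffer_work_us
    if page_source == "recycled":                # FSM page not in cache
        return buffer_work_us + sync_read_us
    if page_source == "extended":                # new zero-filled page
        return buffer_work_us + buffered_write_us
    raise ValueError(page_source)

assert insert_latency_us("cached") < insert_latency_us("extended") \
       < insert_latency_us("recycled")
```

Under this model the recycled-page path dominates the per-statement latency whenever the page has already been evicted from both shared buffers and the OS cache.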

Any help? Please.

Regards,
Jinhua Luo


2016-01-19 12:50 GMT+08:00 Jinhua Luo <luajit.io@gmail.com>:
> Hi,
>
> I thought that with async commit enabled, the backend process would
> rarely do file I/O. But in fact, it still performs a lot of file I/O.
>
> After inspecting the vfs probes with systemtap, and reading the
> PostgreSQL source code, I found that a tight loop of inserts or updates
> causes heavy file I/O on the data files (table and indexes) directly
> from the backend process! And that I/O has nothing to do with shared
> buffer dirty writes.
>
> heap_insert() or heap_update() invokes
> RelationGetBufferForTuple(), which in turn finally invokes
> ReadBuffer_common():
>
> 1) Look up or allocate the buffer in shared buffers, which may cause a
> dirty write (but in my case that is rare; maybe the shared buffers are
> big enough and the checkpointer or bgwriter always cleans them in
> time). If the target buffer is found, the following steps are skipped.
>
> 2) If the relation needs to be extended (inserts or updates on a new
> table normally fall into this case), a zero-filled page is written to
> disk (to occupy the file space? But most file systems support file
> holes or space reservation, so maybe this part could be optimized?).
> This procedure holds the exclusive extension lock on the relation, so
> if the write is slow, it slows down all pending queries of the lock
> waiters.
>
> 3) Otherwise, the page is read from disk.
>
> The target buffer is locked exclusively until the insert or update
> finishes. Note that the insert or update also involves an xlog insert;
> with async commit enabled the backend process does not flush the xlog,
> but chances are that xlog buffer dirty writes happen (although that is
> also rare in my case).
>
> So I guess the real reason is file I/O performed while holding locks.
> If an I/O spike happens, it causes long query durations.
>
> Am I correct? I look forward to any advice.
>
> Thanks.
>
> Regards,
> Jinhua Luo


Re: insert performance

From: Jim Nasby
Date:
On 1/21/16 11:54 PM, Jinhua Luo wrote:
> There is another problem.
> When autovacuum recycles old pages, ReadBuffer_common() does
> mdread() instead of mdextend(). The read is synchronous, while the
> write can be mostly asynchronous, so frequent reads are much worse
> than the writes.

Please don't top post.

AFAICT your analysis is correct, though I don't see what autovacuum has
to do with anything. When we need a new block to put data on, we'll
either find one with free space in the free space map and use it, or
we'll extend the relation.
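That choice can be sketched in a few lines. This is a hypothetical helper with invented names, not the real FSM code, just the decision Jim describes:

```python
# Hypothetical sketch of the page-selection choice above: reuse a page
# recorded in the free space map if it has enough room, else extend the
# relation with a new page.
def get_page_for_tuple(fsm, nblocks, needed_bytes):
    """fsm maps page number -> free bytes; nblocks is the relation size."""
    for page, free in sorted(fsm.items()):
        if free >= needed_bytes:
            return page, nblocks          # reuse: may need a synchronous read
    return nblocks, nblocks + 1           # extend: new zero-filled page

# Empty FSM: extend the relation (new block 10, size grows to 11).
assert get_page_for_tuple({}, 10, 64) == (10, 11)
# Page 3 has enough free space: reuse it, size unchanged.
assert get_page_for_tuple({3: 128}, 10, 64) == (3, 10)
```

Either branch can cost the backend file I/O; which one you hit depends on whether vacuum has recorded reusable space, not on autovacuum itself doing anything unusual.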

> Any help? Please.

There's been some discussion on ways to improve the performance of
relation extension (iirc one of those was to not zero the new page), but
ultimately you're at the mercy of the underlying OS and hardware.

If you have ideas for improving this you should speak up on -hackers,
but before doing so you should read the archives about what's been
proposed in the past.

> 2016-01-19 12:50 GMT+08:00 Jinhua Luo <luajit.io@gmail.com>:
>> Hi,
>>
>> I thought with async commit enabled, the backend process would rarely
>> do file io. But in fact, it still involves a lot of file io.
>>
>> After inspecting the vfs probes using systemtap, and reading the
>> source codes of postgresql, I found the tight loop of insert or update
>> will cause heavy file io upon the data files (table, indexes) directly
>> by the backend process! And those io has nothing to do with shared
>> buffer dirty writes.
>>
>> The heap_insert() or heap_update() would invoke
>> RelationGetBufferForTuple(), which in turn finally invoke
>> ReadBuffer_common():
>>
>> 1) lookup or allocate the buffer from shared buffer, which may cause
>> dirty write (but in my case, it's rare. Maybe the shared buffer is big
>> enough and the checkpointer or bgwriter always clean it in time). If
>> the target buffer is found, skip following steps.
>>
>> 2)  if it needs to extend the relation (insert or update on new table
>> would normally fall in this case), then it would write zero-filled
>> page into the disk (used to occupy the file space? But most file
>> systems support file hole or space reservation, so maybe this part
>> could be optimized?) This procedure would hold the exclusive lock on
>> the relation. So if the write is slow, it would slow down all pending
>> queries of the lock waiters.
>>
>> 3) Otherwise, it would read from disk.
>>
>> The target buffer would be locked exclusively until the insert or
>> update finish. Note that the insert or update also involve xlog
>> insert, although with async commit enabled, the backend process would
>> not flush the xlog, but chances are that the xlog buffer dirty writes
>> happens (although it's also rare in my case).
>>
>> So I guess the real reason is the file io with lock holding. If io
>> spike happens, it would cause long query duration.
>>
>> Am I correct? Look forward to any advice.
>>
>> Thanks.
>>
>> Regards,
>> Jinhua Luo


--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com