Discussion: slow update of index during insert/copy


slow update of index during insert/copy

From:
Thomas Finneid
Date:
Hi

I am working on a table which stores up to 125K rows per second and I
find that the inserts are a little bit slow. The insert is in reality a
COPY of a chunk of rows, up to 125K. A COPY of 25K rows, without an
index, is fast enough, about 150ms. With the index, the insert takes
about 500ms. The read, though, is lightning fast, because of the index.
It takes only 10ms to retrieve 1000 rows from a 15M row table. As the
table grows to several billion rows, that might change though.

I would like the insert, with an index, to be a lot faster than 500ms,
preferably closer to 150ms. Any advice on what to do?
Additionally, I don't know enough about pg configuration to be sure I have
included all the important directives and given them sensible
values, so any help on that as well would be appreciated.

Here are the details:

postgres 8.2.7 on latest kubuntu, running on dual Opteron quad cores,
with 8GB memory and 8 sata disks on a raid controller (no raid config)

table:

create table v1
(
        id_s            integer,
        id_f            integer,
        id_st           integer,
        id_t            integer,
        value1          real,
        value2          real,
        value3          real,
        value4          real,
        value5          real,
        ...
        value20         real
);

create index idx_v1 on v1 (id_s, id_st, id_t);

- the insert is a COPY into the first 5-8 columns; the rest are unused
   so far.
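A sketch of what that chunk load might look like (the exact column subset is an assumption; the post only says the first 5-8 columns are used):

```sql
-- Hypothetical sketch of the bulk load described above: COPY with an
-- explicit column list fills only the populated columns, and the
-- remaining value columns are left NULL.
COPY v1 (id_s, id_f, id_st, id_t, value1) FROM STDIN;
```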

postgres config:

autovacuum = off
checkpoint_segments = 96
commit_delay = 5
effective_cache_size = 128000
fsync = on
max_fsm_pages = 208000
max_fsm_relations = 10000
max_connections = 20
shared_buffers = 128000
wal_sync_method = fdatasync
wal_buffers = 256
work_mem = 512000
maintenance_work_mem = 2000000

Re: slow update of index during insert/copy

From:
"Scott Carey"
Date:
You may want to investigate pg_bulkload.

http://pgbulkload.projects.postgresql.org/

One major enhancement over COPY is that it does an index merge, rather than modifying the index one row at a time.
http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf



On Sun, Aug 31, 2008 at 6:32 AM, Thomas Finneid <tfinneid@student.matnat.uio.no> wrote:
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: slow update of index during insert/copy

From:
Thomas Finneid
Date:
Scott Carey wrote:
> You may want to investigate pg_bulkload.
>
> http://pgbulkload.projects.postgresql.org/
>
> One major enhancement over COPY is that it does an index merge, rather
> than modify the index one row at a time.

This is a command-line tool, right? I need something that works through
the JDBC driver; is that possible?

regards

thomas


Re: slow update of index during insert/copy

From:
Craig Ringer
Date:
Thomas Finneid wrote:
> Hi
>
> I am working on a table which stores up to 125K rows per second and I
> find that the inserts are a little bit slow. The insert is in reality a
> COPY of a chunk of rows, up to 125K. A COPY of 25K rows, without an
> index, is fast enough, about 150ms. With the index, the insert takes
> about 500ms. The read, though, is lightning fast, because of the index.
> It takes only 10ms to retrieve 1000 rows from a 15M row table. As the
> table grows to several billion rows, that might change though.
>
> I would like the insert, with an index, to be a lot faster than 500ms,
> preferably closer to 150ms. Any advice on what to do?
> Additionally, I don't know enough about pg configuration to be sure I have
> included all the important directives and given them sensible
> values, so any help on that as well would be appreciated.
>
> Here are the details:
>
> postgres 8.2.7 on latest kubuntu, running on dual Opteron quad cores,
> with 8GB memory and 8 sata disks on a raid controller (no raid config)

Just on a side note, your system is pretty strangely heavy on CPU
compared to its RAM and disk configuration. Unless your workload in Pg
is computationally intensive or you have something else hosted on the
same machine, those CPUs will probably sit mostly idle.

The first thing you need to do is determine where, during your bulk
loads, the system is bottlenecked. I'd guess it's stuck waiting for disk
writes, personally, but I'd want to investigate anyway.

If you're not satisfied with the results from pg_bulkload you can look
into doing things like moving your indexes to separate tablespaces (so
they don't fight for I/O on the same disk sets as your tables),
separating your bulk load tables from other online/transactional tables,
etc.
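A minimal sketch of the tablespace idea (the location path and tablespace name here are made up for illustration):

```sql
-- Hypothetical: put index storage on a separate disk set so index
-- writes don't compete with heap writes for the same spindles.
CREATE TABLESPACE fast_idx LOCATION '/mnt/disk2/pg_indexes';

-- Recreate the index there after dropping the old one
-- (ALTER INDEX idx_v1 SET TABLESPACE fast_idx is the in-place option).
DROP INDEX idx_v1;
CREATE INDEX idx_v1 ON v1 (id_s, id_st, id_t) TABLESPACE fast_idx;
```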

Also, to relay common advice from this list:

If you land up considering hardware as a performance answer, getting a
decent SAS RAID controller with a battery backed cache (so you can
enable its write cache) and a set of fast SAS disks might be worth it.
For that matter, a good SATA RAID controller and some 10kRPM SATA disks
could help too. It all appears to depend a lot on the particular
workload and the characteristics of the controller, though.

--
Craig Ringer

Re: slow update of index during insert/copy

From:
"Scott Carey"
Date:
Are you even getting COPY to work with JDBC?  As far as I am aware, COPY doesn't work with JDBC at the moment:
http://jdbc.postgresql.org/todo.html   Listed on the todo page, under "PG Extensions", is "Add support for COPY."  I tried to use it with JDBC a while ago and gave up after a couple of limited experiments and reading that -- but didn't dig very deep into it.

As suggested, you should determine if you are disk bound or CPU bound.  My experience with COPY is that it is surprisingly easy to make it CPU bound, but the conditions for that can vary quite a bit from schema to schema and hardware to hardware.

pg_bulkload may not be the tool for you for many reasons -- it requires a rigid data format and control file, very similar to Oracle's sqlloader.  It may not fit your needs at all -- it's just worth a look to see if it does, since if there's a match, it will be much faster.

On Sun, Aug 31, 2008 at 6:49 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:

Re: slow update of index during insert/copy

From:
Thomas Finneid
Date:
Scott Carey wrote:
> Are you even getting COPY to work with JDBC?  As far as I am aware, COPY
> doesn't work with JDBC at the moment:

I used a patched jdbc driver, provided by someone on the list; I don't
have the reference at hand. It works perfectly and it's about 5 times
faster, for my job, than insert.

> As suggested, you should determine if you are disk bound or CPU bound.
> My experience with COPY is that it is surprisingly easy to make it CPU
> bound, but the conditions for that can vary quite a bit from schema to
> schema and hardware to hardware.

COPY is not the problem, as far as I can see. The problem is the update
speed of the index. I tested the same procedure on a table with and
without an index. Having an index makes it 200-250% slower than without.

But as you state, I should check whether the problem is cpu or disk
bound. In addition, as someone else suggested, I might need to move the
indexes to a different disk, which is not a bad idea considering the
index becomes quite large with up to 125K rows a second.

But I have another concern, which is the db server configuration. I am
not entirely convinced the db is configured properly. I had one problem
where the disk started thrashing after the table had reached a certain
size; once I configured shmmax, and the corresponding setting in pg,
properly, I got rid of the thrashing. I will have to read through the
documentation properly.

regards

thomas

Re: slow update of index during insert/copy

From:
Thomas Finneid
Date:
Craig Ringer wrote:
> Just on a side note, your system is pretty strangely heavy on CPU
> compared to its RAM and disk configuration. Unless your workload in Pg
> is computationally intensive or you have something else hosted on the
> same machine, those CPUs will probably sit mostly idle.

It's a devel machine for experimenting with pg and disk performance,
and for experimenting with multithreaded java programs. It's not going to
be particularly demanding on memory, but 8GB is good enough, I think.

> The first thing you need to do is determine where, during your bulk
> loads, the system is bottlenecked. I'd guess it's stuck waiting for disk
> writes, personally, but I'd want to investigate anyway.

Will investigate.

> If you're not satisfied with the results from pg_bulkload you can look
> into doing things like moving your indexes to separate tablespaces (so
> they don't fight for I/O on the same disk sets as your tables),
> separating your bulk load tables from other online/transactional tables,
> etc.

(Btw, it's jdbc copy, not command line.)
I don't think it's the bulk load that's the problem in itself, because
loading without an index is quite fast (and 5 times faster than an
ordinary insert). But of course, the bulk load process affects other
parts of the system, which can cause a bottleneck.

> Also, to relay common advice from this list:
>
> If you land up considering hardware as a performance answer, getting a
> decent SAS RAID controller with a battery backed cache (so you can
> enable its write cache) and a set of fast SAS disks might be worth it.
> For that matter, a good SATA RAID controller and some 10kRPM SATA disks
> could help too. It all appears to depend a lot on the particular
> workload and the characteristics of the controller, though.

It does have a sata raid controller, but not the battery pack; because
it's a devel machine and not a production machine, I thought it
was not needed. But if you are saying the battery pack enables a cache
which enables faster disk writes, I will consider it.
It's the first time I have worked with a raid controller, so I suspect I
have to read up on the features to understand how to utilise it best.

regards

thomas



Re: slow update of index during insert/copy

From:
"Scott Marlowe"
Date:
On Mon, Sep 1, 2008 at 5:29 AM, Thomas Finneid
<tfinneid@student.matnat.uio.no> wrote:

> It does have a sata raid controller, but not the battery pack, because
> it's a devel machine and not a production machine; I thought it was not
> needed. But if you are saying the battery pack enables a cache which enables
> faster disk writes I will consider it.
> It's the first time I have worked with a raid controller, so I suspect I have
> to read up on the features to understand how to utilise it best.

The model of the controller will have a large impact on performance as
well.  The latest, fastest RAID controllers have dual-core 1.2GHz CPUs
on them, where some slower ones still in production are using 333MHz
single-core CPUs.  The quality of the firmware and of the linux (or
windows, or bsd) driver also has a large impact on the performance of a
raid controller.

Definitely look into the battery backing unit.  But if it's a $250
card, then it might not be enough.

Re: slow update of index during insert/copy

From:
"Scott Carey"
Date:
On Raid Controllers and Dev machines:

For a dev machine the battery backup is NOT needed.

Battery backup makes a _production_ system faster:  In production, data integrity is everything, and write-back caching is dangerous without battery backup.

So:
Without BBU:   Write-through cache = data safe in power failure; Write back cache = not safe in power failure.
With BBU :   Both modes are safe on power loss.

Write-back is a lot faster for the WAL log in particular.

For a development box, just enable write-back caching regardless of the battery backup situation, as long as it's not your only copy of critical data and you just want to improve performance on the dev box.  Just make sure whatever data is on that array can be replaced without concern if you're in the middle of writing to it when power fails.
-----------

On JDBC and COPY:
Thanks for the info on the patch to support it -- however, the versions posted there are rather old, and the 8.3 version is not even the same as the 8-month-old current release -- it's three releases prior and 8 months older than that.  There are quite a few bugfixes between the 8.3 releases v600 and v603: http://jdbc.postgresql.org/changes.html  and that concerns me.  Is there a patched version of the latest driver?  Or will that have to be undertaken by the user -- I worry about a conflict due to one of the listed changes since v600.

On the performance impact of using COPY instead of INSERT:  out of curiosity, were you comparing COPY against raw row-by-row inserts (slow), JDBC batch inserts (faster), or multi-row inserts: INSERT INTO X (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9), (10,11,12) ...?
COPY should be faster than all of these, but I would not expect 5x faster for the latter two.
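To make the alternatives concrete (using the v1 table from the original post; the values are purely illustrative):

```sql
-- Row-by-row: one statement, and one round trip, per row -- slowest.
INSERT INTO v1 (id_s, id_f, id_st, id_t, value1) VALUES (1, 1, 1, 1, 0.5);

-- Multi-row VALUES (available since PostgreSQL 8.2): one statement
-- inserts many rows, usually much faster than row-by-row.
INSERT INTO v1 (id_s, id_f, id_st, id_t, value1) VALUES
    (1, 1, 1, 2, 0.6),
    (1, 1, 1, 3, 0.7),
    (1, 1, 1, 4, 0.8);
```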


On Mon, Sep 1, 2008 at 10:51 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

Re: slow update of index during insert/copy

From:
Gregory Stark
Date:
"Scott Carey" <scott@richrelevance.com> writes:

> On Raid Controllers and Dev machines:
>
> For a dev machine the battery backup is NOT needed.
>
> Battery back up makes a _production_ system faster:  In production, data
> integrity is everything, and write-back caching is dangerous without a
> battery back up.
>
> So:
> Without BBU:   Write-through cache = data safe in power failure; Write back
> cache = not safe in power failure.
> With BBU :   Both modes are safe on power loss.

This could be read the wrong way. With a BBU it's not that you can run the
drives in write-back mode safely. It's that you can cache in the BBU safely.
The drives still need to have their write caches off (i.e., in
write-through mode).

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: slow update of index during insert/copy

From:
Thomas Finneid
Date:
Scott Carey wrote:
> For a development box, just enable write-back caching regardless of the
> battery back up situation.  As long as its not your only copy of

Will have a look at it; the data is not important and can be reproduced
any time on any machine. The controller I have is an Areca ARC-1220
Serial ATA 8 port RAID Controller - PCI-E, SATA II, so I don't know
exactly what caching it supports.


> On JDBC and COPY:
> Thanks for the info on the patch to support it -- however the versions
> posted there are rather old, and the 8.3 version is not even the same as
> the 8 month old current release -- its 3 releases prior and 8 months
> older than that.  There are quite a few bugfixes between 8.3 - v600 and
> v603: http://jdbc.postgresql.org/changes.html  and that concerns me.  Is
> there a patched version of the latest driver?  Or will that have to be

It was someone on the list who told me about the patch; I don't know the
situation of the patch at the current moment. I am using the patch on
PG 8.2.7, and it works fine.

> On the performance impact of using COPY instead of INSERT :  out of
> curiosity, were you comparing COPY against raw row-by-row inserts (slow)
> or JDBC batch inserts (faster) or multi-row inserts: INSERT into X
> (a,b,c) values (1,2,3) , (4,5,6) , (7,8,9 ) , (10,11,12)  ....
> ?

I tested row by row and jdbc batch, but I don't have the measured numbers
any more. But I suppose I could recreate the test if need be.

regards
thomas



Re: slow update of index during insert/copy

From:
"Scott Carey"
Date:


> I tested row by row and jdbc batch, but I don't have the measured
> numbers any more. But I suppose I could recreate the test if need be.


Don't re-create it, I was just curious.

Re: slow update of index during insert/copy

From:
"Scott Marlowe"
Date:
On Mon, Sep 1, 2008 at 2:32 PM, Thomas Finneid
<tfinneid@student.matnat.uio.no> wrote:
>
> Scott Carey wrote:
>>
>> For a development box, just enable write-back caching regardless of the
>> battery back up situation.  As long as its not your only copy of
>
> Will have a look at it; the data is not important and can be reproduced any
> time on any machine. The controller I have is an Areca ARC-1220 Serial ATA 8
> port RAID Controller - PCI-E, SATA II, so I don't know exactly what caching
> it supports.

It's a pretty good card.  It should support 1G of cache at least, and
definitely supports battery backup.  Have had a pair of 1680 Arecas in
production for a month now and so far I'm very happy with the
reliability and performance.

The other Scott is technically right about the lack of need for
battery backup on a dev machine, as long as you go in and change the
caching to write-back; I'm sure the card will give you a red
dialog box saying this is a bad idea. Now, if it would take you a day
of downtime to get a dev database back in place and running after a
power loss, then the bbu may be worth the $200 or so.

While I like to have a machine kitted out just like prod for testing,
for development I tend to prefer slower machines, so that
developers might notice if they've just laid a big fat bloaty code
egg that's slower than it should be.  Optimize for acceptable
performance on such a server and you're far more likely to get good
performance behaviour in production.

Re: slow update of index during insert/copy

From:
"Scott Marlowe"
Date:
On Mon, Sep 1, 2008 at 2:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> dialog box saying this is a bad idea. Now, if it would take you a day
> of downtime to get a dev database back in place and running after a
> power loss, then the bbu may be worth the $200 or so.

I just wanted to comment that the more people depend on the
development machine to get their job done, the easier it is to
justify a battery.

If 20 people rely on a machine to do their job, just multiply their
hourly cost to the company times your restore time for a figure that
will be several times higher than the cost of the battery.

Re: slow update of index during insert/copy

From:
"Scott Carey"
Date:


On Mon, Sep 1, 2008 at 12:41 PM, Gregory Stark <stark@enterprisedb.com> wrote:

Actually, the drive-level write cache does not have to be disabled; the controller just has to issue a drive write-cache flush and use write barriers appropriately.  Drive caches are only a problem if the controller assumes that data it sent to the drive has reached the platters without checking up on it or issuing a cache-flush command to validate that things are on the platter.  The controller, if it's any good, should handle this downstream configuration or document that it does not.  What is appropriate will vary; see the documentation.
Drive write caches are 100% safe when used appropriately.  This is true with or without RAID, but in the case of a non-RAID or software-raid setup the file system and OS have to do the right thing.  It's true that many combinations of file system + OS (Linux LVM, for just one example) don't necessarily do the right thing, and some RAID controllers may also behave badly.  The safe thing is to turn off drive write-back caches if in doubt, and the performance degradation caused by disabling them will be smaller for a good hardware RAID card with a large cache than in other cases.

Likewise, the truly safe thing is not to rely on the write-back cache on the raid controller either -- it protects against power failure but NOT against various hardware failures or raid-card death.  I've seen the latter: upon power loss and restore, the raid card was broken, and thus could not flush the data it had in RAM (assuming it was still there) to disk.  Luckily, after getting another card and loading up the db, there was no corruption and we went on our way.  Never, ever assume that your raid array + BBU are fail-safe.  All that stuff makes failure a lot less likely, but not zero.

Re: slow update of index during insert/copy

From:
Greg Smith
Date:
On Mon, 1 Sep 2008, Thomas Finneid wrote:

> It does have a sata raid controller, but not have the battery pack, because
> its a develmachine and not a production machine, I thought it was not needed.
> But if you are saying the battery pack enables a cache which enables faster
> disk writes I will consider it.

Some controllers will only let you enable a write-back cache if the
battery is installed, but those are fairly rare.  On a development system,
you can usually turn on write caching even if the battery that makes it
safe for production isn't there.

> The controller I have is a Areca ARC-1220 Serial ATA 8 port RAID
> Controller - PCI-E, SATA II, so I dont know exactly what it supports of
> caching.

On that card I'm not sure you can even turn off the controller write
caching if you wanted to.  There's one thing that looks like that though
but isn't:  go into the BIOS, look at System Configuration, and there will
be an option for "Disk Write Cache Mode".  That actually controls whether
the caches on the individual disks are enabled or not, and the default of
"Auto" sets that based on whether there is a battery installed or not.
See http://www.gridpp.rl.ac.uk/blog/2008/02/12/areca-cards/ for a good
description of that.  The setting is quite confusing when set to Auto; I'd
recommend just setting it to "Disabled" and being done with it.

You can confirm what each drive is actually set to by drilling down into
the Physical Drives section, you'll find "Cache Mode: Write Back" if the
individual disk write caches are on, and "Write Through" if they're off.

I'd suggest you take a look at
http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html
to find out more about the utilities that come with the card you can
access under Linux.  You may have trouble using them under Ubuntu, I know
I did.  Better to know about that incompatibility before you've got a disk
failure.

I note that nobody has talked about your postgresql.conf yet.  I assume
you've turned autovacuum off because you're not ever deleting things from
these tables.  You'll still need to run VACUUM ANALYZE periodically to
keep good statistics for your tables, but I don't think that's relevant to
your question today.
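For an insert-only table like this, that periodic maintenance can be as simple as the following (run against the v1 table from the original post):

```sql
-- Refresh planner statistics; with no deletes there is little to
-- reclaim, but the statistics still go stale as rows pour in.
VACUUM ANALYZE v1;
```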

I'd recommend changing all the memory-based parameters to use computer
units.  Here's what your configuration turned into when I did that:

effective_cache_size = 1000MB
shared_buffers = 1000MB
work_mem = 512MB
maintenance_work_mem = 2000MB
wal_buffers = 256kB

Those are all close enough that I doubt fiddling with them will change
much for your immediate problem.  For a system with 8GB of RAM like yours,
I would suggest replacing the above with the below set instead; see
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more
information.

effective_cache_size = 7000MB
shared_buffers = 2000MB
work_mem = 512MB
maintenance_work_mem = 512MB
wal_buffers = 1024kB
checkpoint_completion_target = 0.9

Note that such a large work_mem setting can run out of memory (which is
very bad on Linux) if you have many clients doing sorts at once: with
max_connections = 20, twenty concurrent sorts could try to claim
20 x 512MB = 10GB, more than the 8GB in this machine.

> wal_sync_method = fdatasync

You should try setting this to open_sync, which can be considerably faster
for some write-heavy situations.  Make sure to test that thoroughly though;
there are occasional reports of issues with that setting under Linux,
which seem to vary based on kernel version.  I haven't had a chance to test
Ubuntu Hardy heavily in this area yet myself.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: slow update of index during insert/copy

From:
Thomas Finneid
Date:
Greg Smith wrote:
> On Mon, 1 Sep 2008, Thomas Finneid wrote:

Thanks for all the info on the disk controller, I will have to look
through all that now :)

> I note that nobody has talked about your postgresql.conf yet.  I assume
> you've turned autovacuum off because you're not ever deleting things
> from these tables.

That is correct.

> You'll still need to run VACUUM ANALYZE periodically
> to keep good statistics for your tables, but I don't think that's

Will look at it.

> You should try setting this to open_sync , that can be considerably
> faster for some write-heavy situations.  Make sure to test that
> thoroughly though, there are occasional reports of issues with that
> setting under Linux; seems to vary based on kernel version.  I haven't
> had a chance to test the Ubuntu Hardy heavily in this area yet myself.

The production machine is Solaris 10 running on a Sun v980. Do you know
if it has any issues like these?
Additionally, would I need to make any config changes when going from
linux to solaris?

The v980 has got lots of memory and an FC disk system, but I don't know
much more about the controllers and disks etc. But I suspect it's got at
least the same features as the disks and controller that are in the devel
machine.

regards

thomas

Re: slow update of index during insert/copy

From:
Greg Smith
Date:
On Tue, 2 Sep 2008, Thomas Finneid wrote:

>> You should try setting this to open_sync , that can be considerably faster
>> for some write-heavy situations.  Make sure to test that thoroughly though,
>> there are occasional reports of issues with that setting under Linux
>
> The production machine is Solaris 10 running on a Sun v980. Do you know of it
> has any issues like these?

On Solaris you can safely use open_datasync which is a bit better than
open_sync.  For best results, you need to separate the xlog onto a
separate partition and mount it using forcedirectio, because Postgres
doesn't know how to use direct I/O directly on Solaris yet.

> Additionally, would I need to do any config changes when going from linux to
> solaris?

Assuming the same amount of memory, the postgresql.conf should be
basically the same, except for the wal_sync_method change mentioned above.
If there's more RAM in the production server you can ramp up
shared_buffers, effective_cache_size, and possibly work_mem
proportionately.  The settings I suggested for maintenance_work_mem and
wal_buffers are already near the useful upper limits for those parameters.
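As an illustration of that proportional scaling, a production box with 16GB instead of the devel machine's 8GB might end up with something like this (illustrative numbers only, doubled from the settings posted earlier in the thread; 8.2 sizes these in 8kB pages):

```
# postgresql.conf sketch for a 16GB machine
shared_buffers = 256000          # ~2GB, doubled from 128000
effective_cache_size = 256000    # doubled; size to the expected OS cache
work_mem = 512000                # per-sort; safe here only because max_connections is low
maintenance_work_mem = 2000000   # already near the useful upper limit
wal_buffers = 256                # already near the useful upper limit
```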

There are a few operating system level things you should consider tweaking
on Solaris 10 for better PostgreSQL performance.  You need to be a bit
more careful about the parameters used for the filesystem than on Linux,
and the settings there vary considerably depending on whether you're using
UFS or ZFS.  The best intro to that I know of is at
http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best ; I added
some clarification to a few points in there and some other Solaris notes
at http://notemagnet.blogspot.com/2008_04_01_archive.html  Those should
get you started.

I hope you're already looking into some sort of repeatable benchmarking
that's representative of your application you can run.  You'll go crazy
playing with all these settings without something like that.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: slow update of index during insert/copy

From
Matthew Wakeling
Date:
On Mon, 1 Sep 2008, Scott Carey wrote:
> Thanks for the info on the patch to support it -- however the versions
> posted there are rather old...

Over here, we're using an extremely old patched version of the JDBC
driver. That's the patch I sent to some mailing list a couple of years
ago. It works very well, but I would be very eager to see the COPY support
make it to the mainstream driver with a consistent interface.

> On the performance impact of using COPY instead of INSERT.

Over here, we use the binary form of COPY, and it is *really* fast. It's
quite easy to saturate the discs.
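For anyone curious what that binary stream actually looks like on the wire, here is a small sketch that assembles one by hand, following the documented PGCOPY binary format. The column layout assumes the first five columns of the v1 table from earlier in the thread; this is not the JDBC driver's actual code, just the same bytes it would produce:

```python
import struct

def copy_binary(rows):
    """Build a PostgreSQL binary-COPY stream for rows of
    (id_s, id_f, id_st, id_t, value1) = (int4, int4, int4, int4, float4)."""
    buf = bytearray()
    buf += b"PGCOPY\n\xff\r\n\x00"           # fixed 11-byte signature
    buf += struct.pack("!ii", 0, 0)           # flags field, header-extension length
    for id_s, id_f, id_st, id_t, value1 in rows:
        buf += struct.pack("!h", 5)           # number of fields in this tuple
        for v in (id_s, id_f, id_st, id_t):
            buf += struct.pack("!ii", 4, v)   # field length (4), then int4 payload
        buf += struct.pack("!if", 4, value1)  # field length (4), then float4 payload
    buf += struct.pack("!h", -1)              # end-of-data marker
    return bytes(buf)

stream = copy_binary([(1, 2, 3, 4, 0.5)])
# feed `stream` to: COPY v1 (id_s, id_f, id_st, id_t, value1) FROM STDIN WITH BINARY
```

All integers and floats are big-endian, which is why the discs rather than the parser tend to become the bottleneck.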

Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are.                 -- Kyle Hearn

Re: slow update of index during insert/copy

From
Thomas Finneid
Date:
What about filesystem properties?

on linux I am using:

    ext3 (with journal), mounted with: auto,rw,async,noatime,nodiratime

on disks for data and journal

I am unsure if I need a journal in the fs or if the db covers that
problem. With regards to that, do I then need to set some Linux setting
to force inode syncing (I don't remember the name for the filesystem
structure in unix memory)? The same question can be asked about the
async option.

any thoughts?

thomas


Greg Smith wrote:
> On Tue, 2 Sep 2008, Thomas Finneid wrote:
>
>>> You should try setting this to open_sync , that can be considerably
>>> faster for some write-heavy situations.  Make sure to test that
>>> thoroughly though, there are occasional reports of issues with that
>>> setting under Linux
>>
>> The production machine is Solaris 10 running on a Sun v980. Do you
>> know if it has any issues like these?
>
> On Solaris you can safely use open_datasync which is a bit better than
> open_sync.  For best results, you need to separate the xlog onto a
> separate partition and mount it using forcedirectio, because Postgres
> doesn't know how to use direct I/O directly on Solaris yet.
>
>> Additionally, would I need to do any config changes when going from
>> linux to solaris?
>
> Assuming the same amount of memory, the postgresql.conf should be
> basically the same, except for the wal_sync_method change mentioned
> above. If there's more RAM in the production server you can ramp up
> shared_buffers, effective_cache_size, and possibly work_mem
> proportionately.  The settings I suggested for maintenance_work_mem and
> wal_buffers are already near the useful upper limits for those parameters.
>
> There are a few operating system level things you should consider
> tweaking on Solaris 10 for better PostgreSQL performance.  You need to
> be a bit more careful about the parameters used for the filesystem than
> on Linux, and the settings there vary considerably depending on whether
> you're using UFS or ZFS.  The best intro to that I know of is at
> http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best ; I
> added some clarification to a few points in there and some other Solaris
> notes at http://notemagnet.blogspot.com/2008_04_01_archive.html  Those
> should get you started.
>
> I hope you're already looking into some sort of repeatable benchmarking
> that's representative of your application you can run.  You'll go crazy
> playing with all these settings without something like that.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: slow update of index during insert/copy

From
Greg Smith
Date:
On Thu, 4 Sep 2008, Thomas Finneid wrote:

> I am unsure if I need a journal in the fs or if the db covers that problem.

There are some theoretical cases where the guarantees of ext3 seem a
little weak unless you've turned the full journal on even in a database
context (we just had a long thread on this last month; see
http://archives.postgresql.org/pgsql-performance/2008-08/msg00136.php for
the part that dives into this subject).  In practice, the "ordered" mode
(the default for ext3) seems sufficient to prevent database corruption.
There is a substantial performance hit to running in full journal mode
like you're doing;

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

shows ordered mode as nearly 3X faster.
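The journalling mode is selected per mount, so moving between full data journalling and ordered mode is just a mount-option change. An illustrative /etc/fstab line (the device and mount point are placeholders):

```
# data=ordered is the ext3 default; data=journal is the slow full-journalling
# mode; data=writeback is faster still but offers the weakest guarantees
/dev/sdb1  /data  ext3  defaults,noatime,data=ordered  0  2
```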

You should always do your own stress testing on your hardware anyway,
including a few rounds of powering off the server abruptly and making sure
it recovers from that.

> With regards to that, do I then need to set some linux setting to force inode
> syncing (I don't remember the name for the filesystem structure in unix memory).
> The same question can be asked about the async option.

In the default mode, the database speaks to the filesystem in terms of
writes followed by fsync, which forces both the data and associated
metadata out.  It works similarly if you switch to sync writes.
PostgreSQL is very precise about what data really needs to be written to
disk and what can sit in the cache until later, so you shouldn't need to
worry about the sync parts at the filesystem level (as long as said
filesystem implementation is sane).
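The pattern described here -- write, then fsync to force both the data and the associated metadata out -- looks like this at the system-call level (a generic sketch of the idiom, not PostgreSQL's actual code):

```python
import os
import tempfile

# Write a "commit record" and force it (plus the file's metadata) to disk.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"commit record\n")
    os.fsync(fd)  # returns only once the kernel has pushed data and metadata to stable storage
finally:
    os.close(fd)

# After fsync returns, the record is expected to survive a crash;
# re-reading the file confirms it landed.
with open(path, "rb") as f:
    data = f.read()
os.unlink(path)
```

Everything not written this way can sit in the OS cache until the kernel flushes it at its leisure.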

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: slow update of index during insert/copy

From
Thomas Finneid
Date:

Greg Smith wrote:
> In practice, the "ordered"
> mode (the default for ext3) seems sufficient to prevent database
> corruption. There is a substantial performance hit to running in full
> journal mode like you're doing;

Where do you see which mode I am running in? I haven't specified any
mode in any config or command, so I assume it's using the ext3
default, which I believe is "ordered".
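For what it's worth, the mode actually in effect can be checked from a shell (commands assume ext3 on Linux; the device name is a placeholder and output details vary by distribution):

```shell
# ext3 defaults to data=ordered unless fstab or the superblock says otherwise
mount | grep ext3                       # shows data=... only if set explicitly
dmesg | grep -i "ext3.*journal"         # kernel logs the journalling mode at mount time
tune2fs -l /dev/sdb1 | grep -i journal  # superblock-level journal options
```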

regards