Discussion: Optimizing for writes. Data integrity not critical

Optimizing for writes. Data integrity not critical

From:
Steve Bergman
Date:
Hi,

I am using postgresql in a small (almost trivial) application in which I
pull some data out of a Cobol C/ISAM file and write it into a pgsql
table.  My users can then use the data however they want by interfacing
to it from OpenOffice.org.

The amount of data written is about 60MB and takes a few minutes on a
1200MHz Athlon with a single 60GB IDE drive running Fedora Core 3 with
pgsql 7.4.7.  I'd like to speed up the DB writes a bit if possible.
Data integrity is not at all critical as the database gets dropped,
created, and populated immediately before each use.  Filesystem is ext3,
data=ordered and I need to keep it that way as there is other data in
the filesystem that I do care about.  I have not done any tuning in the
config file yet, and was wondering what things would likely speed up
writes in this situation.

I'm doing the writes individually.  Is there a better way?  Combining
them all into a transaction or something?

Thanks,
Steve Bergman

Re: Optimizing for writes. Data integrity not critical

From:
"Steinar H. Gunderson"
Date:
On Thu, May 19, 2005 at 05:21:07PM -0500, Steve Bergman wrote:
> I'm doing the writes individually.  Is there a better way?  Combining
> them all into a transaction or something?

Batching them all in one or a few transactions will speed it up a _lot_.
Using COPY would help a bit more on top of that.
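
A minimal sketch of both suggestions; the table, columns, and file path
here are hypothetical, invented for illustration:

    -- One transaction around many INSERTs avoids a WAL flush per row.
    BEGIN;
    INSERT INTO imported (id, name, amount) VALUES (1, 'abc', 10.00);
    INSERT INTO imported (id, name, amount) VALUES (2, 'def', 20.50);
    -- ... thousands more rows ...
    COMMIT;

    -- COPY is faster still: one command streams the whole file.
    -- (Server-side COPY; the file must be readable by the server.)
    COPY imported (id, name, amount) FROM '/tmp/export.dat' WITH DELIMITER '|';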

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Optimizing for writes. Data integrity not critical

From:
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Thu, May 19, 2005 at 05:21:07PM -0500, Steve Bergman wrote:
>> I'm doing the writes individually.  Is there a better way?  Combining
>> them all into a transaction or something?

> Batching them all in one or a few transactions will speed it up a _lot_.
> Using COPY would help a bit more on top of that.

Also, if you really don't need to worry about data integrity, turning
off fsync in the config file will probably help.  (Though since it's
an IDE drive, maybe not, as the drive may be lying about write complete
anyway.)

Increasing checkpoint_segments will help too, at the cost of disk space
(about 32MB per increment in the value, IIRC).  I'd suggest pushing it
up enough so you don't incur a checkpoint while the time-critical
operation runs.  checkpoint_timeout may be too small too.
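
For example, the relevant postgresql.conf lines might look like this
(values are illustrative, not tuned recommendations; fsync = false is
only sane for throwaway data):

    fsync = false              # no forced WAL flushes; data lost on a crash
    checkpoint_segments = 16   # more WAL between checkpoints; per Tom's
                               # estimate, roughly 32MB of disk per increment
    checkpoint_timeout = 600   # seconds; keep checkpoints out of the load window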

            regards, tom lane

Re: Optimizing for writes. Data integrity not critical

From:
Christopher Kings-Lynne
Date:
> I'm doing the writes individually.  Is there a better way?  Combining
> them all into a transaction or something?

Use COPY of course :)

Or at worst bundle 1000 inserts at a time in a transaction...

And if you seriously do not care about your data at all, set fsync = off
in your postgresql.conf for a mega speedup.

Chris

Re: Optimizing for writes. Data integrity not critical

From:
Alex Stapleton
Date:
Is using a ramdisk in situations like this entirely ill-advised then?
When data integrity isn't a huge issue and you really need good write
performance it seems like it wouldn't hurt too much. Unless I am
missing something?

On 20 May 2005, at 02:45, Christopher Kings-Lynne wrote:

>> I'm doing the writes individually.  Is there a better way?
>> Combining them all into a transaction or something?
>>
>
> Use COPY of course :)
>
> Or at worst bundle 1000 inserts at a time in a transaction...
>
> And if you seriously do not care about your data at all, set fsync
> = off in your postgresql.conf for a mega speedup.
>
> Chris
>


Re: Optimizing for writes. Data integrity not critical

From:
Alex Stapleton
Date:
I am interested in optimising write performance as well; the machine
I am testing on is maxing out around 450 UPDATEs a second, which is
quite quick I suppose.  I haven't tried turning fsync off yet.  The
table has... a lot of indices as well.  They are mostly pretty simple
partial indexes though.

I would usually just shove stuff into memcached, but I need to store
and sort (in realtime) tens of thousands of rows.  (I am experimenting
with replacing some in-house toplist-generating stuff with a PG
database.)  The partial indexes are basically the only thing that
makes the table usable, btw.
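
For illustration, a partial index of the sort described might look like
this (table and column names are hypothetical, not from the actual system):

    CREATE TABLE toplist (
        player_id integer NOT NULL,
        score     integer NOT NULL,
        active    boolean NOT NULL DEFAULT true
    );

    -- Index only the rows the toplist queries actually read.
    CREATE INDEX toplist_active_score ON toplist (score) WHERE active;

    -- A backward scan of that index can serve the sorted toplist.
    SELECT player_id, score FROM toplist
     WHERE active ORDER BY score DESC LIMIT 100;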

The read performance is pretty damn good, but for some reason I chose
to write the benchmark script in PHP, which can totally destroy the
accuracy of your results if you decide to call pg_fetch_*(); even
pg_affected_rows() can skew things significantly.

So any ideas how to improve the number of writes I can do a second?
The existing system sorts everything by the desired column when a
request is made, and the data it sorts is updated in realtime (whilst
it isn't being sorted.)  And it can sustain the read/write load (to
memory) just fine.  If PG had heap tables this would probably not be
a problem at all, but it doesn't, so it is.  Running it in a ramdisk
would be acceptable, it's just annoying to create the db every time
the machine goes down.  And having to run the entire PG instance off
of the ramdisk isn't great either.

On 19 May 2005, at 23:21, Steve Bergman wrote:

> Hi,
>
> I am using postgresql in a small (almost trivial) application in
> which I pull some data out of a Cobol C/ISAM file and write it into
> a pgsql table.  My users can then use the data however they want by
> interfacing to it from OpenOffice.org.
>
> The amount of data written is about 60MB and takes a few minutes on
> a 1200MHz Athlon with a single 60GB IDE drive running Fedora Core 3
> with pgsql 7.4.7.  I'd like to speed up the DB writes a bit if
> possible.  Data integrity is not at all critical as the database
> gets dropped, created, and populated immediately before each use.
> Filesystem is ext3, data=ordered and I need to keep it that way as
> there is other data in the filesystem that I do care about.  I have
> not done any tuning in the config file yet, and was wondering what
> things would likely speed up writes in this situation.
>
> I'm doing the writes individually.  Is there a better way?
> Combining them all into a transaction or something?
>
> Thanks,
> Steve Bergman
>