Really bad insert performance: what did I do wrong?

From: Kevin White
Date:
I've used PostgreSQL for some pretty amazing things, and was a proponent
of using it here.  I set up 7.3.1 on my development RedHat 8 box, and it
was fine.  I need to load about 700,000 rows into one table, the rows
only having 6 columns, and the load on my box happens in just a couple
of minutes (there's some calculation while the data is loading, and that
time was acceptable to me).

My box, however, isn't a production server, so I attempted to create the
database again on a Sun Blade:

SunOS trident 5.8 Generic_108528-17 sun4u sparc SUNW,UltraAX-i2
Status of processor 0 as of: 02/21/03 10:10:10
   Processor has been on-line since 01/13/03 13:53:51.
   The sparcv9 processor operates at 500 MHz,
         and has a sparcv9 floating point processor.

It isn't the world's fastest box, but should be fast enough for this.

However...

It took almost 2 days to load the table on this box.  The postgresql
process was constantly eating up all of the CPU time it could get, at
times loading less than one row a second.

Now, I KNOW something is wrong, and it probably isn't postgres.  :)

I turned off the index on the table (there was only one, on three
fields) and made the whole load happen in a transaction.  The 2 day
result was after those changes were made.

I compiled postgres myself, using the gcc 3.2.2 package from sunfreeware.

Any thoughts?  I know that postgres should be more than capable of
loading this data quickly on this box...I figured it could possibly take
twice as long...and I thought my problem would be i/o related, not CPU.

Thanks!

Kevin



Re: Really bad insert performance: what did I do wrong?

From: Kevin White
Date:
> Yipes.  We found a while ago that Solaris' standard qsort() really sucks,
> but 7.3 should work around that --- and I don't think qsort would be
> invoked during data load anyway.
>
> Do you want to rebuild Postgres with profiling enabled, and get a gprof
> trace so we can see where the time is going?  You don't need to run it
> for two days --- a few minutes' worth of runtime should be plenty.

Great...I'd love to...as I've never had a problem with Postgres like
this, I didn't even know where to start...I'll look for how to do that.

Kevin



Re: Really bad insert performance: what did I do wrong?

From: Tom Lane
Date:
Kevin White <kwhite@digital-ics.com> writes:
> My box, however, isn't a production server, so I attempted to create the
> database again on a Sun Blade:
> ...
> It took almost 2 days to load the table on this box.

Yipes.  We found a while ago that Solaris' standard qsort() really sucks,
but 7.3 should work around that --- and I don't think qsort would be
invoked during data load anyway.

Do you want to rebuild Postgres with profiling enabled, and get a gprof
trace so we can see where the time is going?  You don't need to run it
for two days --- a few minutes' worth of runtime should be plenty.

            regards, tom lane

Re: Really bad insert performance: what did I do wrong?

From: "Shridhar Daithankar"
Date:
On Friday 21 Feb 2003 9:02 pm, you wrote:
> Any thoughts?  I know that postgres should be more than capable of
> loading this data quickly on this box...I figured it could possibly take
> twice as long...and I thought my problem would be i/o related, not CPU.

First, check vmstat or similar on SunOS. See what is maxing out. Second,
turn postgresql trace on and see where it is spending most of the CPU.

Needless to say, did you tune shared buffers?

 Shridhar

Re: Really bad insert performance: what did I do wrong?

From: Andrew Sullivan
Date:
On Fri, Feb 21, 2003 at 10:32:37AM -0500, Kevin White wrote:
> I've used PostgreSQL for some pretty amazing things, and was a proponent
> of using it here.  I set up 7.3.1 on my development RedHat 8 box, and it
> was fine.  I need to load about 700,000 rows into one table, the rows
> only having 6 columns, and the load on my box happens in just a couple
> of minutes (there's some calculation while the data is loading, and that
> time was acceptable to me).
>
> My box, however, isn't a production server, so I attempted to create the
> database again on a Sun Blade:
>
> SunOS trident 5.8 Generic_108528-17 sun4u sparc SUNW,UltraAX-i2
> Status of processor 0 as of: 02/21/03 10:10:10
>    Processor has been on-line since 01/13/03 13:53:51.
>    The sparcv9 processor operates at 500 MHz,
>          and has a sparcv9 floating point processor.
>
> It isn't the world's fastest box, but should be fast enough for this.


What's the disk subsystem?  Is fsync turned on in both cases?  And is
your IDE drive lying to you about what it's doing?

My experience in moving from a Linux box to a low-end Sun is pretty
similar.  The problem usually turns out to be a combination of
overhead on fsync (which shows up as processor load instead of i/o,
oddly); and memory contention, especially when the number of shared
buffers is too large (on a 16 Gig box, we find that 2 Gig of
shared buffers is too large -- the shared memory management is
crummy).

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Really bad insert performance: what did I do wrong?

From: Kevin White
Date:
Andrew Sullivan wrote:
> What's the disk subsystem?  Is fsync turned on in both cases?  And is
> your IDE drive lying to you about what it's doing?

It is IDE.  How do I turn fsync on or off?  (All I can find in the man
is a function call to fsync...is there something else?)

> My experience in moving from a Linux box to a low-end Sun is pretty
> similar.  The problem usually turns out to be a combination of
> overhead on fsync (which shows up as processor load instead of i/o,
> oddly); and memory contention, especially when the number of shared
> buffers is too large

This box only has 1 gig, and I've only set up 200 shared buffers...at
this point, it is only me hitting it.  Increasing the shared buffers
might help, but I haven't yet found the info I need to do that
intelligently.

Shridhar Daithankar wrote:
 > First, check vmstat or similar on SunOS. See what is maxing out. Second,
 > turn postgresql trace on and see where it is spending most of the CPU.

Do you mean turning on the statistics generators, or gprof?

 > Needless to say, did you tune shared buffers?

Like I mentioned above, I haven't yet found good info on what to do to
actually tune shared buffers...I know how to change them, but that's
about it.  I'll poke around some more.


Tom Lane wrote:
 > You should be able to find details in the archives, but the key point
 > is to do
 >     cd .../src/backend
 >     gmake clean
 >     gmake PROFILE="-pg" all
 > to build a profile-enabled backend.  You may need a more complex
 > incantation than -pg on Solaris, but it works on other platforms.

I did this, but my gmon.out doesn't appear to have much data from the
actual child postgres process, just the parent.  I might be wrong, and
I'm letting some stats generate.

However, to everyone, I DID find a problem in my code that was making it
take forever...the code doesn't just insert.  It is also
written to do updates if it needs to, and because of what I'm doing, I
end up doing selects back against the table during the load to find
previously loaded rows.  In this case, there aren't any, because the
table's been truncated, but...having turned the indexes off, those selects
were really really slow.  Using the stats tools told me that.

So, that may have been a large part of my problem.  I'm still seeing the
process just SIT there, though, for seconds at a time, so there's
probably something else that I can fix.  Maybe I should just randomly
try a larger buffers setting...

Being able to analyze the output of gmon would be nice, but as I said
before, it doesn't appear to actually change much...

Right now, the load has been running continuously for several minutes.
It is 12:20pm, but the time on the gmon.out file is 12:18pm.  I should
be able to let the load finish while I'm at lunch, and I might be able
to get something out of gmon when it is done...maybe writing to gmon
just doesn't happen constantly.

Thanks all...

Kevin


Re: Really bad insert performance: what did I do wrong?

From: Andrew Sullivan
Date:
On Fri, Feb 21, 2003 at 12:21:38PM -0500, Kevin White wrote:
> Andrew Sullivan wrote:
> > What's the disk subsystem?  Is fsync turned on in both cases?  And is
> > your IDE drive lying to you about what it's doing?
>
> It is IDE.  How do I turn fsync on or off?  (All I can find in the man
> is a function call to fsync...is there something else?)

By default, Postgres calls fsync() at every COMMIT.  Lots of IDE
drives lie about whether the fsync() succeeded, so you get better
performance than you do with SCSI drives; but you're not really
getting that performance, because the fsync isn't effective.

On Linux, I think you can use hdparm to fix this.  I believe the
write caching is turned off under Solaris, but I'm not sure.

Anyway, you can adjust your postgresql.conf file to turn off fsync.
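
For what it's worth, that's just one line in postgresql.conf, something
like this (illustrative only -- and remember that with fsync off, a crash
in the middle of the load can corrupt or lose data, so it's really only
suitable for a bulk load you're prepared to redo):

    fsync = false

and then restart the postmaster so the change takes effect.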

> This box only has 1 gig, and I've only set up 200 shared buffers...at
> this point, it is only me hitting it.  Increasing the shared buffers
> might help, but I haven't yet found the info I need to do that
> intelligently.

For straight inserts, it shouldn't matter, and that seems low enough
that it shouldn't be a problem.

You should put WAL on another disk, as well, if you can.

Also, try using truss to see what the backend is doing.  (truss -c
gives you a count of system calls and some timing info.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Really bad insert performance: what did I do wrong?

From: "scott.marlowe"
Date:
OK, I'm gonna make a couple of observations here that may help out.

1:  sun's performance on IDE hardware is abysmal.  Both Solaris X86 and
Solaris Sparc are utter dogs at IDE, even when you do get DMA and prefetch
set up and running.  Linux or BSD are much much better at handling IDE
interfaces.

2:  Postgresql under Solaris on Sparc is about 1/2 as fast as Postgresql
under Linux on Sparc, all other things being equal.  On 32-bit Sparc the
chasm widens even more.

3:  Inserting ALL 700,000 rows in one transaction is probably not optimal.
Try putting a test in every 1,000 or 10,000 rows to toss a "commit;begin;"
pair at the database while loading.  Inserting all 700,000 rows at once
means postgresql can't recycle the transaction logs, so you'll have
700,000 rows worth of data in the transaction logs waiting for you to
commit at the end.  That's a fair bit of space, and a large set of files
to keep track of.  My experience has been that anything over 1,000 inserts
in a transaction gains little.  (There's a sketch of what this looks like
at the end of this message.)

4:  If you want to make sure you don't insert any duplicates, it's
probably faster to use a unique multi-column key on all your columns
(there's a limit on columns in an index depending on which flavor of
postgresql you are running, but I think it's 16 on 7.2 and before and 32
on 7.3 and up; I could be off by a factor of two there).
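
To illustrate point 3: the stream your loader sends would look roughly
like this (table and values are made up -- the point is just the periodic
commit/begin instead of one giant transaction):

    CREATE TABLE price_load (id integer, symbol text, price numeric);

    BEGIN;
    INSERT INTO price_load VALUES (1, 'ACME', 10.25);
    INSERT INTO price_load VALUES (2, 'ACME', 10.30);
    -- ...roughly 1,000 inserts...
    COMMIT;

    BEGIN;
    INSERT INTO price_load VALUES (1001, 'BETA', 4.75);
    -- ...next batch of ~1,000...
    COMMIT;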



Re: Really bad insert performance: what did I do wrong?

From: Kevin White
Date:
> 1:  sun's performance on IDE hardware is abysmal.

OK, good to know.  This box won't always be the production server: an
x86 Dell server running Linux is in the works...thanks for the info.

> 2:  Postgresql under Solaris on Sparc is about 1/2 as fast as Postgresql
> under Linux on Sparc, all other things being equal.  On 32-bit Sparc the
> chasm widens even more.

Wow...

> 3:  Inserting ALL 700,000 rows in one transaction is probably not optimal.
> Try putting a test in every 1,000 or 10,000 rows to toss a "commit;begin;"
> pair at the database while loading.

My original test was 700,000 at once...for today's, I realized that was,
um, dumb :) so I fixed it...it commits every 1,000 now...

> 4:  If you want to make sure you don't insert any duplicates, it's
> probably faster to use a unique multi-column key on all your columns

The problem isn't inserting the dupes, but at times I need to update the
data, rather than load a new batch of it...and the rows have a "rank"
(by price)...so when one group of, say, 5 gets an updated row, it could
change the rank of the other 4, so all 5 need to be updated...so I have to do
the select first to find the other values so I can calculate the rank.

In THIS specific case, with the table empty, I don't need to do that,
but the code had been changed to do it, because normally, my table won't
be empty.  This is just the initial setup of a new server...

Thanks for all the help...

It looks like the load finished...I might try turning the sync off.

Kevin


Re: Really bad insert performance: what did I do wrong?

From: Tom Lane
Date:
Kevin White <kwhite@digital-ics.com> writes:
> I did this, but my gmon.out doesn't appear to have much data from the
> actual child postgres process, just the parent.

Are you looking in the right place?  Child processes will dump gmon.out
into $PGDATA/base/yourdbnum/, which is not where the parent process
puts it.
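
Once you have that file, something along the lines of

    gprof /usr/local/pgsql/bin/postgres $PGDATA/base/yourdbnum/gmon.out > profile.txt

should give a readable profile (adjust the path to wherever your
postgres executable actually lives).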

            regards, tom lane

Re: Really bad insert performance: what did I do wrong?

From: Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> 3:  Inserting ALL 700,000 rows in one transaction is probably not optimal.
> Try putting a test in every 1,000 or 10,000 rows to toss a "commit;begin;"
> pair at the database while loading.  Inserting all 700,000 rows at once
> means postgresql can't recycle the transaction logs, so you'll have
> 700,000 rows worth of data in the transaction logs waiting for you to
> commit at the end.

That was true in 7.1.0, but we got rid of that behavior *very* quickly
(by 7.1.3, according to the release notes).  Long transactions do not
currently stress the WAL storage any more than the same amount of work
in short transactions.

Which is not to say that there's anything wrong with divvying the work
into 1000-row-or-so transactions.  I agree that that's enough to push
the per-transaction overhead down into the noise.

            regards, tom lane

Faster 'select count(*) from table' ?

From: Matt Mello
Date:
Does anyone know if there is a fast way to find out how many records are
in a table?

"Select count(*) from table" is very slow.

I would think that PG would keep up with the number of undeleted rows on
a realtime basis.  Is that the case?  If so, how would I query it?

Hope this is the correct list for this question.

Thanks!

--
Matt Mello



Re: Faster 'select count(*) from table' ?

From: "scott.marlowe"
Date:
On Tue, 25 Feb 2003, Matt Mello wrote:

> Does anyone know if there is a fast way to find out how many records are
> in a table?
>
> "Select count(*) from table" is very slow.
>
> I would think that PG would keep up with the number of undeleted rows on
> a realtime basis.  Is that the case?  If so, how would I query it?

Sorry, it doesn't, and it's one of the areas where having an MVCC-style
database costs you.  Also, if postgresql kept up with this automatically,
it would have an overhead for each table, but how often do you use it on
ALL your tables?  Most of the time, folks use count(*) on a few tables only,
and it would be a waste to have a separate counting mechanism for all
tables when you'd only need it for a few.

The general mailing list has several postings in the last 12 months about
how to set up a trigger to a single-row table that keeps the current
count(*) of the master table.
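
A bare-bones version of that approach looks something like this (table,
function, and trigger names are all made up):

    -- one-row table to hold the running count
    CREATE TABLE mytable_rowcount (n bigint NOT NULL);
    -- seed it once with the current count, at a quiet moment
    INSERT INTO mytable_rowcount SELECT count(*) FROM mytable;

    CREATE FUNCTION update_mytable_rowcount() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE mytable_rowcount SET n = n + 1;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE mytable_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER mytable_rowcount_trig AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE update_mytable_rowcount();

Then "SELECT n FROM mytable_rowcount" is essentially instant.  The
tradeoffs: you need plpgsql installed in the database (createlang
plpgsql yourdb), every insert or delete pays for an extra update, and
the counter row becomes a bottleneck for concurrent writers.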

If you need a rough count, you can get one from the statistics gathered by
analyze in the pg_* tables.
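
For example (the estimate is only as fresh as the last VACUUM or ANALYZE
on that table):

    SELECT relname, reltuples FROM pg_class WHERE relname = 'mytable';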