Обсуждение: COPY into table too slow with index
I’m running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3Ghz, 1 GB memory.
I am using COPY to fill a table that contains one postgis geometry column.
With no geometry index, it takes about 45 seconds to COPY one file.
If I add a geometry index, this time degrades. It keeps getting worse as more records are
added to the table. It was up to over three minutes per file on my most recent test.
The problem is that each file contains about 5 – 10 minutes of data. Eventually, I want to
add the data to the table in “real time”. So the COPY needs to take less time than
actually generating the data.
Here is the relevant section of my postgresql.conf.
# - Memory -
shared_buffers = 5000 # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000 # min 100, 8KB each
#max_prepared_transactions = 5 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 20000 # min 64, size in KB
maintenance_work_mem = 20000 # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
Any suggestions for improvement?
As a follow up to my own question:
I reran the COPY both ways (with the index and without) while running iostat. The following values
are averages:
%user %nice %sys %iowait %idle
no index 39 0 2.8 11 47
index 16 1.5 2.1 34 46
I’m no performance guru, so please indulge a couple of silly questions:
1) Why is there so much idle time? I would think the CPU would either be busy or waiting for IO.
2) It seems that I need to improve my disk situation. Would it help to add another drive to my PC and
keep the input data on a separate drive from my pg tables? If so, some pointers on the best way to set that up
would be appreciated.
Please let me know if anyone has additional ideas.
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rick Schumeyer
Sent: Thursday, December 01, 2005 12:58 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] COPY into table too slow with index
I’m running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3Ghz, 1 GB memory.
I am using COPY to fill a table that contains one postgis geometry column.
With no geometry index, it takes about 45 seconds to COPY one file.
If I add a geometry index, this time degrades. It keeps getting worse as more records are
added to the table. It was up to over three minutes per file on my most recent test.
The problem is that each file contains about 5 – 10 minutes of data. Eventually, I want to
add the data to the table in “real time”. So the COPY needs to take less time than
actually generating the data.
Here is the relevant section of my postgresql.conf.
# - Memory -
shared_buffers = 5000 # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000 # min 100, 8KB each
#max_prepared_transactions = 5 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 20000 # min 64, size in KB
maintenance_work_mem = 20000 # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
Any suggestions for improvement?
Rick, On 12/1/05 2:18 PM, "Rick Schumeyer" <rschumeyer@ieee.org> wrote: > As a follow up to my own question: > > I reran the COPY both ways (with the index and without) while running iostat. > The following values > are averages: > %user %nice %sys %iowait %idle > no index 39 0 2.8 11 47 > index 16 1.5 2.1 34 46 > > I¹m no performance guru, so please indulge a couple of silly questions: > > 1) Why is there so much idle time? I would think the CPU would either be > busy or waiting for IO. The 100% represents 2 CPUs. When one CPU is fully busy you should see 50% idle time. > 2) It seems that I need to improve my disk situation. Would it help to > add another drive to my PC and > keep the input data on a separate drive from my pg tables? If so, some > pointers on the best way to set that up > would be appreciated. Putting the index and the table on separate disks will fix this IMO. I think you can do that using the "TABLESPACE" concept for each. The problem I see is nicely shown by the increase in IOWAIT between the two patterns (with and without index). It seems likely that the pattern is: A - insert a tuple into the table B - insert an entry into the index C - fsync the WAL - repeat This can be as bad as having a disk seek to access the table data every time the 8KB page boundary is crossed, then again for the index, then again for the WAL, and random disk seeks happen only as fast as about 10ms, so you can only do those at a rate of 100/s. > Please let me know if anyone has additional ideas. This is a fairly common problem, some people drop the index, load the data, then recreate the index to get around it. - Luke
"Luke Lonergan" <llonergan@greenplum.com> writes:
> The problem I see is nicely shown by the increase in IOWAIT between the two
> patterns (with and without index).  It seems likely that the pattern is:
> A - insert a tuple into the table
> B - insert an entry into the index
> C - fsync the WAL
> - repeat
> This can be as bad as having a disk seek to access the table data every time
> the 8KB page boundary is crossed, then again for the index, then again for
> the WAL, and random disk seeks happen only as fast as about 10ms, so you can
> only do those at a rate of 100/s.
That analysis is far too simplistic, because only the WAL write has to
happen before the transaction can commit.  The table and index writes
will normally happen at some later point in the bgwriter, and with any
luck there will only need to be one write per page, not per tuple.
It is true that having WAL and data on the same spindle is bad news,
because the disk head has to divide its time between synchronous WAL
writes and asynchronous writes of the rest of the files.
            regards, tom lane
			
		I only have one CPU. Is my copy of iostat confused, or does this have something to do with hyperthreading or dual core? (AFAIK, I don't have a dual core!) The problem (for me) with dropping the index during a copy is that it takes tens of minutes (or more) to recreate the geometry index once the table has, say, 50 million rows. > -----Original Message----- > From: Luke Lonergan [mailto:llonergan@greenplum.com] > Sent: Thursday, December 01, 2005 9:27 PM > To: Rick Schumeyer; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] COPY into table too slow with index: now an I/O > question > > Rick, > > On 12/1/05 2:18 PM, "Rick Schumeyer" <rschumeyer@ieee.org> wrote: > > > As a follow up to my own question: > > > > I reran the COPY both ways (with the index and without) while running > iostat. > > The following values > > are averages: > > %user %nice %sys %iowait %idle > > no index 39 0 2.8 11 47 > > index 16 1.5 2.1 34 46 > > > > I¹m no performance guru, so please indulge a couple of silly questions: > > > > 1) Why is there so much idle time? I would think the CPU would > either be > > busy or waiting for IO. > > The 100% represents 2 CPUs. When one CPU is fully busy you should see 50% > idle time. > > > 2) It seems that I need to improve my disk situation. Would it > help to > > add another drive to my PC and > > keep the input data on a separate drive from my pg tables? If so, some > > pointers on the best way to set that up > > would be appreciated. > > Putting the index and the table on separate disks will fix this IMO. I > think you can do that using the "TABLESPACE" concept for each. > > The problem I see is nicely shown by the increase in IOWAIT between the > two > patterns (with and without index). It seems likely that the pattern is: > A - insert a tuple into the table > B - insert an entry into the index > C - fsync the WAL > - repeat > > This can be as bad as having a disk seek to access the table data every > time > the 8KB page boundary is crossed, then again for the index, then again for > the WAL, and random disk seeks happen only as fast as about 10ms, so you > can > only do those at a rate of 100/s. > > > Please let me know if anyone has additional ideas. > > This is a fairly common problem, some people drop the index, load the > data, > then recreate the index to get around it. > > - Luke