Discussion: 10+hrs vs 15min because of just one index

10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
So I'm trying to figure out how to optimize my PG install (8.0.3) to
get better performance without dropping one of my indexes.

Basically, I have a table of 5M records with 3 columns:

pri_key (SERIAL)
data char(48)
groupid integer

there is an additional unique index on the data column.
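
For concreteness, a rough sketch of that schema (the table and index
names here are just placeholders):

create table records (
    pri_key  serial primary key,
    data     char(48),
    groupid  integer
);
create unique index records_data_key on records (data);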

The problem is that when I update the groupid column for all the
records, the query takes over 10hrs (after that I just canceled the
update).  Looking at iostat, top, vmstat shows I'm horribly disk IO
bound (for data not WAL, CPU 85-90% iowait) and not swapping.

Dropping the unique index on data (which isn't used in the query),
running the update and recreating the index  runs in under 15 min.
Hence it's pretty clear to me that the index is the problem and
there's really nothing worth optimizing in my query.

As I understand from #postgresql, doing an UPDATE on one column causes
all indexes for the affected row to have to be updated, due to the way
PG replaces the old row with a new one for updates.  This seems to
explain why dropping the unique index on data solves the performance
problem.

interesting settings:
shared_buffers = 32768
maintenance_work_mem = 262144
fsync = true
wal_sync_method = open_sync
wal_buffers = 512
checkpoint_segments = 30
effective_cache_size = 10000
work_mem = <default> (1024 i think?)

box:
Linux 2.6.9-11EL (CentOS 4.1)
2x Xeon 3.4 HT
2GB of RAM (but Apache and other services are running)
4 disk raid 10 (74G Raptor) for data
4 disk raid 10 (7200rpm) for WAL

Other than throwing more spindles at the problem, any suggestions?

Thanks,
Aaron

--
Aaron Turner
http://synfin.net/

Re: 10+hrs vs 15min because of just one index

From
hubert depesz lubaczewski
Date:
On 2/10/06, Aaron Turner <synfinatic@gmail.com> wrote:
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.
> Basically, I have a table of 5M records with 3 columns:
> pri_key (SERIAL)
> data char(48)
> groupid integer
> there is an additional unique index on the data column.
> The problem is that when I update the groupid column for all the
> records, the query takes over 10hrs (after that I just canceled the
> update).  Looking at iostat, top, vmstat shows I'm horribly disk IO
> bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> Dropping the unique index on data (which isn't used in the query),

for such a large update I would suggest going with a different scenario:
split the update into packets (10000 or 50000 rows at a time)
and do:
update packet
vacuum table
for all packets, and then reindex the table.  Should work much nicer.
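
For concreteness, a rough sketch of one packet against the placeholder
schema above (the key range and the new groupid value are only
illustrative):

-- one packet; repeat over successive key ranges, vacuuming in between
update records
   set groupid = 42                        -- stand-in for the real new value
 where pri_key >= 1 and pri_key <= 50000;
vacuum records;

-- after the last packet:
reindex table records;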

depesz

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
On 2/10/06, hubert depesz lubaczewski <depesz@gmail.com> wrote:
> On 2/10/06, Aaron Turner <synfinatic@gmail.com> wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
> > Basically, I have a table of 5M records with 3 columns:
> > pri_key (SERIAL)
> > data char(48)
> > groupid integer
> > there is an additional unique index on the data column.
> > The problem is that when I update the groupid column for all the
> > records, the query takes over 10hrs (after that I just canceled the
> > update).  Looking at iostat, top, vmstat shows I'm horribly disk IO
> > bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> > Dropping the unique index on data (which isn't used in the query),
>
> for such a large update I would suggest going with a different scenario:
> split the update into packets (10000 or 50000 rows at a time)
> and do:
> update packet
> vacuum table
> for all packets, and then reindex the table.  Should work much nicer.

The problem is that all 5M records are being updated by a single
UPDATE statement, not 5M individual statements.  Also, vacuum can't
run inside a transaction.
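
A minimal illustration of that restriction, using the placeholder table
name from above:

begin;
vacuum records;   -- fails: VACUUM cannot run inside a transaction block
rollback;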

On a side note, is there any performance information on updating
indexes (via insert/update) relative to the size of the column?
Obviously, char(48) is larger than most for indexing purposes, but I
wonder if performance drops linearly or exponentially as the column
width increases.  Right now my column is hexadecimal... if I stored it
as a binary representation it would be smaller.

Thanks,
Aaron

Re: 10+hrs vs 15min because of just one index

From
"Matthew T. O'Connor"
Date:
Aaron Turner wrote:
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.

What about something like this:

begin;
drop slow_index_name;
update;
create index slow_index_name;
commit;
vacuum;
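
Spelled out against the placeholder schema from the first message (the
index name and the SET expression are stand-ins), that would be roughly:

begin;
drop index records_data_key;
update records set groupid = 42;           -- stand-in for the real update
create unique index records_data_key on records (data);
commit;
vacuum records;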

Matt

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> Aaron Turner wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
>
> What about something like this:
>
> begin;
> drop slow_index_name;
> update;
> create index slow_index_name;
> commit;
> vacuum;

Right.  That's exactly what I'm doing to get the update to occur in 15
minutes.  Unfortunately though, I'm basically at the point where every
time I insert/update into that table I have to drop the index, which is
making my life very painful (having to de-dupe records in RAM in my
application is a lot faster, but also more complicated/error prone).

Basically, I need some way to optimize PG so that I don't have to drop
that index every time.

Suggestions?

--
Aaron Turner
http://synfin.net/

Re: 10+hrs vs 15min because of just one index

From
"Jim C. Nasby"
Date:
On Fri, Feb 10, 2006 at 09:24:39AM -0800, Aaron Turner wrote:
> On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> > Aaron Turner wrote:
> > > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > > get better performance without dropping one of my indexes.
> >
> > What about something like this:
> >
> > begin;
> > drop slow_index_name;
> > update;
> > create index slow_index_name;
> > commit;
> > vacuum;
>
> Right.  That's exactly what I'm doing to get the update to occur in 15
> minutes.  Unfortunately though, I'm basically at the point where every
> time I insert/update into that table I have to drop the index, which is
> making my life very painful (having to de-dupe records in RAM in my
> application is a lot faster, but also more complicated/error prone).
>
> Basically, I need some way to optimize PG so that I don't have to drop
> that index every time.
>
> Suggestions?

I think you'll have a tough time making this faster; or I'm just not
understanding the problem well enough. It's probably time to start
thinking about re-architecting some things in the application so that
you don't have to do this.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
On 2/11/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Fri, Feb 10, 2006 at 09:24:39AM -0800, Aaron Turner wrote:
> > On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> > > Aaron Turner wrote:
> >
> > Basically, I need some way to optimize PG so that I don't have to drop
> > that index every time.
> >
> > Suggestions?
>
> I think you'll have a tough time making this faster; or I'm just not
> understanding the problem well enough. It's probably time to start
> thinking about re-architecting some things in the application so that
> you don't have to do this.

Well before I go about re-architecting things, it would be good to
have a strong understanding of just what is going on.  Obviously, the
unique index on the char(48) is the killer.  What I don't know is:

1) Is this because the column is so long?
2) Is this because PG is not optimized for char(48) (maybe it wants
powers of 2? or doesn't like even numbers... I don't know, just
throwing it out there)
3) Is there some algorithm I can use to estimate relative UPDATE
speed?  Ie, if I cut the column length in 1/2 does that make it 50%
faster?
4) Does decoding the data (currently base64) and storing the binary
data improve the distribution of the index, thereby making it more
efficient?

Obviously, one solution would be to store the column to be UPDATED in
a separate joined table.  That would cost more disk space and be more
complex, but it would be more efficient for updates (inserts would of
course be more expensive, since now I have to do two).
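
A rough sketch of that split, reusing the placeholder names from above:

create table record_groups (
    pri_key  integer primary key references records (pri_key),
    groupid  integer
);

-- the bulk update then touches only this narrow table and its one index
update record_groups set groupid = 42;     -- stand-in for the real update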

--
Aaron Turner
http://synfin.net/

Re: 10+hrs vs 15min because of just one index

From
Tom Lane
Date:
Aaron Turner <synfinatic@gmail.com> writes:
> Well before I go about re-architecting things, it would be good to
> have a strong understanding of just what is going on.  Obviously, the
> unique index on the char(48) is the killer.  What I don't know is:

You have another unique index on the integer primary key, so it's not
the mere fact of a unique index that's hurting you.

> 1) Is this because the column is so long?

Possibly.  Allowing for 12 bytes index-entry overhead, the char keys
would be 60 bytes vs 16 for the integer column, so this index is
physically almost 4x larger than the other.  You might say "but that
should only cause 4x more I/O" but it's not necessarily so.  What's
hard to tell is whether you are running out of RAM disk cache space,
resulting in re-reads of pages that could have stayed in memory when
dealing with one-fifth as much index data.  You did not show us the
iostat numbers for the two cases, but it'd be interesting to look at
the proportion of writes to reads on the data drive in both cases.

> 2) Is this because PG is not optimized for char(48) (maybe it wants
> powers of 2? or doesn't like even numbers... I don't know, just
> throwing it out there)

Are the key values really all 48 chars long?  If not, you made a
bad datatype choice: varchar(n) (or even text) would be a lot
smarter.  char(n) wastes space on blank-padding.

Another thing to think about is whether this is C locale or not.
String comparisons in non-C locales can be horrendously expensive
... though I'd expect that to cost CPU not I/O.  (Hmm ... is it
possible your libc is hitting locale config files constantly?
Might be worth strace'ing to confirm exactly where the I/O is
going.)
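
A quick way to check which locale is in use (the comment is only a rule
of thumb):

show lc_collate;   -- 'C' means cheap byte-wise comparisons; other locales cost more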

> 4) Does decoding the data (currently base64) and storing the binary
> data improve the distribution of the index, thereby making it more
> efficient?

No, but it'd reduce the size of the index, which you certainly want.
Storing as bytea would also eliminate any questions about wasteful
locale-dependent comparisons.
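
One possible conversion along those lines, assuming the placeholder
names from above and that the text really is base64 (use 'hex' instead
if the column is hexadecimal, as an earlier message says); ALTER COLUMN
... TYPE ... USING exists as of 8.0:

alter table records
    alter column data type bytea
    using decode(trim(data), 'base64');    -- trim() drops any char(48) blank padding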

The only one of these effects that looks to me like it could result in
worse-than-linear degradation of I/O demand is maxing out the available
RAM for disk cache.  So while improving the datatype choice would
probably be worth your while, you should first see if fooling with
shared_buffers helps, and if not it's time to buy RAM not disk.

            regards, tom lane

Re: 10+hrs vs 15min because of just one index

From
"Marc Morin"
Date:
We've done a lot of testing on large DB's with a lot of "inserts" and
have a few comments.

From pg's point of view, the updates are "treated" as a large
"insert", as we all know.

We've run into 2 classes of problems: excessive WAL checkpoints and
the effects of low correlation.

The WAL writes the full 8K block for the first modification of a page
(after a checkpoint), then only the changes.  This can be the source of
"undesirable" behaviour during large batch inserts like this.

From your config, a checkpoint will be forced when

(checkpoint_segments * 16 M) < rows * (8K/N*h + (1-h)*8K) * B

Where h is the "hitrate" or correlation between the update scan and the
index.  Do you have a sense of what this is?  In the limits, we have 100%
correlation or 0% correlation.  N is the factor by which a change-only
WAL entry is smaller than a full block; not sure what this is exactly,
but the entry is small, I am assuming, say N=100.  B is the average
number of blocks changed per updated row (assume B=1.1 for your case;
heap and serial index have very high correlation).

In the 0% correlation case, each updated row will cause the index update
to read/modify the block. The modified block will be entirely written to
the WAL log.  After (30 * 16M) / (8K) / 1.1 ~ 55k rows, a checkpoint
will be forced and all modified blocks in shared buffers will be written
out.

Try increasing checkpoint_segments to 300 and see if that makes a
difference.  If so, excessive WAL checkpoints are your issue.  If
performance is exactly the same, then I would assume that you have close
to 0% correlation between the rows in the heap and index.

Can you increase shared_buffers? With a low correlation index, the only
solution is to hold the working set of blocks in memory.  Also, make
sure that the checkpoint segments are big enough for you to modify them
in place, don't want checkpoints occurring....

Note that the more updates you do, the larger the tables/index become
and the worse the problem becomes.  Vacuuming the table is an "answer"
but unfortunately, it tends to decrease correlation from our
observations. :-(

From our observations, dropping indexes and rebuilding them is not always
practical; it depends on your application, and the table will be
exclusively locked during the transaction due to the drop index.

I haven't looked at pg's code for creating an index, but I seriously
suspect it's doing an external sort and then inserting into the index.
Such operations would have 100% correlation from the index insert point
of view, and the "sort" could be in memory or the tape variety (more
efficient i/o pattern).

In summary: the number of indexes, index correlation, pg's
multi-versioning, shared_buffers and checkpoint_segments are
interconnected in weird and wonderful ways... We have seldom found
"simple" solutions to performance problems.

Marc


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
> Aaron Turner
> Sent: Friday, February 10, 2006 3:17 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] 10+hrs vs 15min because of just one index
>
> So I'm trying to figure out how to optimize my PG install
> (8.0.3) to get better performance without dropping one of my indexes.
>
> Basically, I have a table of 5M records with 3 columns:
>
> pri_key (SERIAL)
> data char(48)
> groupid integer
>
> there is an additional unique index on the data column.
>
> The problem is that when I update the groupid column for all
> the records, the query takes over 10hrs (after that I just
> canceled the update).  Looking at iostat, top, vmstat shows
> I'm horribly disk IO bound (for data not WAL, CPU 85-90%
> iowait) and not swapping.
>
> Dropping the unique index on data (which isn't used in the
> query), running the update and recreating the index  runs in
> under 15 min.
> Hence it's pretty clear to me that the index is the problem
> and there's really nothing worth optimizing in my query.
>
> As I understand from #postgresql, doing an UPDATE on one
> column causes all indexes for the affected row to have to be
> updated due to the way PG replaces the old row with a new one
> for updates.  This seems to explain why dropping the unique
> index on data solves the performance problem.
>
> interesting settings:
> shared_buffers = 32768
> maintenance_work_mem = 262144
> fsync = true
> wal_sync_method = open_sync
> wal_buffers = 512
> checkpoint_segments = 30
> effective_cache_size = 10000
> work_mem = <default> (1024 i think?)
>
> box:
> Linux 2.6.9-11EL (CentOS 4.1)
> 2x Xeon 3.4 HT
> 2GB of RAM (but Apache and other services are running)
> 4 disk raid 10 (74G Raptor) for data
> 4 disk raid 10 (7200rpm) for WAL
>
> Other than throwing more spindles at the problem, any suggestions?
>
> Thanks,
> Aaron
>
> --
> Aaron Turner
> http://synfin.net/
>

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
On 2/12/06, Marc Morin <marc@sandvine.com> wrote:
> From your config, a check point will be forced when
>
> (checkpoint_segments * 16 M) < rows * (8K/N*h + (1-h)*8K) * B
>
> Where h is the "hitrate" or correlation between the update scan and the
> index. Do you have a sense of what this is?

I know my checkpoints happen > 30 secs apart, since PG isn't
complaining in my log.  I have no clue what the correlation is.

> In the limits, we have 100%
> correlation or 0% correlation.  N is the lower cost of putting the
> change in the WAL entry, not sure what this is, but small, I am
> assuming, say N=100.  B is the average number of blocks changed per
> updated row (assume B=1.1 for your case, heap,serial index have very
> high correlation)
>
> In the 0% correlation case, each updated row will cause the index update
> to read/modify the block. The modified block will be entirely written to
> the WAL log.  After (30 * 16M) / (8K) / 1.1 ~ 55k rows, a checkpoint
> will be forced and all modified blocks in shared buffers will be written
> out.
>
> Try increasing checkpoint_segments to 300 and see if that makes a
> difference.  If so, excessive WAL checkpoints are your issue.  If
> performance is exactly the same, then I would assume that you have close
> to 0% correlation between the rows in the heap and index.

OK, I'll have to give that a try.

> Can you increase shared_buffers? With a low correlation index, the only
> solution is to hold the working set of blocks in memory.  Also, make
> sure that the checkpoint segments are big enough for you to modify them
> in place, don't want checkpoints occurring....

I'll have to look at my memory usage on this server... with only 2GB
and a bunch of other processes running around I'm not sure if I can go
up much more without causing swapping.  Of course RAM is cheap...

> Note that the more updates you do, the larger the tables/index become
> and the worse the problem becomes.  Vacuuming the table is an "answer"
> but unfortunately, it tends to decrease correlation from our
> observations. :-(

Good to know.

> From our observations, dropping indexes and rebuilding them is not always
> practical; it depends on your application, and the table will be
> exclusively locked during the transaction due to the drop index.

Yep.  In my case it's not a huge problem right now, but I know it will
become a serious one sooner or later.

Thanks a lot Marc.  Lots of useful info.

--
Aaron Turner
http://synfin.net/

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
On 2/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aaron Turner <synfinatic@gmail.com> writes:
> > Well before I go about re-architecting things, it would be good to
> > have a strong understanding of just what is going on.  Obviously, the
> > unique index on the char(48) is the killer.  What I don't know is:
>
> You have another unique index on the integer primary key, so it's not
> the mere fact of a unique index that's hurting you.

Understood.  I just wasn't sure whether, in general, unique indexes are
somehow more expensive than non-unique indexes.

> > 1) Is this because the column is so long?
>
> Possibly.  Allowing for 12 bytes index-entry overhead, the char keys
> would be 60 bytes vs 16 for the integer column, so this index is
> physically almost 4x larger than the other.  You might say "but that
> should only cause 4x more I/O" but it's not necessarily so.  What's
> hard to tell is whether you are running out of RAM disk cache space,
> resulting in re-reads of pages that could have stayed in memory when
> dealing with one-fifth as much index data.  You did not show us the
> iostat numbers for the two cases, but it'd be interesting to look at
> the proportion of writes to reads on the data drive in both cases.

Sounds a lot like what Marc mentioned.

> > 2) Is this because PG is not optimized for char(48) (maybe it wants
> > powers of 2? or doesn't like even numbers... I don't know, just
> > throwing it out there)
>
> Are the key values really all 48 chars long?  If not, you made a
> bad datatype choice: varchar(n) (or even text) would be a lot
> smarter.  char(n) wastes space on blank-padding.

Yep, everything exactly 48.   Looks like I'll be storing it as a bytea
in the near future though.

> The only one of these effects that looks to me like it could result in
> worse-than-linear degradation of I/O demand is maxing out the available
> RAM for disk cache.  So while improving the datatype choice would
> probably be worth your while, you should first see if fooling with
> shared_buffers helps, and if not it's time to buy RAM not disk.

Yeah, that's what it's beginning to sound like.  Thanks Tom.

--
Aaron Turner
http://synfin.net/

Re: 10+hrs vs 15min because of just one index

From
Markus Schaber
Date:
Hi, Aaron,

Aaron Turner wrote:

> 4) Does decoding the data (currently base64) and storing the binary
> data improve the distribution of the index, thereby making it more
> efficient?

Yes, but then you should not use varchar, but a bytea.

If your data is some number internally, numeric or decimal may be even
better.

If most of your data is different in the first 8 bytes, it may also make
sense to duplicate them into a bigint column and create an index on that.
Then you can use AND in your query to test both the 8-byte bigint (which
uses the index) and the bytea.  Ugly, but may work.
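
A rough sketch of that two-column trick, with placeholder names; here
the 8-byte prefix column is assumed to be filled in by the application:

alter table records add column data_prefix bigint;   -- leading 8 bytes of data
create index records_data_prefix_idx on records (data_prefix);

-- lookups then test both columns, roughly:
--   where data_prefix = <bigint prefix of the value>
--     and data        = <the full value>
-- so the planner can use the narrow bigint index first.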

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: 10+hrs vs 15min because of just one index

From
PFC
Date:
>> Are the key values really all 48 chars long?  If not, you made a
>> bad datatype choice: varchar(n) (or even text) would be a lot
>> smarter.  char(n) wastes space on blank-padding.
>
> Yep, everything exactly 48.   Looks like I'll be storing it as a bytea
> in the near future though.

    It's a good idea not to bloat a column by base64 encoding it if you want
to index it. BYTEA should be your friend.
    If your values are not random, you might want to exploit the correlation.
But if they are already quite uncorrelated, and you don't need the index
for < >, just for =, you can create an index on the md5 of your column and
use it to search. It will use a lot less data but the data will be more
random. With a functional index, you don't need to modify your application
too much.
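
A rough sketch of that functional-index idea, again with placeholder
names; md5() returns 32 hex characters, so the index entries stay small
and evenly distributed:

create index records_data_md5_idx on records (md5(data));

-- equality lookups hash the search value and keep the original
-- comparison as a recheck:
select *
  from records
 where md5(data) = md5('...the 48-character value...')
   and data      = '...the 48-character value...';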

Re: 10+hrs vs 15min because of just one index

From
Aaron Turner
Date:
Well just a little update:

1) Looks like I'm definitely RAM constrained.  Just placed an order
for another 4GB.
2) I ended up dropping the primary key too which helped with disk
thrashing a lot (average disk queue wait was between 500ms and 8500ms
before and 250-500ms after)
3) Playing with most of the settings in the postgresql.conf actually
dropped performance significantly.  Looks like I'm starving the disk
cache.
4) I'm going to assume going to a bytea helped some (width is 54 vs
66) but nothing really measurable

Thanks everyone for your help!

--
Aaron Turner
http://synfin.net/