Thread: MVCC performance issue
This is my first post in this mailing list and I would like to raise an issue that in my opinion is causing performance problems for PostgreSQL, especially in a transaction-processing environment. In my company we have been using PostgreSQL for the last 8 years for our in-house developed billing system (telecom). The last few months we started considering moving to another RDBMS just because of this issue.

After all these years, I believe that the biggest improvement that could be made, and that would boost overall performance especially for enterprise applications, would be to improve the Multiversion Concurrency Control (MVCC) mechanism. In theory MVCC improves performance for SELECT queries, but on tables with very intensive and frequent updates even that is not fully true, because of the fragmentation of data caused by MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table (used as a buffer) took more than 40 minutes to return a result! VACUUM is not a solution in my opinion, even though after the introduction of the autovacuum daemon the situation got much better.

PROBLEM DESCRIPTION
------------------
By the definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. Any SELECT queries within the same session access the new version of the row, while all queries from other users still access the old version. When the transaction COMMITs, PostgreSQL marks the new version of the row as the "active" row and expires the old row, which remains "dead"; it is then up to the VACUUM procedure to recover the space of the "dead" rows and make it available to the database engine again. If the transaction is ROLLBACKed, the space reserved for the new version of the row is released. The result is huge fragmentation of the table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECTs that retrieve big record sets (i.e. reports etc.) and slower updates.
As an example, consider updating the "live" balance of a customer for each phone call, where the entire customer record has to be duplicated again and again upon each call just for modifying a numeric value!

SUGGESTION
--------------
1) When a raw UPDATE is performed, store all "new raw versions" either in a separate temporary table space or in a reserved space at the end of each table (can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again access the new version of the row.
3) Any SELECT queries from other users will still access the old version.
4) When the UPDATE transaction is ROLLBACKed, just release the space used in the new temporary location.
5) When the UPDATE transaction COMMITs, try to LOCK the old version and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) In case the transaction COMMITted, the temporary location can be either released or archived/cleaned on a pre-scheduled basis. This would possibly allow the introduction of a TRANSACTION LOG backup mechanism as a next step.
8) After that, VACUUM will have to deal only with deletions!!!

I understand that my suggestion may seem too simplified, and also that there are many implementation details and difficulties that I am not aware of. I strongly believe that the outcome of the discussion regarding this issue will be helpful.

Best Regards,

Kyriacos Kyriacou
Senior Developer/DBA
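[Editor's note] The dead-row accumulation described in the post can be sketched in a few lines. This is a toy model, not PostgreSQL internals — the names (`Tuple`, `heap`, `count_visible`) are invented for illustration — but it shows why a sequential scan over a logically empty or single-row table can still be slow: every physical version, dead or alive, must be visited.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tuple:
    value: object
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that expired it (None = live)

heap = []      # the table's physical storage: every version ever written
next_xid = 1

def update(idx, new_value):
    """An UPDATE appends a new version; the old one stays behind, dead."""
    global next_xid
    heap[idx].xmax = next_xid
    heap.append(Tuple(new_value, xmin=next_xid))
    next_xid += 1
    return len(heap) - 1

def count_visible(snapshot_xid):
    """A seq scan must visit *every* tuple, dead or alive, to count live ones."""
    return sum(1 for t in heap
               if t.xmin <= snapshot_xid
               and (t.xmax is None or t.xmax > snapshot_xid))

heap.append(Tuple(0, xmin=0))    # one logical row...
idx = 0
for i in range(10_000):          # ...updated 10,000 times
    idx = update(idx, i)

print(len(heap))                 # 10001 physical tuples to scan
print(count_visible(next_xid))   # 1 visible row
```

Until VACUUM (or something like it) removes the 10,000 dead versions, every scan pays for all of them — the effect behind the 40-minute COUNT(*) on the "buffer" table.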
On Fri, Nov 12, 2010 at 03:47:30PM +0200, Kyriacos Kyriacou wrote:
> This is my first post in this mailing list and I would like to raise an
> issue that in my opinion is causing performance issues of PostgreSQL
> especially in a transaction processing environment.
> [...]
> Best Regards,
>
> Kyriacos Kyriacou
> Senior Developer/DBA

I cannot speak to your suggestion, but it sounds like you are not vacuuming enough, and a lot of the bloat/randomization would be helped by making use of HOT updates, in which the updates are all in the same page and are reclaimed almost immediately.

Regards,
Ken
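[Editor's note] For readers unfamiliar with HOT, the idea Ken mentions can be caricatured in a few lines. This is a deliberately crude model with invented names (`Page`, `hot_update`), not the real algorithm: the point is only that new versions stay inside the same page and dead ones are pruned in-page, so they never accumulate table-wide.

```python
PAGE_CAPACITY = 4   # versions per page in this toy (a real page is 8 KB)

class Page:
    def __init__(self):
        self.versions = []   # list of (value, dead) pairs

    def hot_update(self, new_value):
        if len(self.versions) >= PAGE_CAPACITY:
            # prune: drop dead versions in place -- a cheap, page-local
            # operation that needs no table-wide VACUUM
            self.versions = [v for v in self.versions if not v[1]]
        if self.versions:
            old_value, _ = self.versions[-1]
            self.versions[-1] = (old_value, True)   # expire the old version
        self.versions.append((new_value, False))

page = Page()
for i in range(1000):
    page.hot_update(i)

print(len(page.versions))   # stays bounded by PAGE_CAPACITY, not by update count
```

After 1000 updates the page still holds at most `PAGE_CAPACITY` versions, only one of them live — contrast with the unbounded dead-tuple growth of plain MVCC updates.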
On 12 November 2010 13:47, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> This is my first post in this mailing list and I would like to raise an
> issue that in my opinion is causing performance issues of PostgreSQL
> especially in a transaction processing environment.
> [...]
> I strongly believe that the outcome of the discussion regarding this
> issue will be helpful.
Which version of PostgreSQL are you basing this on?
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Fri, Nov 12, 2010 at 5:52 AM, Kenneth Marshall <ktm@rice.edu> wrote:
>
> I cannot speak to your suggestion, but it sounds like you are not
> vacuuming enough and a lot of the bloat/randomization would be helped
> by making use of HOT updates in which the updates are all in the same
> page and are reclaimed almost immediately.
>
> Regards,
> Ken

IIRC, HOT only operates on non-indexed columns, so if the tables are heavily indexed you won't get the full benefit of HOT. I could be wrong though.
On Fri, Nov 12, 2010 at 07:34:36AM -0800, bricklen wrote:
> IIRC, HOT only operates on non-indexed columns, so if the tables
> are heavily indexed you won't get the full benefit of HOT. I could be
> wrong though.

That is true, but if they are truly having as big a bloat problem as the message indicated, it would be worth designing the schema to leverage HOT for the very frequent updates.

Cheers,
Ken
12.11.10 15:47, Kyriacos Kyriacou wrote:
> PROBLEM DESCRIPTION
> ------------------
> As an example, consider updating the "live" balance
> of a customer for each phone call where the entire customer record has
> to be duplicated again and again upon each call just for modifying a
> numeric value!

Have you considered splitting the customer record into two tables, one with mostly read-only data and one with the data that is updated often? Such a 1-1 relationship can make a huge difference to performance in your case. You can even simulate the old schema by using an updatable view.

Best regards, Vitalii Tymchyshyn
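[Editor's note] Some rough arithmetic for this suggestion. All sizes below are assumptions for illustration, not measurements from the poster's system: duplicating only a narrow balance row per call writes far fewer bytes of new row versions than duplicating the whole customer record.

```python
full_customer_row = 1200        # bytes: name, address, plan details... (assumed)
balance_only_row  = 40          # bytes: customer_id + numeric balance (assumed)
calls_per_day     = 2_000_000   # assumed call volume

# bytes of new MVCC row versions generated per day, before vs after the split
wide   = full_customer_row * calls_per_day
narrow = balance_only_row * calls_per_day
print(wide // 2**20, "MB/day of new row versions, updating the full record")
print(narrow // 2**20, "MB/day with a separate balance table")
```

Under these assumed numbers the split cuts the daily version churn (and the corresponding VACUUM and index work) by roughly 30x, which is why a 1-1 split of hot columns can matter so much.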
We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200 GB, so an upgrade is not an easy decision!
I have it in my plans, so in the next few months I will set up new servers and upgrade to version 9.
>> Which version of PostgreSQL are you basing this on?
>>
>>--
>>Thom Brown
>>Twitter: @darkixion
>>IRC (freenode): dark_ixion
>>Registered Linux user: #516935
This was done already, as a workaround, after identifying this problem. I just gave it as an example.

-----Original Message-----
From: Vitalii Tymchyshyn [mailto:tivv00@gmail.com]
Sent: Friday, November 12, 2010 5:54 PM
To: Kyriacos Kyriacou
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

Have you considered splitting customer record into two tables with mostly read-only data and with data that is updated often? Such 1-1 relationship can make a huge difference to performance in your case. You can even try to simulate old schema by using an updateable view.

Best regards, Vitalii Tymchyshyn
On 11/12/2010 7:47 AM, Kyriacos Kyriacou wrote:
>
> SUGGESTION
> --------------
> 1) When a raw UPDATE is performed, store all "new raw versions" either
> in separate temporary table space
> or in a reserved space at the end of each table (can be allocated
> dynamically) etc

Your use of "raw" is confusing; I'll just ignore the word. New row versions are already stored in a dynamically allocated spot, right along with the other versions of the table. You are assuming that getting to the "correct" version of the row is very slow? That's only going to be the case if you have lots and lots of versions, and your solution will not actually help if there are lots of versions. While the one person who is hitting the most recent version might be OK, everyone else will still have to search for theirs, just as they do now.

> 2) Any SELECT queries within the same session will be again accessing
> the new version of the row

I don't see how this is different from what we currently have. The "same session" version could have been dropped from your separate table space, and then you'd have to go search through previous versions of the row... exactly like you do now.

And worse: if you don't want to drop your version of the row from the separate table space until you commit/rollback, then no other user can start a transaction on that table until you're done! Oh no! You have reads and writes blocking each other.

> 3) Any SELECT queries from other users will still be accessing the old
> version

Again... the same.

> 4) When UPDATE transaction is ROLLBACK just release the space used in
> new temporary location

The current layout makes rollback very, very fast.

> 5) When UPDATE transaction is COMMIT then try to LOCK the old version
> and overwrite it at the same physical location (NO FRAGMENTATION).

Not sure what you mean by lock, but a lock requires single-user access and slows things down. Right now we just bump the "most active transaction number", which is very efficient and requires no locks. As soon as you lock anything, somebody, by definition, has to wait.

> 6) Similar mechanism can be applied on INSERTS and DELETES
> 7) In case that transaction was COMMIT, the temporary location can be
> either released or archived/cleaned on a pre-scheduled basis. This will
> possibly allow the introduction of a TRANSACTION LOG backup mechanism as
> a next step.

You are kind of assuming there will only ever be one new transaction and one old transaction. What about a case where 10 people start a transaction, and there are 10 versions of the row?

It seems to me like you are using very long transactions, which is causing lots of row versions to show up. Have you run EXPLAIN ANALYZE on your slow queries to find out the problems?

Have you checked to see if you are CPU-bound or I/O-bound? If you are dealing with lots of row versions, I'd assume you are CPU-bound. If you check your system, though, and see you are I/O-bound, I think that might invalidate your assumptions above.

MVCC makes multi-user access very nice because readers and writers don't block each other, and there are very few locks. It does come with some kinks (gotta vacuum, keep transactions short, you must commit, etc.).

SELECT COUNT(*), for example, is always going to be slow... just expect it; let's not destroy what works well about the database just to make it fast. Instead, find a better alternative so you don't have to run it. Just like any database, you have to work within MVCC's good points and try to avoid the bad spots.

-Andy
Ah, this is a very old version. If you can take advantage of a version with HOT support, you should be much, much happier.

Cheers,
Ken

On Fri, Nov 12, 2010 at 06:14:00PM +0200, Kyriacos Kyriacou wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
> database size is over 200Gb so upgrade is not an easy decision!
>
> I have it in my plans so in next few months I will setup new servers and
> upgrade to version 9.
On 12 November 2010 16:14, Kyriacos Kyriacou <kyriacosk@prime-tel.com> wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
> database size is over 200Gb so upgrade is not an easy decision!
>
> I have it in my plans so in next few months I will setup new servers and
> upgrade to version 9.
Everything changed, performance-wise, in 8.3, and there have also been improvements since then too. So rather than completely changing your database platform, at least take a look at what work has gone into Postgres since the version you're using. http://www.postgresql.org/docs/8.3/static/release-8-3.html#AEN87319
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
> database size is over 200Gb so upgrade is not an easy decision!
This is why we have slony, so you can slowly upgrade your 200Gb while you're live and then only suffer a minute or so of downtime while you switch over. Even if you only install slony for the purpose of the upgrade and then uninstall it after you're done, that seems well worth it to me rather than running on 8.2.4 for a while.
Note there were some changes between 8.2 and 8.3 in regards to casting that might make you revisit your application.
"Kyriacos Kyriacou" <kyriacosk@prime-tel.com> writes:
> We are still using PostgreSQL 8.2.4.

In that case you don't have HOT updates, so it seems to me to be a little premature to be proposing a 100% rewrite of the system to fix your problems.

regards, tom lane
To be honest, I just now read about HOT (Heap-Only Tuples) and it seems that it will help a lot. Thanks for the pointer.

Kyriacos

-----Original Message-----
From: Kenneth Marshall [mailto:ktm@rice.edu]
Sent: Friday, November 12, 2010 6:22 PM
To: Kyriacos Kyriacou
Cc: Thom Brown; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

Ah, this is a very old version. If you can take advantage of a version with HOT support, you should be much, much happier.

Cheers,
Ken
My suggestion had just a single difference from what MVCC currently does (btw, I never said that MVCC is bad).

NOW ===> on COMMIT, the previous version of the record is expired and the new version (created in a new dynamically allocated spot, as you said) is set as "active".

MY  ===> on COMMIT, write the new version's data over the same physical location the initial version occupied, and release the space used to keep the new version (that was dynamically allocated).

The rest are all the same! I do not think that this is breaking anything, and I still believe that this might help.

I will try to plan an upgrade to the newest version as soon as possible. Reading a few words about HOT updates, it seems that it is more or less similar to what I have described and will be very helpful.

Kyriacos

> -----Original Message-----
> From: Andy Colson [mailto:andy@squeakycode.net]
> Sent: Friday, November 12, 2010 6:22 PM
> To: Kyriacos Kyriacou
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] MVCC performance issue
> [...]
On Fri, Nov 12, 2010 at 9:22 AM, Thom Brown <thom@linux.com> wrote:
> Everything changed, performance-wise, in 8.3, and there have also been
> improvements since then too. So rather than completely changing your
> database platform, at least take a look at what work has gone into Postgres
> since the version you're using.

Agreed. 8.3 was a colossal step forward for pg performance, and 8.4 was a huge step ahead in maintenance with the on-disk free space map. If I were upgrading from 8.2 today I would go straight to 8.4 and skip 8.3, since 8.3 is a much bigger pain in the butt to configure for fsm stuff.
OK, in general you have to pay for MVCC one way or another. Many databases make you pay as you go, so to speak, by storing all the MVCC info in a log to be applied at some future date. With other databases you pay later, by storing all the MVCC info in the table itself. Both have similar costs, but the latter can punish you harshly if you let the MVCC data stored in the database get out of hand.

8.3 and above are much more aggressive about autovacuuming, and on bigger hardware you can make it VERY aggressive and keep the bloat out while keeping up good throughput. On some servers I set up 4 or 6 or 8 autovacuum threads to keep up. If you were on another db you might be adding more drives to make some other part faster.

For batch processing, storing all MVCC data in the data store can be problematic, but for more normal work where you're changing <1% of a table at a time it can be very fast. Some other databases will just run out of space to store transactions and roll back everything you've done. PostgreSQL will gladly let you shoot yourself in the foot by bloating the data store with successive whole-table updates without vacuuming in between.

Bottom line: if your hardware can't keep up, it can't keep up. If vacuum capsizes your I/O and still can't keep up, then you need more disks and/or better storage subsystems. A 32-disk array with a single controller goes for ~$7k to $10k, and you can sustain some pretty amazing throughput on that kind of I/O subsystem.

If you're doing batch processing you can get a lot in return by just making sure you vacuum after each mass update. Especially if you are on a single-use machine with no cost delays for vacuum, running a vacuum on a freshly worked table should be pretty fast.
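[Editor's note] The whole-table-update foot-gun mentioned above, in toy numbers: each full-table UPDATE leaves behind one table's worth of dead row versions, so a few passes without an intervening VACUUM multiply the heap size. The row count below is an arbitrary illustrative figure.

```python
live_rows = 1_000_000          # logical rows in the table (assumed)
passes_without_vacuum = 5      # successive whole-table UPDATEs, no VACUUM between

# each whole-table UPDATE writes a complete new set of row versions,
# while every previous set lingers as dead tuples until a VACUUM
physical_tuples = live_rows * (1 + passes_without_vacuum)
bloat_ratio = physical_tuples / live_rows
print(physical_tuples, "physical tuples,", bloat_ratio, "x bloat")
```

Five unvacuumed passes leave the heap six times its logical size — a single `VACUUM` between passes keeps it near 2x instead.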
On Fri, Nov 12, 2010 at 9:19 AM, Ben Chobot <bench@silentmedia.com> wrote:
> This is why we have slony, so you can slowly upgrade your 200Gb while you're
> live and then only suffer a minute or so of downtime while you switchover.
> [...]

I work in a slony shop and we used slony to upgrade from 8.2 to 8.3, and it was a breeze. Of course we practiced on some test machines first, but it went really smoothly. Our total downtime, due to necessary testing before going live again, was less than 20 minutes.
HOT also usually requires setting FILLFACTOR to something other than the default for your table, so that there is guaranteed room in the page to modify data without allocating a new page. If you have fillfactor=75, then basically this proposal is already done -- each page has 25% temp space for updates in it. With the caveat that that is only true for updates to columns without indexes.

On Nov 12, 2010, at 7:37 AM, Kenneth Marshall wrote:
>> IIRC, HOT only operates on non-indexed columns, so if you the tables
>> are heavily indexed you won't get the full benefit of HOT. I could be
>> wrong though.
>
> That is true, but if they are truly having as big a bloat problem
> as the message indicated, it would be worth designing the schema
> to leverage HOT for the very frequent updates.
>
> Cheers,
> Ken
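[Editor's note] The fillfactor arithmetic here is straightforward; the row size below is an assumed figure for illustration.

```python
PAGE_SIZE  = 8192   # PostgreSQL's default page size in bytes
FILLFACTOR = 75     # the setting discussed above
row_size   = 100    # bytes per row version -- an assumed figure

# fillfactor=75 leaves 25% of each page unfilled at insert time,
# giving HOT updates in-page room for new row versions
free_space = PAGE_SIZE * (100 - FILLFACTOR) // 100
hot_room   = free_space // row_size
print(free_space, "bytes reserved per page")
print(hot_room, "extra row versions fit before the page fills")
```

With these numbers, each page keeps 2 KB free — room for about 20 extra 100-byte versions before updates spill to a new page.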
On Nov 12, 2010, at 9:13 AM, Kyriacos Kyriacou wrote:
> My suggestion had just a single difference from what currently MVCC is
> doing (btw I never said that MVCC is bad).
>
> NOW ===> on COMMIT previous version record is expired and the
> new version record (created in new dynamically allocated
> spot, as you said) is set as "active"
>
> MY ===> on COMMIT, to update new version data over the same physical
> location that initial version was
> and release the space used to keep the new version (that was
> dynamically allocated).

But what about other transactions that can still see the old version? You can't overwrite the old data if there are any other transactions open in the system at all. You have to have a mechanism to keep the old copy around for a while.
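[Editor's note] This objection can be made concrete with a toy version chain (invented structures, not PostgreSQL's): ten transactions each update the same row, and ten readers holding different snapshots each need to see a different version — so no version can safely be overwritten in place while any of those snapshots is open.

```python
versions = []   # append-only version chain for one row: (value, xmin, xmax)

for xid in range(1, 11):                  # 10 transactions each update the row
    if versions:
        value, xmin, _ = versions[-1]
        versions[-1] = (value, xmin, xid)  # expire the previous version
    versions.append((xid * 100, xid, None))

def read(snapshot_xid):
    """Return the row value a transaction with this snapshot must see."""
    for value, xmin, xmax in versions:
        if xmin <= snapshot_xid and (xmax is None or xmax > snapshot_xid):
            return value

# Ten concurrent readers with progressively newer snapshots:
seen = [read(s) for s in range(1, 11)]
print(seen)   # every snapshot finds its own distinct version
```

All ten values differ, which is why "overwrite the old version at COMMIT" only works once no open snapshot can still see it — exactly the bookkeeping that dead tuples plus VACUUM (or HOT pruning) perform.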