Discussion: Introducing an advanced Frequent Update Optimization


Introducing an advanced Frequent Update Optimization

From:
"Simon Riggs"
Date: Mon, 06 Nov 2006 21:50:53 +0000
EnterpriseDB has been running a research project to improve the
performance of heavily updated tables. We have a number of approaches
prototyped and we'd like to discuss the best of these now on -hackers
for community input and patch submission to PostgreSQL core.

The most important step with any proposal is to agree that we have an
issue that needs improvement, discuss how widespread that issue is and
find some clear test cases that show up the problems. Tests are:

1. pgbench reveals performance that will degrade over a long period.

2. DBT-2 reveals performance that will degrade over a long period. Many
tests over a 2 hour period don't fully show this, especially when the
test is carefully tuned. (A minimal way to observe the degradation is
sketched after this list.)

3. Some common scenarios in applications are where some rows of a table
are "hot" from being constantly updated, while others are not. An
example of such a test case is the truckin' test, included here. It's
based directly on a specific customer application, but it's been
generalised to make sure the underlying design pattern is clear.
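The first two cases only show their degradation over long runs. A
minimal way to observe it (scale, client count and transaction count
below are arbitrary; table names are those created by pgbench of this
era) is to initialise once, then run the same test repeatedly and
compare the reported tps and the physical table sizes between runs:

pgbench -i -s 10 postgres
pgbench -n -c 8 -t 100000 postgres

-- branches has only 10 rows at this scale, but every transaction
-- updates one of them, so it bloats fastest; relpages is refreshed
-- by VACUUM/ANALYZE and grows from run to run even though the
-- logical row count never changes
VACUUM ANALYZE branches;
SELECT relname, relpages, reltuples FROM pg_class
WHERE relname IN ('branches', 'branches_pkey');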

These tests reveal the following issues, all of which are known:

- update performs inserts into indexes, as well as into heap blocks (a
sketch demonstrating this follows the list)

- VACUUM can reclaim dead rows in the heap easily, but performs much
worse on indexes, making VACUUM a less effective solution. We have now
been able to speed up index VACUUM, but this requires us to scan the
whole index for correct locking. VACUUM scans the whole table, whereas
dead rows may well be localised. A heap-needs-vacuum bitmap has been
proposed here, but no solution currently exists for vacuuming only
parts of indexes, so proposals for concurrent vacuums are now being
considered.

- indexes that have been stretched apart by updates never coalesce
again and require regular REINDEX, which cannot yet be run
concurrently; the contention REINDEX causes is catastrophic for
performance, and nobody yet knows a way to do it concurrently.

- There are specific issues with the optimizer's ability to estimate
dead row numbers, which can in some cases lead to SeqScan plans that
are inappropriate when tables grow because of updates. This is a red
herring that can lead people to think the situation is worse than it
is; it needs fixing, but the core issues mentioned above remain.
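To make the first and third issues concrete, here is a minimal sketch
(hypothetical table and counts) showing that every UPDATE creates both
a new heap tuple and a new index entry, even when no indexed column
changes:

CREATE TABLE hot_rows (id integer PRIMARY KEY, counter integer);
INSERT INTO hot_rows SELECT g, 0 FROM generate_series(1, 1000) g;

-- each run inserts 1000 new heap tuples and 1000 new index entries;
-- repeat it many times and both structures grow until vacuumed
UPDATE hot_rows SET counter = counter + 1;

-- relpages shows the growth; note that index pages, once split,
-- are never merged back together, so the index does not shrink
VACUUM ANALYZE hot_rows;
SELECT relname, relpages, reltuples FROM pg_class
WHERE relname IN ('hot_rows', 'hot_rows_pkey');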

To alleviate these problems we've added features such as WITH
(fillfactor) for heaps and table-level autovacuum tuning. Tuning all of
these features to good effect is an art that is beyond what can
reasonably be expected of most users. Many internal optimizations have
been made in this area and, as can be seen, many more are still
required to achieve better performance.
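As a sketch of what that tuning looks like (table name and settings
are hypothetical, not recommendations):

-- reserve 30% free space in each heap page, so new row versions
-- have a chance of being placed close to the old ones
CREATE TABLE account_balances (
    account_id integer PRIMARY KEY,
    balance    numeric NOT NULL
) WITH (fillfactor = 70);

Picking a good fillfactor and matching autovacuum settings requires
knowing the update rate in advance, which is exactly the art form
referred to above.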

The proposal about to be made takes a more radical approach and
re-examines the architecture of the heap, allowing us to consider much
faster designs for heavy UPDATEs. Although initially radical, the
proposal appears to be fully MVCC-correct and crash-safe, as well as
much faster under heavy updates, while approximately neutral in other
cases, with no major downsides.

Why should we care? The UPDATE case has obvious use-cases in a business
design pattern I'll call CustomerAccountDebit, which is pervasive in
pay-per-use websites, banks, telephone companies, road-traffic
monitoring and so on. It's also pervasive in Data Warehousing, where
summary tables/materialized views are regularly updated to maintain a
current picture of spending, movements or any other accumulation of
event detail. It's everywhere, basically.
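In code, the pattern is nothing more than this (hypothetical schema,
reusing the sketch table above):

-- every chargeable event updates the same account row, so a busy
-- account leaves behind a long chain of dead row versions
UPDATE account_balances
   SET balance = balance - 0.05
 WHERE account_id = 42;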

Your various viewpoints on the above are welcome, but assuming for the
moment that you agree so far, we can move towards the proposal...

These discussions will likely be lengthy if taken seriously and need to
cover a range of different topics, so that we lay out what we know and
listen to all the feedback everybody gives. To that end, I'd like to
introduce two colleagues of mine to the community, Pavan Deolasee and
Nikhil Sontakke, who have been working hard on developing the
prototypes and on measuring/tuning them, respectively.

I would stress that we are not bringing our first prototype to the
table, but actually design #5. We think you'll be interested, but we
won't take that for granted.

Our next steps will be to

- discuss various other approaches to the problem, explain why we are
now proposing one specific approach, and receive "why don't we..."
feedback and additional ideas (Simon)

- discuss the proposal in technical depth, explain the challenges that
remain and ask for feedback and input on those, with specific regard to
low-level coding (Pavan)

- present details of performance testing done so far (Nikhil)

- explain the measures we have taken to prove the correctness of our
approach for MVCC, crash safety and PITR (Simon)

Each of these areas will be started as a separate thread of discussion
on -hackers, to allow us to stay focused on those topics.

But before we do that, any comments on the above?

---

The truckin test case included here consists of a complex update
function that is executed by a custom pgbench script against the
postgres database.

psql -f truckin.sql postgres

then, for each test:

pgbench -n -f truckin.pgb postgres
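The attachments themselves are not reproduced here. Purely to
illustrate the shape such files might take (all names and numbers are
hypothetical, not the actual attachment contents; \setrandom is the
pgbench syntax of this era):

-- truckin.sql (sketch): a fleet table where a few "hot" trucks
-- receive far more updates than the rest
CREATE TABLE trucks (
    truck_id integer PRIMARY KEY,
    miles    numeric NOT NULL DEFAULT 0
);
INSERT INTO trucks SELECT g, 0 FROM generate_series(1, 10000) g;

CREATE FUNCTION truck_update(t integer) RETURNS void AS $$
    UPDATE trucks SET miles = miles + 1 WHERE truck_id = t;
$$ LANGUAGE sql;

-- truckin.pgb (sketch): skew all updates onto a small hot subset
\setrandom t 1 100
SELECT truck_update(:t);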

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Attachments

Re: Introducing an advanced Frequent Update Optimization

From:
Mark Kirkwood
Date: Tue, 2006-11-07 15:00 +1300
Simon Riggs wrote:
> EnterpriseDB has been running a research project to improve the
> performance of heavily updated tables. We have a number of approaches
> prototyped and we'd like to discuss the best of these now on -hackers
> for community input and patch submission to PostgreSQL core.
> 

Excellent! It would certainly be good for use cases like:

- session data from web (or similar) applications
- real time summary tables maintained by triggers

to "just work", as (certainly in the case of the first one) quite a few 
folks have been bitten by exactly the issue you describe.

best wishes

Mark


Re: Introducing an advanced Frequent Update Optimization

From:
ITAGAKI Takahiro
Date: Tue, 2006-11-07 13:02 +0900
"Simon Riggs" <simon@2ndquadrant.com> wrote:

> EnterpriseDB has been running a research project to improve the
> performance of heavily updated tables. We have a number of approaches
> prototyped and we'd like to discuss the best of these now on -hackers
> for community input and patch submission to PostgreSQL core.

I'm very interested in your proposal! NTT is also working on OLTP
workloads, especially on improvements to VACUUM. Maybe we have similar
problems.


> - VACUUM can reclaim dead rows in the heap easily, but performs much
> worse on indexes, making VACUUM a less effective solution. We have now
> been able to speed up index VACUUM, but this requires us to scan the
> whole index for correct locking. VACUUM scans the whole table, whereas
> dead rows may well be localised. A heap-needs-vacuum bitmap has been
> proposed here, but no solution currently exists for vacuuming only
> parts of indexes, so proposals for concurrent vacuums are now being
> considered.
> 
> The proposal about to be made takes a more radical approach and
> re-examines the architecture of the heap, allowing us to consider much
> faster designs for heavy UPDATEs. Although initially radical, the
> proposal appears to be fully MVCC-correct and crash-safe, as well as
> much faster under heavy updates, while approximately neutral in other
> cases, with no major downsides.

I made prototypes of the heap-needs-vacuum bitmap and per-entry index
deletion. The test results show that they save vacuuming time. I'm
refining them and making them robust now.

We can make use of the present structures with that approach, so I have
thought it a relatively good direction. However, you seem to be
proposing a whole new storage engine or on-disk structure. Do you have
any evidence that some kind of extended-VACUUM approach would not be
enough? It would be very nice if you could give us some more background.

> - discuss various other approaches to the problem, explain why we are
> now proposing one specific approach, and receive "why don't we..."
> feedback and additional ideas (Simon)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Introducing an advanced Frequent Update Optimization

From:
"Simon Riggs"
Date:
On Tue, 2006-11-07 at 13:02 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon@2ndquadrant.com> wrote:
> 
> > EnterpriseDB has been running a research project to improve the
> > performance of heavily updated tables. We have a number of approaches
> > prototyped and we'd like to discuss the best of these now on -hackers
> > for community input and patch submission to PostgreSQL core.
> 
> I'm very interested in your proposal! NTT is also working for OLTP workloads,
> especially on improvements of VACUUM. Maybe we have similar problems.

Seems very likely.

> I made prototypes of the heap-needs-vacuum bitmap and per-entry index
> deletion. The test results show that they save vacuuming time. I'm
> refining them and making them robust now.
> 
> We can make use of the present structures with that approach, so I have
> thought it a relatively good direction. However, you seem to be
> proposing a whole new storage engine or on-disk structure. Do you have
> any evidence that some kind of extended-VACUUM approach would not be
> enough?

That's been something we have considered, with good results.

We still need to VACUUM, but in a modified way.

> It would be very nice if you could give us some more background.

Certainly. We'll be posting a full design description on Wednesday; I'm
just editing that now.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com




Re: Introducing an advanced Frequent Update Optimization

From:
Andrew Sullivan
Date: Thu, 2006-11-09 04:09 -0500
On Mon, Nov 06, 2006 at 09:50:53PM +0000, Simon Riggs wrote:
> - There are specific issues with the optimizer's ability to estimate
> dead row numbers, which can in some cases lead to SeqScan plans that
> are inappropriate when tables grow because of updates. This is a red
> herring that can lead people to think the situation is worse than it
> is; it needs fixing, but the core issues mentioned above remain.

I don't disagree with much of what you say, but I'm slightly
concerned about the wave-away answer that you give here.  In my
experience on high-update tables -- especially ones with few rows,
but lots of open transactions over the lifetime of a row --
an accurate understanding of dead rows would be a _dramatic_
improvement (perhaps at least as significant as the improvement being
discussed).

That said, I'm not opposed to the line you're taking.  I just don't
want this problem to sink forever, because it's a big problem.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                                                  --Brad Holland


Re: Introducing an advanced Frequent Update Optimization

From:
"Simon Riggs"
Date:
On Thu, 2006-11-09 at 04:09 -0500, Andrew Sullivan wrote:
> On Mon, Nov 06, 2006 at 09:50:53PM +0000, Simon Riggs wrote:
> > - There are specific issues with the optimizer's ability to estimate
> > dead row numbers, which can in some cases lead to SeqScan plans that
> > are inappropriate when tables grow because of updates. This is a red
> > herring that can lead people to think the situation is worse than it
> > is; it needs fixing, but the core issues mentioned above remain.
> 
> I don't disagree with much of what you say, but I'm slightly
> concerned about the wave-away answer that you give here.  In my
> experience on high-update tables -- especially ones with few rows,
> but lots of open transactions over the lifetime of a row --
> an accurate understanding of dead rows would be a _dramatic_
> improvement (perhaps at least as significant as the improvement being
> discussed).

Understood. I just wanted to make sure people understand that the
underlying problem would still be there even if we fix that.

> That said, I'm not opposed to the line you're taking.  I just don't
> want this problem to sink forever, because it's a big problem.

So, yeh, we should still fix that. The current prototype has a different
cost model for SeqScans as a result. Summary arriving anytime now.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com




Re: Introducing an advanced Frequent Update Optimization

From:
"Simon Riggs"
Date:
On Tue, 2006-11-07 at 15:00 +1300, Mark Kirkwood wrote:
> Simon Riggs wrote:
> > EnterpriseDB has been running a research project to improve the
> > performance of heavily updated tables. We have a number of approaches
> > prototyped and we'd like to discuss the best of these now on -hackers
> > for community input and patch submission to PostgreSQL core.
> > 
> 
> Excellent! It would certainly be good for use cases like:
> 
> - session data from web (or similar) applications
> - real time summary tables maintained by triggers
> 
> to "just work", as (certainly in the case of the first one) quite a few 
> folks have been bitten by exactly the issue you describe.

...and of course it would be good if LISTEN/NOTIFY were able to use this
concept too, to help Slony along.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com




Re: Introducing an advanced Frequent Update Optimization

From:
Christopher Browne
Date:
Quoth simon@2ndquadrant.com ("Simon Riggs"):
> On Tue, 2006-11-07 at 15:00 +1300, Mark Kirkwood wrote:
>> Simon Riggs wrote:
>> > EnterpriseDB has been running a research project to improve the
>> > performance of heavily updated tables. We have a number of approaches
>> > prototyped and we'd like to discuss the best of these now on -hackers
>> > for community input and patch submission to PostgreSQL core.
>> > 
>> 
>> Excellent! It would certainly be good for use cases like:
>> 
>> - session data from web (or similar) applications
>> - real time summary tables maintained by triggers
>> 
>> to "just work", as (certainly in the case of the first one) quite a few 
>> folks have been bitten by exactly the issue you describe.
>
> ...and of course it would be good if LISTEN/NOTIFY were able to use this
> concept too, to help Slony along.

That should be much less relevant as people migrate to version 1.2, as
1.2 uses LISTEN/NOTIFY a whole lot less than earlier versions:
1.  Number of events generated is cut in 1/2 because we don't
    generate NOTIFIES for confirmations anymore.
2.  When a thread is busy, it shuts off LISTEN, and polls.  That
    will cut pg_listener bloat further...
-- 
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/internet.html
"Ah,  fall  - when  leaves  turn  to  burnished colors  upon  darkling
branches,  collars are  turned  up  against a  wind  which murmurs  of
winter, and homework assignments appear on Usenet.  <sigh>"
-- Bob Jarvis


Re: Introducing an advanced Frequent Update Optimization

From:
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> ...and of course it would be good if LISTEN/NOTIFY were able to use this
> concept too, to help Slony along.

LISTEN/NOTIFY are overdue to be rewritten to not use a table at all,
so I'm not particularly worried about whether this idea is applicable
to them.
        regards, tom lane