Thread: One tuple per transaction

One tuple per transaction

From
"Tambet Matiisen"
Date:
Hi!

In one of our applications we have a database function which
recalculates COGS (cost of goods sold) for a certain period. This
involves deleting a bunch of rows from one table, inserting them again
in the correct order, and updating them one by one (sometimes one row
twice) to reflect the current state. The problem is that this generates
an enormous number of tuples in that table.

If I'm correct, the dead tuples must also be scanned during table and
index scans, so a lot of dead tuples slows down queries considerably,
especially when the table doesn't fit into shared buffers any more. And
as I'm in a transaction, I can't VACUUM to get rid of those tuples. On
one occasion the page count for a table went from 400 to 22000 by the
end.
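
(For reference, the table's page count and the dead-row cleanup can be
checked like this; "cogs_entries" is a hypothetical name standing in
for the table described above:)

    -- pages and estimated row count, as of the last VACUUM/ANALYZE
    SELECT relpages, reltuples
    FROM pg_class
    WHERE relname = 'cogs_entries';

    -- VACUUM VERBOSE reports how many dead row versions it removed
    VACUUM VERBOSE cogs_entries;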

All this made me wonder: why is a new tuple created after every update?
One tuple per transaction should be enough, because you always commit
or roll back the transaction as a whole. And my observations seem to
indicate that a new index tuple is created after a column update even
if that column is not indexed.

One tuple per transaction would save loads of I/O bandwidth, so I
believe there must be a reason why it isn't implemented that way. Or
was my assumption wrong that dead tuples must be read from disk?

  Tambet

Re: One tuple per transaction

From
Josh Berkus
Date:
Tambet,

> In one of our applications we have a database function which
> recalculates COGS (cost of goods sold) for a certain period. This
> involves deleting a bunch of rows from one table, inserting them again
> in the correct order, and updating them one by one (sometimes one row
> twice) to reflect the current state. The problem is that this
> generates an enormous number of tuples in that table.

Sounds like you have an application design problem ...  how about re-writing
your function so it's a little more sensible?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: One tuple per transaction

From
"Tambet Matiisen"
Date:

> -----Original Message-----
> From: Josh Berkus [mailto:josh@agliodbs.com]
> Sent: Sunday, March 13, 2005 12:05 AM
> To: Tambet Matiisen
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] One tuple per transaction
>
>
> Tambet,
>
> > In one of our applications we have a database function which
> > recalculates COGS (cost of goods sold) for a certain period. This
> > involves deleting a bunch of rows from one table, inserting them
> > again in the correct order, and updating them one by one (sometimes
> > one row twice) to reflect the current state. The problem is that
> > this generates an enormous number of tuples in that table.
>
> Sounds like you have an application design problem ...  how about
> re-writing your function so it's a little more sensible?
>

I agree that I have violated the number one rule of transactions -
don't make the transaction last too long. But imagine a situation where
a table is updated twice in a transaction. Why? Perhaps the programmer
felt that the code is more modular this way. Now if you have tons of
those transactions, the I/O throughput is twice what it could be,
because every transaction creates two tuples instead of one. One tuple
per transaction would allow the programmer to keep his modular code and
benefit from the increased performance.

  Tambet

Re: One tuple per transaction

From
"Qingqing Zhou"
Date:
""Tambet Matiisen"" <t.matiisen@aprote.ee> writes
> Hi!
>
> In one of our applications we have a database function which
> recalculates COGS (cost of goods sold) for a certain period. This
> involves deleting a bunch of rows from one table, inserting them again
> in the correct order, and updating them one by one (sometimes one row
> twice) to reflect the current state. The problem is that this
> generates an enormous number of tuples in that table.
>
> If I'm correct, the dead tuples must also be scanned during table and
> index scans, so a lot of dead tuples slows down queries considerably,
> especially when the table doesn't fit into shared buffers any more.
> And as I'm in a transaction, I can't VACUUM to get rid of those
> tuples. On one occasion the page count for a table went from 400 to
> 22000 by the end.

Not exactly. The dead tuple's index entry will be scanned the first
time (and the heap tuple it points to as well); then we mark it dead,
so the next time we come here we know that the index tuple points to a
useless tuple and we won't scan the heap tuple it points to.

>
> All this made me wonder: why is a new tuple created after every
> update? One tuple per transaction should be enough, because you
> always commit or roll back the transaction as a whole. And my
> observations seem to indicate that a new index tuple is created after
> a column update even if that column is not indexed.

This is one cost of MVCC. A good thing about MVCC is that there is no
conflict between reads and writes - some applications may need this.

A reference could be found here:

http://www.postgresql.org/docs/8.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE

>
> One tuple per transaction would save loads of I/O bandwidth, so I
> believe there must be a reason why it isn't implemented that way. Or
> was my assumption wrong that dead tuples must be read from disk?
>
>   Tambet



Re: One tuple per transaction

From
"Tambet Matiisen"
Date:
> ------------------------------
>
> Date: Mon, 14 Mar 2005 09:41:30 +0800
> From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
> To: pgsql-performance@postgresql.org
> Subject: Re: One tuple per transaction
> Message-ID: <d12qcr$a1r$1@news.hub.org>
>
> ""Tambet Matiisen"" <t.matiisen@aprote.ee> writes
...
> > If I'm correct, the dead tuples must also be scanned during table
> > and index scans, so a lot of dead tuples slows down queries
> > considerably, especially when the table doesn't fit into shared
> > buffers any more. And as I'm in a transaction, I can't VACUUM to
> > get rid of those tuples. On one occasion the page count for a table
> > went from 400 to 22000 by the end.
>
> Not exactly. The dead tuple's index entry will be scanned the first
> time (and the heap tuple it points to as well); then we mark it dead,
> so the next time we come here we know that the index tuple points to
> a useless tuple and we won't scan the heap tuple it points to.
>

But the dead index tuple will still be read from disk next time? Maybe
the performance loss will really be negligible, but if most of the
tuples in your table/index are dead, it might be significant.

Consider the often-suggested solution for speeding up the "select
count(*) from table" query: make another table, rowcounts, and for each
of the original tables add insert and delete triggers to update the row
count in the rowcounts table. Actually this is a standard
denormalization technique, which I use often - for example to ensure
that order.total = sum(order_line.total).
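
(A minimal sketch of that trigger approach; the rowcounts and
order_line names are from the discussion above, but the column names
and the exact schema are made up:)

    CREATE TABLE rowcounts (
        tablename text PRIMARY KEY,
        n         bigint NOT NULL
    );

    CREATE FUNCTION order_line_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE rowcounts SET n = n + 1 WHERE tablename = 'order_line';
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE rowcounts SET n = n - 1 WHERE tablename = 'order_line';
        END IF;
        RETURN NULL;  -- the return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_line_count
        AFTER INSERT OR DELETE ON order_line
        FOR EACH ROW EXECUTE PROCEDURE order_line_count();

Every firing of the trigger rewrites the single counter row, which is
exactly where the dead-tuple buildup described below comes from.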

Now, if typical inserts into your most active table occur in batches of
3 rows in one transaction, then the row count for this table is updated
3 times during the transaction. Those 3 updates generate 3 tuples, 2 of
which are dead from the very start - you effectively commit 2 useless
tuples. After millions of inserts you end up with a rowcounts table
that is 2/3 dead tuples, and queries start to slow down.
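
(Illustrating that scenario with the hypothetical trigger sketch from
earlier - each INSERT fires the trigger once, so the counter row gets
three versions inside one transaction, two of them dead on arrival;
the order_line columns are made up:)

    BEGIN;
    INSERT INTO order_line (order_id, qty) VALUES (1, 10);  -- counter: version 1
    INSERT INTO order_line (order_id, qty) VALUES (1, 20);  -- version 2; 1 is now dead
    INSERT INTO order_line (order_id, qty) VALUES (1, 30);  -- version 3; 2 is now dead
    COMMIT;  -- only version 3 was ever visible to anyone else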

The current solution is to vacuum often. My proposal was to create a
new tuple only on the first update. Subsequent updates in the same
transaction would update the existing tuple, not create a new one.

But as I'm writing this, I'm starting to see some of the associated
implementation problems. The updated tuple might not be the same size
as the previous tuple. In-place tuple updates are probably not
implemented anyway - and for a reason, as a disk write takes the same
time regardless of whether you update or write new data. And there are
tons of other problems, which the developers are surely more aware of.

But one thing still bothers me. Why is a new index tuple generated when
I update a non-indexed column? OK, I get it now: the index tuple points
to the heap tuple, so after the update it would point to a dead tuple.
And as it takes the same time to update the pointer or to write a new
tuple, it's easier to write a new one.

Case closed.

  Tambet

Re: One tuple per transaction

From
Richard Huxton
Date:
Tambet Matiisen wrote:
>>
>>Not exactly. The dead tuple's index entry will be scanned the first
>>time (and the heap tuple it points to as well); then we mark it dead,
>>so the next time we come here we know that the index tuple points to
>>a useless tuple and we won't scan the heap tuple it points to.
>>
>
>
> But the dead index tuple will still be read from disk next time?
> Maybe the performance loss will really be negligible, but if most of
> the tuples in your table/index are dead, it might be significant.

When a block is read from disk, any dead tuples in that block will be
read in. Vacuum recovers these.

> Consider the often-suggested solution for speeding up the "select
> count(*) from table" query: make another table, rowcounts, and for
> each of the original tables add insert and delete triggers to update
> the row count in the rowcounts table. Actually this is a standard
> denormalization technique, which I use often - for example to ensure
> that order.total = sum(order_line.total).

This does of course completely destroy concurrency. Since you need to
lock the summary table, other clients have to wait until you are done.

> Now, if typical inserts into your most active table occur in batches
> of 3 rows in one transaction, then the row count for this table is
> updated 3 times during the transaction. Those 3 updates generate 3
> tuples, 2 of which are dead from the very start - you effectively
> commit 2 useless tuples. After millions of inserts you end up with a
> rowcounts table that is 2/3 dead tuples, and queries start to slow
> down.
>
> The current solution is to vacuum often. My proposal was to create a
> new tuple only on the first update. Subsequent updates in the same
> transaction would update the existing tuple, not create a new one.

How do you roll back to a savepoint with this model?

--
   Richard Huxton
   Archonet Ltd

Re: One tuple per transaction

From
"Tambet Matiisen"
Date:

> -----Original Message-----
> From: Richard Huxton [mailto:dev@archonet.com]
> Sent: Tuesday, March 15, 2005 11:38 AM
> To: Tambet Matiisen
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] One tuple per transaction
>
...
>
> > Consider the often-suggested solution for speeding up the "select
> > count(*) from table" query: make another table, rowcounts, and for
> > each of the original tables add insert and delete triggers to
> > update the row count in the rowcounts table. Actually this is a
> > standard denormalization technique, which I use often - for example
> > to ensure that order.total = sum(order_line.total).
>
> This does of course completely destroy concurrency. Since you need to
> lock the summary table, other clients have to wait until you are done.
>

Yes, it does for the rowcounts table. But consider the orders example -
it only locks the order I'm adding lines to. As there is mostly one
client dealing with one order, but possibly thousands dealing with
different orders, it should not pose any concurrency restrictions.

> > Now, if typical inserts into your most active table occur in
> > batches of 3 rows in one transaction, then the row count for this
> > table is updated 3 times during the transaction. Those 3 updates
> > generate 3 tuples, 2 of which are dead from the very start - you
> > effectively commit 2 useless tuples. After millions of inserts you
> > end up with a rowcounts table that is 2/3 dead tuples, and queries
> > start to slow down.
> >
> > The current solution is to vacuum often. My proposal was to create
> > a new tuple only on the first update. Subsequent updates in the
> > same transaction would update the existing tuple, not create a new
> > one.
>
> How do you roll back to a savepoint with this model?
>

Every savepoint initiates a new (sub)transaction.
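
(A hedged illustration of Richard's point, reusing the hypothetical
rowcounts row from earlier: if both UPDATEs rewrote one and the same
tuple in place, there would be no older version left to restore at
ROLLBACK TO SAVEPOINT.)

    BEGIN;
    UPDATE rowcounts SET n = n + 1 WHERE tablename = 'order_line';  -- version A
    SAVEPOINT s1;
    UPDATE rowcounts SET n = n + 1 WHERE tablename = 'order_line';  -- version B
    ROLLBACK TO SAVEPOINT s1;  -- version A must still exist to fall back to
    COMMIT;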

  Tambet

Re: One tuple per transaction

From
Robert Treat
Date:
On Tuesday 15 March 2005 04:37, Richard Huxton wrote:
> Tambet Matiisen wrote:
> > Now, if typical inserts into your most active table occur in
> > batches of 3 rows in one transaction, then the row count for this
> > table is updated 3 times during the transaction. Those 3 updates
> > generate 3 tuples, 2 of which are dead from the very start - you
> > effectively commit 2 useless tuples. After millions of inserts you
> > end up with a rowcounts table that is 2/3 dead tuples, and queries
> > start to slow down.
> >
> > The current solution is to vacuum often. My proposal was to create
> > a new tuple only on the first update. Subsequent updates in the
> > same transaction would update the existing tuple, not create a new
> > one.
>
> How do you roll back to a savepoint with this model?
>

You can't, but you could add the caveat to just do this auto-reuse
within any given nested transaction. Then as long as you aren't using
savepoints, you get to reclaim all the space.

On a similar note, I was just wondering if it would be possible to mark
any of these dead tuples as ready to be reused at transaction commit
time, since we know that they are dead to any and all other
transactions currently going on. This would save you from having to
vacuum to get the tuples marked ready for reuse. In the above scenario
this could be a win; whether it would be overall is hard to say.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: One tuple per transaction

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
>  On a similar note I was just wondering if it would be possible to
> mark any of these dead tuples as ready to be reused at transaction
> commit time, since we know that they are dead to any and all other
> transactions currently going on.

I believe VACUUM already knows that xmin = xmax implies the tuple
is dead to everyone.

> This would save you from having to vacuum to get the tuples marked
> ready for reuse.

No; you forgot about reclaiming associated index entries.

            regards, tom lane
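
(The xmin/xmax system columns Tom refers to can be inspected directly;
a minimal sketch with a throwaway table:)

    BEGIN;
    CREATE TABLE t (v int);
    INSERT INTO t VALUES (1);
    UPDATE t SET v = 2;
    -- the visible (new) version has xmin = the current transaction id;
    -- the old version, now invisible, has xmin = xmax = that same id
    SELECT xmin, xmax, v FROM t;
    COMMIT;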

Re: One tuple per transaction

From
Alvaro Herrera
Date:
On Tue, Mar 15, 2005 at 06:51:19PM -0500, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> >  On a similar note I was just wondering if it would be possible to
> > mark any of these dead tuples as ready to be reused at transaction
> > commit time, since we know that they are dead to any and all other
> > transactions currently going on.
>
> I believe VACUUM already knows that xmin = xmax implies the tuple
> is dead to everyone.

Huh, that is too simplistic in a world with subtransactions, isn't it?

One way to solve this would be for a transaction that kills a tuple to
check whether the tuple was created by itself (not necessarily by the
same Xid), and somehow report it to the FSM right away.

That'd mean physically moving a lot of tuples within the page, so ISTM
it's too expensive an "optimization". Oh, and you'd also have to delete
the tuple from the indexes.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte" (Andre Breton)

Re: One tuple per transaction

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Tue, Mar 15, 2005 at 06:51:19PM -0500, Tom Lane wrote:
>> I believe VACUUM already knows that xmin = xmax implies the tuple
>> is dead to everyone.

> Huh, that is too simplistic in a world with subtransactions, isn't it?

Well, it's still correct as a fast-path check.  There are extensions
you could imagine making ... but offhand I agree that it's not worth
the trouble.  Maybe in a few years when everyone and his sister is
using subtransactions constantly, we'll feel a need to optimize these
cases.

            regards, tom lane

Re: One tuple per transaction

From
Hannu Krosing
Date:
On Sat, 2005-03-12 at 14:05 -0800, Josh Berkus wrote:
> Tambet,
>
> > In one of our applications we have a database function which
> > recalculates COGS (cost of goods sold) for a certain period. This
> > involves deleting a bunch of rows from one table, inserting them
> > again in the correct order, and updating them one by one (sometimes
> > one row twice) to reflect the current state. The problem is that
> > this generates an enormous number of tuples in that table.
>
> Sounds like you have an application design problem ...  how about re-writing
> your function so it's a little more sensible?

Also, you could at least use a temp table for the intermediate steps;
that would save WAL traffic, if nothing else.
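
(A minimal sketch of that suggestion, with hypothetical table and
column names; temp tables are not WAL-logged, so the repeated
intermediate updates stay out of the WAL, although the final
DELETE/INSERT on the real table still creates dead tuples there:)

    BEGIN;
    CREATE TEMPORARY TABLE cogs_work AS
        SELECT * FROM cogs WHERE period_id = 42;
    -- ... do the repeated per-row recalculation against cogs_work ...
    DELETE FROM cogs WHERE period_id = 42;
    INSERT INTO cogs SELECT * FROM cogs_work;
    DROP TABLE cogs_work;
    COMMIT;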

--
Hannu Krosing <hannu@tm.ee>