Discussion: HOT is applied

HOT is applied

From: Tom Lane
Date:
I've committed the HOT patch.  I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.
        regards, tom lane


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

>  I'd still like to think about whether we
> can be smarter about when to invoke pruning, but that's a small enough
> issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
DROP TABLE IF EXISTS testtable;
CREATE TABLE testtable (key integer);

-- Note that there are no indexes, so updates have to do a seq scan.

CREATE OR REPLACE FUNCTION testfunc(data int, key1 int) RETURNS int AS $$
DECLARE
BEGIN
  FOR cnt IN 1..10000 LOOP
    UPDATE testtable SET key = data WHERE key = key1;
  END LOOP;
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

INSERT INTO testtable VALUES (1);
BEGIN;
SELECT testfunc(1,1);
COMMIT;

Re: HOT is applied

From: "Merlin Moncure"
Date:
On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Tom Lane wrote:
> > I've committed the HOT patch.
>
> Thanks, much easier to work with it now that it's in.
>
> >  I'd still like to think about whether we
> > can be smarter about when to invoke pruning, but that's a small enough
> > issue that the patch can go in without it.
>
> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.
>
> Instead of/in addition to avoiding pruning when it doesn't help, maybe
> we could make HeapTupleSatisfiesVacuum cheaper.
>
> I'm going to continue testing, this is just a heads-up that HOT as
> committed seriously hurts performance in some cases. (though one can
> argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:
pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms
after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms
post hot:
run 1: Time: 7286.292 ms
run 2: Time: 7477.089 ms
run 3: Time: 7701.229 ms

those results aren't exactly terrible, and this case is highly artificial.

merlin


Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> I'd still like to think about whether we
>> can be smarter about when to invoke pruning, but that's a small enough
>> issue that the patch can go in without it.

> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.

Not sure if that's an appropriate description or not.  oprofile
(on a dual Xeon running Fedora 6) shows me this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples  %        symbol name
1070003  29.8708  LWLockAcquire
1015097  28.3380  LWLockRelease
283514    7.9147  heap_page_prune
252270    7.0425  AllocSetAlloc
148981    4.1590  HeapTupleSatisfiesMVCC
146442    4.0882  TransactionIdIsInProgress
92673     2.5871  AllocSetFree
84966     2.3720  HeapTupleSatisfiesVacuum
83097     2.3198  TransactionIdGetStatus
80737     2.2539  SimpleLruReadPage_ReadOnly
52803     1.4741  TransactionLogFetch
43406     1.2117  heapgetpage
42536     1.1875  HeapTupleHeaderGetCmin
34842     0.9727  TransactionIdIsCurrentTransactionId
27761     0.7750  TransactionIdDidAbort
24833     0.6933  MemoryContextAlloc
16446     0.4591  TransactionIdPrecedes
16089     0.4491  HeapTupleHeaderGetCmax
12919     0.3607  hash_search_with_hash_value
11857     0.3310  pfree
4964      0.1386  heap_page_prune_opt
3683      0.1028  hash_any
3215      0.0898  LWLockConditionalAcquire
3086      0.0862  PinBuffer
2573      0.0718  UnpinBuffer
2009      0.0561  ConditionalLockBufferForCleanup
1854      0.0518  ReadBuffer_common
934       0.0261  XLogInsert
784       0.0219  heapgettup_pagemode

so basically it's all about the locking.  Maybe the problem is that with
HOT we lock the buffer too often?  heap_page_prune_opt is designed to
not take the buffer lock unless there's a good probability of needing
to prune, but maybe that's not working as intended.
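
For reference, the fast-path gate being described works roughly along these lines. This is a simplified sketch rather than the committed heap_page_prune_opt; the hint-bit test (PageIsPrunable here) and the free-space threshold are only approximations of the logic under discussion.

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /*
     * Sketch of the opportunistic-prune gate: consult cheap per-page state
     * before paying for the cleanup lock and a full prune attempt.
     */
    static void
    prune_page_if_worthwhile(Buffer buffer)
    {
        Page        page = BufferGetPage(buffer);

        /* Hint says nothing on this page can be pruned yet: bail out. */
        if (!PageIsPrunable(page))
            return;

        /*
         * Only bother once free space is low (the real code ties this
         * threshold to the table's fillfactor setting).
         */
        if (PageGetHeapFreeSpace(page) >= BLCKSZ / 10)
            return;

        /* Take the cleanup lock only if nobody else has the buffer pinned. */
        if (ConditionalLockBufferForCleanup(buffer))
        {
            /* ... the actual prune work (heap_page_prune) happens here ... */
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        }
    }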

[ pokes at it a bit more... ]  Actually the disturbing part is that
pruning doesn't seem to be working at all: after the test finishes,
I see

regression=# vacuum verbose testtable;
INFO:  vacuuming "public.testtable"
INFO:  "testtable": removed 10000 row versions in 44 pages
INFO:  "testtable": found 10000 removable, 1 nonremovable row versions in 45 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 9955 unused item pointers.
45 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction?  I'd have hoped to see this example use only
one heap page ...
        regards, tom lane


Re: HOT is applied

From: Tom Lane
Date:
I wrote:
> ... so basically it's all about the locking.  Maybe the problem is that with
> HOT we lock the buffer too often?  heap_page_prune_opt is designed to
> not take the buffer lock unless there's a good probability of needing
> to prune, but maybe that's not working as intended.

Indeed it seems it's not; gprof shows
                0.00    0.00     111/1389276     index_getnext <cycle 7> [89]
                0.05   49.52 1389165/1389276     heapgetpage [6]
[8]     50.9    0.05   49.52 1389276         heap_page_prune_opt [8]
                7.17   42.31 1366696/1366696     heap_page_prune [9]
                0.01    0.03 1366696/1366696     ConditionalLockBufferForCleanup [106]
                0.01    0.00 2755558/2780795     PageGetHeapFreeSpace [177]

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time.  Why is that?  Too tired to poke
at it more tonight.
        regards, tom lane


Re: HOT is applied

From: "Pavan Deolasee"
Date:


On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction?  I'd have hoped to see this example use only
one heap page ...


Not before the transaction commits? In the test, we update a single tuple
10000 times in the same transaction, so there is no opportunity to prune.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Shouldn't we be able to prune rows that have been inserted and deleted
>> by the same transaction?  I'd have hoped to see this example use only
>> one heap page ...
>> 
> Not before the transaction commits ? In the test, we update a single tuple
> 10000 times in the same transaction. So there is no opportunity to prune.

[ looks a bit more ... ]  Hm, the test I was thinking of was this one
at the end of HeapTupleSatisfiesVacuum:
    if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
                            HeapTupleHeaderGetXmax(tuple)))
    {
        /*
         * Inserter also deleted it, so it was never visible to anyone else.
         * However, we can only remove it early if it's not an updated tuple;
         * else its parent tuple is linking to it via t_ctid, and this tuple
         * mustn't go away before the parent does.
         */
        if (!(tuple->t_infomask & HEAP_UPDATED))
            return HEAPTUPLE_DEAD;
    }
 

but control never gets that far because neither xmin nor xmax is
committed yet.  We could fix that, probably, by considering the
xmin=xmax case in the xmin-in-progress case further up; but the
HEAP_UPDATED exclusion is still a problem.  Still, it seems like this
is leaving some money on the table when you think about pruning a HOT
chain.  Can we improve on it easily?
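
The kind of check being contemplated would sit in the xmin-in-progress branch and look roughly like this. This is a sketch of the idea only, not committed code, and it ignores the question of whether our own backend's snapshots might still need to see the tuple:

    /*
     * Sketch: treat a tuple that was inserted and deleted by the same
     * still-in-progress transaction as dead right away, provided it is not
     * part of an update chain (HEAP_UPDATED) and xmax is a real delete
     * rather than a row lock.
     */
    if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
    {
        if (!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
            !(tuple->t_infomask & HEAP_IS_LOCKED) &&
            !(tuple->t_infomask & HEAP_UPDATED) &&
            TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
                                HeapTupleHeaderGetXmax(tuple)))
        {
            /* Inserter also deleted it; never visible to anyone else. */
            return HEAPTUPLE_DEAD;
        }
        return HEAPTUPLE_INSERT_IN_PROGRESS;
    }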
        regards, tom lane


Re: HOT is applied

From: "Pavan Deolasee"
Date:


On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:


but control never gets that far because neither xmin nor xmax is
committed yet.  We could fix that, probably, by considering the
xmin=xmax case in the xmin-in-progress case further up; but the
HEAP_UPDATED exclusion is still a problem.  Still, it seems like this
is leaving some money on the table when you think about pruning a HOT
chain.  Can we improve on it easily?


Maybe we can, but it would get a bit tricky. There might be a transaction
looking at the first tuple in the chain and waiting for this (inserting-deleting)
transaction to finish. If the waiting transaction is running in READ COMMITTED
mode, it would then follow the update chain. Removing any intermediate
tuples without fixing the previous tuple's xmax/ctid (or redirected line pointer)
would be tricky.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Tom Lane wrote:
> > I've committed the HOT patch.
> 
> Thanks, much easier to work with it now that it's in.
> 
> >  I'd still like to think about whether we
> > can be smarter about when to invoke pruning, but that's a small enough
> > issue that the patch can go in without it.
> 
> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.
> 
> Instead of/in addition to avoiding pruning when it doesn't help, maybe
> we could make HeapTupleSatisfiesVacuum cheaper.
> 
> I'm going to continue testing, this is just a heads-up that HOT as
> committed seriously hurts performance in some cases. (though one can
> argue that this test case isn't a very realistic one.)

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: HOT is applied

From: "Pavan Deolasee"
Date:


On 9/21/07, Bruce Momjian <bruce@momjian.us> wrote:

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?


The way it stands today, yes. That's one reason why we are seeing
the performance drop in this particular case.

I liked the earlier proposed idea of storing the xid in the page header
to quickly tell us whether it's worth pruning the page.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Bruce Momjian wrote:
> This might be a simplistic question but if the page is +90% full and
> there is a long-lived transaction, isn't Postgres going to try pruning
> on each page read access?

Yes :(. That's why we earlier talked about storing the xid of the oldest
deleted tuple on the page in the page header. That way we could skip the
fruitless pruning attempts until that xid < OldestXmin.

Another approach is to try to make HeapTupleSatisfiesVacuum cheaper, so
that the fruitless pruning attempts wouldn't hurt that much.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> I'd still like to think about whether we
>>> can be smarter about when to invoke pruning, but that's a small enough
>>> issue that the patch can go in without it.
> 
>> Yeah. I'm doing some micro-benchmarking, and the attached test case is
>> much slower with HOT. It's spending a lot of time trying to prune, only
>> to find out that it can't.
> 
> Not sure if that's an appropriate description or not.  oprofile
> (on a dual Xeon running Fedora 6) shows me this:
> 
> ...
> samples  %        symbol name
> 1070003  29.8708  LWLockAcquire
> 1015097  28.3380  LWLockRelease
> 283514    7.9147  heap_page_prune
> ...
> so basically it's all about the locking.  Maybe the problem is that with
> HOT we lock the buffer too often?  heap_page_prune_opt is designed to
> not take the buffer lock unless there's a good probability of needing
> to prune, but maybe that's not working as intended.

If you look at the callgraph, you'll see that those
LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
TransactionIdIsInProgress, which keeps thrashing the ProcArrayLock. An
"if (TransactionIdIsCurrentTransactionId(xid)) return true;" check in
TransactionIdIsInProgress would speed that up, but I wonder if there's a
more general solution to make HeapTupleSatisfiesVacuum cheaper. For
example, we could cache the in-progress status of tuples.

> Shouldn't we be able to prune rows that have been inserted and deleted
> by the same transaction?  I'd have hoped to see this example use only
> one heap page ...

Well maybe, but that's a separate issue. Wouldn't we need the "snapshot
bookkeeping" we've discussed in the past, to notice that there's no
snapshot in our own backend that needs to see the tuples?

Nevertheless, the fruitless pruning attempts would still hurt us in
slightly more complex scenarios, even if we fixed the above.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Merlin Moncure wrote:
> On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case is
>> much slower with HOT. It's spending a lot of time trying to prune, only
>> to find out that it can't.
>>
>> Instead of/in addition to avoiding pruning when it doesn't help, maybe
>> we could make HeapTupleSatisfiesVacuum cheaper.
>>
>> I'm going to continue testing, this is just a heads-up that HOT as
>> committed seriously hurts performance in some cases. (though one can
>> argue that this test case isn't a very realistic one.)
> 
> well, I ran your test on my box and here are the results:
> pre hot:
> run 1: 3617.641 ms
> run 2: 5195.215 ms
> run 3: 6760.449 ms
> after vacuum:
> run 1: 4171.362 ms
> run 2: 5513.317 ms
> run 3: 6884.125 ms
> post hot:
> run 1: Time: 7286.292 ms
> run 2: Time: 7477.089 ms
> run 3: Time: 7701.229 ms
> 
> those results aren't exactly terrible, and this case is highly artificial.

Your runtimes seem to be increasing as you repeat the test. Did you
remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
~2x as long as pre hot, even when repeated, which matches the results of
your first runs.

Yeah, it's highly artificial.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Merlin Moncure wrote:
> On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case is
>> much slower with HOT. It's spending a lot of time trying to prune, only
>> to find out that it can't.
>>
>> Instead of/in addition to avoiding pruning when it doesn't help, maybe
>> we could make HeapTupleSatisfiesVacuum cheaper.
>>
>> I'm going to continue testing, this is just a heads-up that HOT as
>> committed seriously hurts performance in some cases. (though one can
>> argue that this test case isn't a very realistic one.)
> 
> well, I ran your test on my box and here are the results:
> pre hot:
> run 1: 3617.641 ms
> run 2: 5195.215 ms
> run 3: 6760.449 ms
> after vacuum:
> run 1: 4171.362 ms
> run 2: 5513.317 ms
> run 3: 6884.125 ms
> post hot:
> run 1: Time: 7286.292 ms
> run 2: Time: 7477.089 ms
> run 3: Time: 7701.229 ms
> 
> those results aren't exactly terrible,

Your runtimes seem to be increasing as you repeat the test. Did you
remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
~2x as long as pre hot, even when repeated, which matches the results of
your first runs.

> and this case is highly artificial.

Yeah.

I'm going through a bunch of CPU test cases a colleague of mine wrote.
Some of the test cases show much worse performance with HOT.
Unfortunately they're quite complex and overlapping, so instead of
posting them as is, I'm going through the ones that HOT performs badly
at, and trying to reduce them to simpler test cases like the above.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: "Pavan Deolasee"
Date:

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time.  Why is that?  Too tired to poke
at it more tonight.


I guess you already know the answer now, but anyway: since we are
updating a single tuple in a single transaction, each update is preceded
by a sequential scan. All but the last page are completely full and
marked prunable, so HOT would try to (unsuccessfully) prune every page,
except maybe the last.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Bruce Momjian wrote:
>> This might be a simplistic question but if the page is +90% full and
>> there is a long-lived transaction, isn't Postgres going to try pruning
>> on each page read access?

> Yes :(

It shouldn't, though --- the hint bit should get cleared on the first
try.  I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...
        regards, tom lane


Re: HOT is applied

From: "Pavan Deolasee"
Date:


On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It shouldn't, though --- the hint bit should get cleared on the first
try.  I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...


We set the hint bit (prunable) again when we see a RECENTLY_DEAD
or DELETE_IN_PROGRESS tuple. This is correct in normal circumstances
because otherwise we would never be able to prune the page.

On second thought, maybe we can choose not to set the bit again. In
the worst case, the next update on the page would be a COLD update.
That would set the bit again, and we would retry pruning on the next
visit to the page.

This scheme would work as long as we don't put in a mechanism to
update the FSM after pruning. But if we choose to update the FSM (I am
inclined to do this, to avoid repeated relation extension caused by
COLD updates), we would need to improve this. Otherwise a page full
of only DEAD tuples might never be pruned, and vacuum would be
required to reclaim that space.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Bruce Momjian wrote:
>>> This might be a simplistic question but if the page is +90% full and
>>> there is a long-lived transaction, isn't Postgres going to try pruning
>>> on each page read access?
> 
>> Yes :(
> 
> It shouldn't, though --- the hint bit should get cleared on the first
> try.

Think so? That would mean that you only get a single chance to prune
after an update. Not sure how big a problem that is, but I do feel that
would make HOT a lot less effective in some usage patterns.

The idea of having a "prunable xmin" in the page header sounds more and
more attractive to me...

>  I think I probably broke something in the last round of revisions
> to heap_page_prune_opt, but haven't looked yet ...

Pavan's patch was like that as well; you don't clear the flag (or
rather, you set it again while pruning) until there's nothing left to
prune on the page.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 
>> so this example is getting past the heuristic tests in
>> heap_page_prune_opt almost every time.  Why is that?  Too tired to poke
>> at it more tonight.
>> 
> I guess you already know the answer now, but anyways: Since we are
> updating a single tuple in a single transaction, each update is preceded
> by a sequential scan. All but last pages are completely full and marked
> prunable, so HOT would try to (unsuccessfully) prune every (except may
> be last) page.

Hmm ... the problem really is that heap_page_prune turns the hint back
on when it sees a DELETE_IN_PROGRESS tuple.  Maybe that's a bad idea.

I don't much like the idea of adding an xid to the page header --- for
one thing, *which* xid would you put there, and what would you test it
against?
        regards, tom lane


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> I don't much like the idea of adding an xid to the page header --- for
> one thing, *which* xid would you put there, and what would you test it
> against?

I was thinking that you would put the smallest in-progress xmax on the
page there, and you would test it against OldestXmin. If all
transactions commit, there surely isn't anything to prune until that xid
falls beyond the OldestXmin horizon. If an inserting transaction aborts,
we could prune the aborted tuple earlier, but optimizing for aborts
doesn't seem that important.
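
Concretely, the per-page test could then be as cheap as the following sketch. The xid would come from the proposed page-header field, which does not exist in the tree at this point; the function and its treatment of "nothing recorded" are illustrative only:

    #include "postgres.h"
    #include "access/transam.h"

    /*
     * Sketch of the page-header-xid idea: prune_xid is assumed to hold the
     * smallest in-progress xmax recorded for the page, with
     * InvalidTransactionId meaning nothing has been recorded yet.
     */
    static bool
    page_might_be_prunable(TransactionId prune_xid, TransactionId OldestXmin)
    {
        /* No deleted or updated tuple has been recorded on this page. */
        if (!TransactionIdIsValid(prune_xid))
            return false;

        /*
         * Until the recorded xid falls behind the global xmin horizon, every
         * prune attempt would find nothing removable, so skip the buffer
         * lock and the HeapTupleSatisfiesVacuum calls entirely.
         */
        return TransactionIdPrecedes(prune_xid, OldestXmin);
    }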

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> If you look at the callgraph, you'll see that those
> LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
> TransactionIdIsInProgress, which keeps trashing the ProcArrayLock. A
> "if(TransactionIdIsCurrentTransactionId(xid)) return true;" check in
> TransactionIdIsInProgress would speed that up, but I wonder if there's a
> more general solution to make HeapTupleSatisfiesVacuum cheaper. For
> example, we could cache the in-progress status of tuples.

Dunno about "more general", but your idea reduces the runtime of this
example by about 50% (22.2s to 10.5s) for me.  I'm worried though that
it would be a net negative in more typical situations, especially if
you've got a lot of open subtransactions.
        regards, tom lane

*** src/backend/storage/ipc/procarray.c.orig    Sat Sep  8 16:31:15 2007
--- src/backend/storage/ipc/procarray.c    Fri Sep 21 11:08:34 2007
***************
*** 341,346 ****
--- 341,353 ----
          return false;
      }
  
+     /*
+      * Also, we can detect our own transaction without any access to shared
+      * memory.
+      */
+     if (TransactionIdIsCurrentTransactionId(xid))
+         return true;
+ 
      /* Get workspace to remember main XIDs in */
      xids = (TransactionId *) palloc(sizeof(TransactionId) *
                                      arrayP->maxProcs);
 


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> If you look at the callgraph, you'll see that those
>> LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
>> TransactionIdIsInProgress, which keeps trashing the ProcArrayLock. A
>> "if(TransactionIdIsCurrentTransactionId(xid)) return true;" check in
>> TransactionIdIsInProgress would speed that up, but I wonder if there's a
>> more general solution to make HeapTupleSatisfiesVacuum cheaper. For
>> example, we could cache the in-progress status of tuples.
> 
> Dunno about "more general", but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me.  I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Yeah. I played with this a bit more, and came up with a couple of other
micro-optimizations:

1. Instead of pallocing and pfreeing a new array in
TransactionIdIsInProgress, we could just malloc the array once and reuse
it. That palloc/pfree alone was consuming around 8% of CPU time in the
test case.

2. About ~30% of CPU time is spent in HeapTupleSatisfiesMVCC ->
TransactionIdDidAbort. This is in fact not related to HOT, but it caught
my eye while looking at the profile. We have this piece of code in
HeapTupleSatisfiesMVCC:

>    /* deleting subtransaction aborted? */
>    /* FIXME -- is this correct w.r.t. the cmax of the tuple? */
>    if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
>    {
>        SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
>             InvalidTransactionId);
>         return true;
>     }
> 
>     Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));

We've already checked that the xmin is our own transaction id, so we
check if the xmax is an aborted subtransaction of our own transaction. A
TransactionIdDidAbort call seems like an awfully expensive way to check
that. We could call TransactionIdIsCurrentTransactionId instead, which
doesn't need to access any shared memory structures (but might be
expensive if you have a lot of active subxacts, as you pointed out). Or
we could keep a list of aborted subtransactions in backend private
memory, and just check against that. In this particular case, a simple
"if(xmin==xmax)" check would be effective as well, since we know that
the xmin is not aborted at that point.

3. One more general alternative to the little patch I sent earlier would
be to build an array of in-progress xids in TransactionIdIsInProgress,
like we do in GetSnapshotData, and use that for the in-progress checks
in HeapTupleSatisfiesVacuum. That should work, if we build the cached
array when we lock the page for pruning, and use it until we unlock. If
a transaction commits/aborts after we build the cached array, we might
return DELETE/INSERT_IN_PROGRESS for a tuple that is in fact already
DEAD, but that's ok. Since we're holding a lock on the page, it's not
possible for a new transaction to start and subsequently put its xid on
the page where we would bump into it. I believe this would have roughly
the same performance effect in this test case as calling
TransactionIdIsCurrentTransactionId from TransactionIdIsInProgress.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: Tom Lane
Date:
I wrote:
> Dunno about "more general", but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me.  I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Actually ... the only way that TransactionIdIsCurrentTransactionId can
take a meaningful amount of time is if you've got lots of
subtransactions, and in that case your own subxids cache has certainly
overflowed, which is likely to force TransactionIdIsInProgress into the
"slow answer" path.  But if we use TransactionIdIsCurrentTransactionId
to handle our own xids then we can just ignore MyProc altogether inside
the loop, thus very possibly (if we are the only overflowed-subxids proc)
saving us from going through the slow answer path.  So on reflection
I can't see how this isn't a win.  I'll clean it up and apply it.

I'm also starting to come around to liking the page-header-xid field
a bit more.  I suggest that it could replace the "page is prunable"
flag bit altogether --- to mark the page prunable, you must store
some appropriate xid into the header field.  This would avoid a useless
prune attempt immediately after a page is marked prunable.

A possible problem with it, if we treat it as a non-WAL-logged hint,
is that corruption of the value could lead to a page being marked with
an xid that's way in the future, keeping it from getting pruned.
However, if the page header gets corrupted this is the least of your
worries, so it doesn't seem like an enormous objection.
        regards, tom lane


Re: HOT is applied

From: "Merlin Moncure"
Date:
On 9/21/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Merlin Moncure wrote:
> > pre hot:
> > run 1: 3617.641 ms
> > run 2: 5195.215 ms
> > run 3: 6760.449 ms
> > after vacuum:
> > run 1: 4171.362 ms
> > run 2: 5513.317 ms
> > run 3: 6884.125 ms
> > post hot:
> > run 1: Time: 7286.292 ms
> > run 2: Time: 7477.089 ms
> > run 3: Time: 7701.229 ms
> >
> > those results aren't exactly terrible, and this case is highly artificial.
>
> Your runtimes seem to be increasing as you repeat the test. Did you
> remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
> ~2x as long as pre hot, even when repeated, which matches the results of
> your first runs.

correct.

Well, my first round of results is so far not showing the big gains I
saw with HOT in some of the earlier patches... so far, it looks
approximately to be a wash, although with a reduced need to vacuum.
I'll test some more when things settle down.

merlin


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Merlin Moncure wrote:
> Well, my first round of results are so far not showing the big gains I
> saw with hot in some of the earlier patches...so far, it looks
> approximately to be a wash although with the reduced need to vacuum.
> i'll test some more when things settle down.

Oh... Which version did you test earlier? What are you testing, pgbench?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: "Heikki Linnakangas"
Date:
Tom Lane wrote:
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into the
> "slow answer" path.  But if we use TransactionIdIsCurrentTransactionId
> to handle our own xids then we can just ignore MyProc altogether inside
> the loop, thus very possibly (if we are the only overflowed-subxids proc)
> saving us from going through the slow answer path.  So on reflection
> I can't see how this isn't a win.  I'll clean it up and apply it.

Good point.

I noted that most callers of TransactionIdIsInProgress in tqual.c
already call TransactionIdIsCurrentTransactionId before
TransactionIdIsInProgress. In those cases we could just skip the test
for our own xids altogether, if it's worth code mangling to tell
TransactionIdIsInProgress whether it's safe to skip it or not.

> I'm also starting to come around to liking the page-header-xid field
> a bit more.  I suggest that it could replace the "page is prunable"
> flag bit altogether --- to mark the page prunable, you must store
> some appropriate xid into the header field.  This would avoid a useless
> prune attempt immediately after a page is marked prunable.

Yeah, we don't need the flag if we have an xid.

> A possible problem with it, if we treat it as a non-WAL-logged hint,
> is that corruption of the value could lead to a page being marked with
> an xid that's way in the future, keeping it from getting pruned.
> However, if the page header gets corrupted this is the least of your
> worries, so it doesn't seem like an enormous objection.

Agreed. Next attempt to prune or vacuum the page will fix it anyway.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Yeah. I played with this a bit more, and came up with a couple of other
> micro-optimizations:

> 1. Instead of pallocing and pfreeing a new array in
> TransactionIdIsInProgress, we could just malloc the array once and reuse
> it. That palloc/pfree alone was consuming around 8% of CPU time in the
> test case.

Good idea --- not only faster, but we can get rid of all the goto's and
the "locked" flag, if we're willing to have a couple more LWLockRelease
calls here.  I'll incorporate this in the patch I'm working up.
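
The reuse in question could be as simple as the following fragment at the top of TransactionIdIsInProgress (a sketch of the idea, not the actual patch; error handling and sizing may differ):

    /*
     * Sketch: allocate the workspace array once per backend instead of
     * palloc/pfree on every call.  arrayP->maxProcs is fixed for the life
     * of the server, so a single allocation suffices.
     */
    static TransactionId *xids = NULL;

    if (xids == NULL)
    {
        xids = (TransactionId *)
            malloc(arrayP->maxProcs * sizeof(TransactionId));
        if (xids == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
    }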

> We've already checked that the xmin is our own transaction id, so we
> check if the xmax is an aborted subtransaction of our own transaction. A
> TransactionIdDidAbort call seems like an awfully expensive way to check
> that. We could call TransactionIdIsCurrentTransactionId instead, which
> doesn't need to access any shared memory structures (but might be
> expensive if you have a lot of active subxacts, as you pointed out).

I like that idea too ...

> 3. One more general alternative to the little patch I sent earlier would
> be to build an array of in-progress-xids in TransactionIdInProgress,
> like we do in GetSnapshotData, and use that for the in-progress checks
> in HeapTupleSatisfiesVacuum. That should work, if we build the cached
> array when we lock the page for pruning, and use it until we unlock.

This seems like it uglifies a whole lot of APIs, though, for what's
probably not going to be a lot of gain.
        regards, tom lane


Re: HOT is applied

From: "Pavan Deolasee"
Date:


On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:


I'm also starting to come around to liking the page-header-xid field
a bit more.  I suggest that it could replace the "page is prunable"
flag bit altogether --- to mark the page prunable, you must store
some appropriate xid into the header field.  This would avoid a useless
prune attempt immediately after a page is marked prunable.


Agreed. I was thinking of the minimum of the following:

1. xmin of an INSERT_IN_PROGRESS tuple
2. xmax of a DELETE_IN_PROGRESS tuple
3. xmax of a RECENTLY_DEAD tuple

When we attempt a prune, we can set the xid to the minimum of the above
three. If none of the above tuples exist on the page, the xid can be set
to FrozenXid. A page need not be pruned if its xid is set to FrozenXid or
is greater than OldestXmin.

In addition, I guess we can even set the xid when we insert, delete or
update tuples on the page, unless it's already set to a lower value.
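
As a sketch, that bookkeeping could be folded into the per-tuple loop of a prune attempt roughly as follows. The helper name is made up, and InvalidTransactionId stands in here for the "no candidate" state rather than FrozenXid:

    /* Track the earliest xid that could make this page prunable. */
    static void
    note_prune_candidate(TransactionId *prune_xid, TransactionId xid)
    {
        if (!TransactionIdIsValid(xid))
            return;
        if (!TransactionIdIsValid(*prune_xid) ||
            TransactionIdPrecedes(xid, *prune_xid))
            *prune_xid = xid;
    }

    /* Inside the per-tuple loop of a prune attempt: */
    switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buffer))
    {
        case HEAPTUPLE_INSERT_IN_PROGRESS:
            note_prune_candidate(&prune_xid, HeapTupleHeaderGetXmin(htup));
            break;
        case HEAPTUPLE_DELETE_IN_PROGRESS:
        case HEAPTUPLE_RECENTLY_DEAD:
            note_prune_candidate(&prune_xid, HeapTupleHeaderGetXmax(htup));
            break;
        default:
            break;
    }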

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> We've already checked that the xmin is our own transaction id, so we
> check if the xmax is an aborted subtransaction of our own transaction. A
> TransactionIdDidAbort call seems like an awfully expensive way to check
> that. We could call TransactionIdIsCurrentTransactionId instead, which
> doesn't need to access any shared memory structures (but might be
> expensive if you have a lot of active subxacts, as you pointed out).

Applied --- seems to buy another 50% savings in the example with all the
pruning activity.  I'll get after the pruning activity itself in a bit,
but it seems worth squeezing the tqual functions first while we have
this example that stresses them so much.
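
For clarity, the change being discussed amounts to something like this in the xmin-is-our-own branch of HeapTupleSatisfiesMVCC (a sketch of the idea, not the committed diff):

    /*
     * xmin is already known to be our own in-progress transaction, so xmax
     * here can only be a (sub)transaction of ours; if it is not recognized
     * as current, it must be an aborted subtransaction.  Checking that
     * avoids the shared-memory lookups done by TransactionIdDidAbort.
     */
    if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
    {
        /* Deleting subtransaction must have aborted. */
        SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        return true;
    }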
        regards, tom lane


Re: HOT is applied

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> I noted that most callers of TransactionIdIsInProgress in tqual.c
> already call TransactionIdIsCurrentTransactionId before
> TransactionIdIsInProgress. In those cases we could just skip the test
> for our own xids altogether, if it's worth code mangling to tell
> TransactionIdIsInProgress whether it's safe to skip it or not.

At least in this example, it seems it wouldn't buy anything.
HeapTupleSatisfiesVacuum doesn't call
TransactionIdIsCurrentTransactionId, and while HeapTupleSatisfiesMVCC
does, it falls out at that point and doesn't get to
TransactionIdIsInProgress.  So I'm not inclined to contort the API
unless we can come up with an example where it's important.

I'm off to work on the page-header-xid idea ...
        regards, tom lane


Re: HOT is applied

From: Robert Treat
Date:
On Friday 21 September 2007 13:02, Tom Lane wrote:
> I wrote:
> > Dunno about "more general", but your idea reduces the runtime of this
> > example by about 50% (22.2s to 10.5s) for me.  I'm worried though that
> > it would be a net negative in more typical situations, especially if
> > you've got a lot of open subtransactions.
>
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into the
> "slow answer" path.  But if we use TransactionIdIsCurrentTransactionId
> to handle our own xids then we can just ignore MyProc altogether inside
> the loop, thus very possibly (if we are the only overflowed-subxids proc)
> saving us from going through the slow answer path.  So on reflection
> I can't see how this isn't a win.  I'll clean it up and apply it.
>

Just curious, but does this apply to subtransactions that are the result of 
plpgsql try...exception blocks? And how many is lots? Presumably looping 50k 
times in a single procedure would be enough? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: HOT is applied

From: Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Just curious, but does this apply to subtransactions that are the result of 
> plpgsql try...exception blocks?

Only if they changed the database; else they won't have XIDs.
        regards, tom lane


Re: HOT is applied

From: "Merlin Moncure"
Date:
On 9/21/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> Well, my first round of results are so far not showing the big gains I
> saw with hot in some of the earlier patches...so far, it looks
> approximately to be a wash although with the reduced need to vacuum.

Let me correct myself here. I did some longer runs and came up with
the following results (these runs are still not really long enough; I
need to run some more tests). The smaller, CPU-bound cases were
blowout victories for HOT. The largest run (I/O-bound) was very close,
but there were not enough transactions to force a vacuum, after which
HOT would probably pull away by some degree.

One small aside: I suspect that the 8.3 improvements to the stats
collector overhead are going to reap big benefits. This merits some
extra investigation, I think.

hardware:
2xintel 5160@3ghz, 4 cores total
8gb ram
5x15krpm sas, raid 0 data xfs
5x15krpm sas, raid 0, wal xfs

fsync on, async commit, partial page writes, autovac on

non-HOT build: CVS dated 9/14
HOT build: CVS dated 9/21

merlin (results follow)

********* without hot: *********
scaling factor: 1
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 2345.225742 (including connections establishing)
tps = 2345.264846 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3038.119776 (including connections establishing)
tps = 3038.185492 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3671.987348 (including connections establishing)
tps = 3672.083077 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4240.424756 (including connections establishing)
tps = 4240.542851 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 100000
number of transactions actually processed: 800000/800000
tps = 889.890173 (including connections establishing)
tps = 889.905128 (excluding connections establishing)

********* with hot: *********
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3186.553423 (including connections establishing)
tps = 3186.622178 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 5123.153371 (including connections establishing)
tps = 5123.331343 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4994.897937 (including connections establishing)
tps = 4995.075480 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4675.778153 (including connections establishing)
tps = 4675.936096 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 100000
number of transactions actually processed: 800000/800000
tps = 893.904762 (including connections establishing)
tps = 893.919032 (excluding connections establishing)