HOT is applied
I've committed the HOT patch. I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

			regards, tom lane
Tom Lane wrote:
> I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

> I'd still like to think about whether we
> can be smarter about when to invoke pruning, but that's a small enough
> issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of, or in addition to, avoiding pruning when it doesn't help,
maybe we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing; this is just a heads-up that HOT as
committed seriously hurts performance in some cases (though one can
argue that this test case isn't a very realistic one).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


DROP TABLE IF EXISTS testtable;
CREATE TABLE testtable (key integer);
-- Note that there are no indexes, so updates have to do a seq scan.

CREATE OR REPLACE FUNCTION testfunc(data int, key1 int) RETURNS int AS $$
DECLARE
BEGIN
  FOR cnt IN 1..10000 LOOP
    UPDATE testtable SET key = data WHERE key = key1;
  END LOOP;
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

INSERT INTO testtable VALUES (1);

BEGIN;
SELECT testfunc(1,1);
COMMIT;
On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Tom Lane wrote:
>> I've committed the HOT patch.
>
> Thanks, much easier to work with it now that it's in.
>
>> I'd still like to think about whether we
>> can be smarter about when to invoke pruning, but that's a small enough
>> issue that the patch can go in without it.
>
> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.
>
> Instead of/in addition to avoiding pruning when it doesn't help, maybe
> we could make HeapTupleSatisfiesVacuum cheaper.
>
> I'm going to continue testing, this is just a heads-up that HOT as
> committed seriously hurts performance in some cases. (though one can
> argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:

pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms

after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms

post hot:
run 1: 7286.292 ms
run 2: 7477.089 ms
run 3: 7701.229 ms

those results aren't exactly terrible, and this case is highly artificial.

merlin
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> I'd still like to think about whether we
>> can be smarter about when to invoke pruning, but that's a small enough
>> issue that the patch can go in without it.

> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.

Not sure if that's an appropriate description or not. oprofile
(on a dual Xeon running Fedora 6) shows me this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000

samples  %        symbol name
1070003  29.8708  LWLockAcquire
1015097  28.3380  LWLockRelease
 283514   7.9147  heap_page_prune
 252270   7.0425  AllocSetAlloc
 148981   4.1590  HeapTupleSatisfiesMVCC
 146442   4.0882  TransactionIdIsInProgress
  92673   2.5871  AllocSetFree
  84966   2.3720  HeapTupleSatisfiesVacuum
  83097   2.3198  TransactionIdGetStatus
  80737   2.2539  SimpleLruReadPage_ReadOnly
  52803   1.4741  TransactionLogFetch
  43406   1.2117  heapgetpage
  42536   1.1875  HeapTupleHeaderGetCmin
  34842   0.9727  TransactionIdIsCurrentTransactionId
  27761   0.7750  TransactionIdDidAbort
  24833   0.6933  MemoryContextAlloc
  16446   0.4591  TransactionIdPrecedes
  16089   0.4491  HeapTupleHeaderGetCmax
  12919   0.3607  hash_search_with_hash_value
  11857   0.3310  pfree
   4964   0.1386  heap_page_prune_opt
   3683   0.1028  hash_any
   3215   0.0898  LWLockConditionalAcquire
   3086   0.0862  PinBuffer
   2573   0.0718  UnpinBuffer
   2009   0.0561  ConditionalLockBufferForCleanup
   1854   0.0518  ReadBuffer_common
    934   0.0261  XLogInsert
    784   0.0219  heapgettup_pagemode

so basically it's all about the locking. Maybe the problem is that with
HOT we lock the buffer too often? heap_page_prune_opt is designed to
not take the buffer lock unless there's a good probability of needing
to prune, but maybe that's not working as intended.

[ pokes at it a bit more... ] Actually the disturbing part is that
pruning doesn't seem to be working at all: after the test finishes,
I see

regression=# vacuum verbose testtable;
INFO:  vacuuming "public.testtable"
INFO:  "testtable": removed 10000 row versions in 44 pages
INFO:  "testtable": found 10000 removable, 1 nonremovable row versions in 45 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 9955 unused item pointers.
45 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction? I'd have hoped to see this example use only
one heap page ...

			regards, tom lane
I wrote:
> ... so basically it's all about the locking. Maybe the problem is that
> with HOT we lock the buffer too often? heap_page_prune_opt is designed
> to not take the buffer lock unless there's a good probability of needing
> to prune, but maybe that's not working as intended.

Indeed it seems it's not; gprof shows

                0.00    0.00     111/1389276     index_getnext <cycle 7> [89]
                0.05   49.52 1389165/1389276     heapgetpage [6]
[8]     50.9    0.05   49.52 1389276         heap_page_prune_opt [8]
                7.17   42.31 1366696/1366696     heap_page_prune [9]
                0.01    0.03 1366696/1366696     ConditionalLockBufferForCleanup [106]
                0.01    0.00 2755558/2780795     PageGetHeapFreeSpace [177]

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time. Why is that? Too tired to poke
at it more tonight.

			regards, tom lane
On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shouldn't we be able to prune rows that have been inserted and deleted
> by the same transaction? I'd have hoped to see this example use only
> one heap page ...
Not before the transaction commits? In the test, we update a single tuple
10000 times in the same transaction, so there is no opportunity to prune.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Shouldn't we be able to prune rows that have been inserted and deleted
>> by the same transaction? I'd have hoped to see this example use only
>> one heap page ...

> Not before the transaction commits? In the test, we update a single tuple
> 10000 times in the same transaction. So there is no opportunity to prune.

[ looks a bit more ... ] Hm, the test I was thinking of was this one
at the end of HeapTupleSatisfiesVacuum:

	if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
							HeapTupleHeaderGetXmax(tuple)))
	{
		/*
		 * Inserter also deleted it, so it was never visible to anyone
		 * else.  However, we can only remove it early if it's not an
		 * updated tuple; else its parent tuple is linking to it via
		 * t_ctid, and this tuple mustn't go away before the parent does.
		 */
		if (!(tuple->t_infomask & HEAP_UPDATED))
			return HEAPTUPLE_DEAD;
	}

but control never gets that far because neither xmin nor xmax is
committed yet. We could fix that, probably, by considering the
xmin=xmax case in the xmin-in-progress case further up; but the
HEAP_UPDATED exclusion is still a problem. Still, it seems like this
is leaving some money on the table when you think about pruning a HOT
chain. Can we improve on it easily?

			regards, tom lane
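[ For readers following along: the rule being discussed can be modeled
outside the server. The sketch below uses made-up stub names (StubTuple,
STUB_HEAP_UPDATED, early_prune_verdict) rather than the real htup.h and
tqual.c definitions; it only illustrates the decision table quoted above:
a tuple inserted and deleted by the same transaction is dead as soon as
that transaction is over, unless it is an update-chain member, since the
parent tuple's t_ctid may still point at it. ]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stubbed infomask bit, same spirit as HEAP_UPDATED in the real headers. */
#define STUB_HEAP_UPDATED 0x2000

typedef struct StubTuple
{
    TransactionId xmin;     /* inserting transaction */
    TransactionId xmax;     /* deleting transaction, 0 if none */
    uint16_t      infomask;
} StubTuple;

typedef enum { STUB_DEAD, STUB_KEEP } StubVerdict;

/*
 * Can this tuple be pruned on the grounds that inserter == deleter?
 * Mirrors the quoted test: same xid on both ends, transaction no longer
 * in progress, and the tuple was not itself produced by an UPDATE
 * (otherwise its parent still links to it via t_ctid).
 */
static StubVerdict
early_prune_verdict(const StubTuple *tup, bool xact_in_progress)
{
    if (tup->xmax != 0 && tup->xmin == tup->xmax)
    {
        if (!(tup->infomask & STUB_HEAP_UPDATED) && !xact_in_progress)
            return STUB_DEAD;   /* never visible to anyone else */
    }
    return STUB_KEEP;
}
```

Tom's complaint is that the real code never reaches this test while the
transaction is still in progress; Pavan's follow-up explains why simply
relaxing the in-progress condition is tricky for chain members.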
On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> but control never gets that far because neither xmin nor xmax is
> committed yet. We could fix that, probably, by considering the
> xmin=xmax case in the xmin-in-progress case further up; but the
> HEAP_UPDATED exclusion is still a problem. Still, it seems like this
> is leaving some money on the table when you think about pruning a HOT
> chain. Can we improve on it easily?
Maybe we can, but it would get a bit tricky. There might be a transaction
looking at the first tuple in the chain, waiting for this
(inserting-deleting) transaction to finish. If the waiting transaction is
running in READ COMMITTED mode, it would then follow the update chain.
Removing any intermediate tuples without fixing the previous tuple's
xmax/ctid (or redirected line pointer) would be tricky.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I've committed the HOT patch.
>
> Thanks, much easier to work with it now that it's in.
>
>> I'd still like to think about whether we
>> can be smarter about when to invoke pruning, but that's a small enough
>> issue that the patch can go in without it.
>
> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune, only
> to find out that it can't.
>
> Instead of/in addition to avoiding pruning when it doesn't help, maybe
> we could make HeapTupleSatisfiesVacuum cheaper.
>
> I'm going to continue testing, this is just a heads-up that HOT as
> committed seriously hurts performance in some cases. (though one can
> argue that this test case isn't a very realistic one.)

This might be a simplistic question, but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On 9/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> This might be a simplistic question but if the page is +90% full and
> there is a long-lived transaction, isn't Postgres going to try pruning
> on each page read access?
The way it stands today, yes. That's one reason why we are seeing
the performance drop in this particular case.

I liked the earlier proposed idea of storing the xid in the page header
to quickly tell us whether it's worth pruning the page.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Bruce Momjian wrote:
> This might be a simplistic question but if the page is +90% full and
> there is a long-lived transaction, isn't Postgres going to try pruning
> on each page read access?

Yes :(. That's why we earlier talked about storing the xid of the oldest
deleted tuple on the page in the page header. That way we could skip the
fruitless pruning attempts until that xid < OldestXmin.

Another approach is to try to make HeapTupleSatisfiesVacuum cheaper, so
that the fruitless pruning attempts wouldn't hurt that much.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case is
>> much slower with HOT. It's spending a lot of time trying to prune, only
>> to find out that it can't.
>
> Not sure if that's an appropriate description or not. oprofile
> (on a dual Xeon running Fedora 6) shows me this:
>
> ...
> samples  %        symbol name
> 1070003  29.8708  LWLockAcquire
> 1015097  28.3380  LWLockRelease
>  283514   7.9147  heap_page_prune
> ...
> so basically it's all about the locking. Maybe the problem is that with
> HOT we lock the buffer too often? heap_page_prune_opt is designed to
> not take the buffer lock unless there's a good probability of needing
> to prune, but maybe that's not working as intended.

If you look at the call graph, you'll see that those
LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
TransactionIdIsInProgress, which keeps thrashing the ProcArrayLock. An
"if (TransactionIdIsCurrentTransactionId(xid)) return true;" check in
TransactionIdIsInProgress would speed that up, but I wonder if there's a
more general solution to make HeapTupleSatisfiesVacuum cheaper. For
example, we could cache the in-progress status of tuples.

> Shouldn't we be able to prune rows that have been inserted and deleted
> by the same transaction? I'd have hoped to see this example use only
> one heap page ...

Well, maybe, but that's a separate issue. Wouldn't we need the "snapshot
bookkeeping" we've discussed in the past, to notice that there's no
snapshot in our own backend that needs to see the tuples?

Nevertheless, the fruitless pruning attempts would still hurt us in
slightly more complex scenarios, even if we fixed the above.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Merlin Moncure wrote:
> On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case is
>> much slower with HOT. It's spending a lot of time trying to prune, only
>> to find out that it can't.
>>
>> Instead of/in addition to avoiding pruning when it doesn't help, maybe
>> we could make HeapTupleSatisfiesVacuum cheaper.
>>
>> I'm going to continue testing, this is just a heads-up that HOT as
>> committed seriously hurts performance in some cases. (though one can
>> argue that this test case isn't a very realistic one.)
>
> well, I ran your test on my box and here are the results:
>
> pre hot:
> run 1: 3617.641 ms
> run 2: 5195.215 ms
> run 3: 6760.449 ms
>
> after vacuum:
> run 1: 4171.362 ms
> run 2: 5513.317 ms
> run 3: 6884.125 ms
>
> post hot:
> run 1: 7286.292 ms
> run 2: 7477.089 ms
> run 3: 7701.229 ms
>
> those results aren't exactly terrible,

Your runtimes seem to be increasing as you repeat the test. Did you
remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
~2x as long as pre hot, even when repeated, which matches the results of
your first runs.

> and this case is highly artificial.

Yeah. I'm going through a bunch of CPU test cases a colleague of mine
wrote. Some of the test cases show much worse performance with HOT.
Unfortunately they're quite complex and overlapping, so instead of
posting them as is, I'm going through the ones that HOT performs badly
at, and trying to reduce them to simpler test cases like the one above.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> so this example is getting past the heuristic tests in
> heap_page_prune_opt almost every time. Why is that? Too tired to poke
> at it more tonight.
I guess you already know the answer now, but anyway: since we are
updating a single tuple in a single transaction, each update is preceded
by a sequential scan. All pages but the last are completely full and
marked prunable, so HOT would try to (unsuccessfully) prune every page
(except maybe the last).
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Bruce Momjian wrote:
>> This might be a simplistic question but if the page is +90% full and
>> there is a long-lived transaction, isn't Postgres going to try pruning
>> on each page read access?

> Yes :(

It shouldn't, though --- the hint bit should get cleared on the first
try. I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...

			regards, tom lane
On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It shouldn't, though --- the hint bit should get cleared on the first
> try. I think I probably broke something in the last round of revisions
> to heap_page_prune_opt, but haven't looked yet ...

We set the hint bit (prunable) again when we see a RECENTLY_DEAD
or DELETE_IN_PROGRESS tuple. This is correct in normal circumstances
because otherwise we would never be able to prune the page.

On second thought, maybe we can choose not to set the bit again. In the
worst case, the next update on the page would be a COLD update. That
would set the bit again, and we would retry the prune on the next visit
to the page.

This scheme would work as long as we don't put in a mechanism to update
the FSM after pruning. But if we choose to update the FSM (I am inclined
to do this, to avoid repeated relation extension because of COLD
updates), we would need to improve on this. Otherwise a page full of
only DEAD tuples might never be pruned, and vacuum would be required to
reclaim that space.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Bruce Momjian wrote:
>>> This might be a simplistic question but if the page is +90% full and
>>> there is a long-lived transaction, isn't Postgres going to try pruning
>>> on each page read access?
>
>> Yes :(
>
> It shouldn't, though --- the hint bit should get cleared on the first
> try.

Think so? That would mean that you only get a single chance to prune
after an update. Not sure how big a problem that is, but I do feel that
would make HOT a lot less effective in some usage patterns. The idea of
having a "prunable xmin" in the page header sounds more and more
attractive to me...

> I think I probably broke something in the last round of revisions
> to heap_page_prune_opt, but haven't looked yet ...

Pavan's patch was like that as well; you don't clear the flag (or
rather, you set it again while pruning) until there's nothing left to
prune on the page.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> so this example is getting past the heuristic tests in
>> heap_page_prune_opt almost every time. Why is that? Too tired to poke
>> at it more tonight.

> I guess you already know the answer now, but anyway: since we are
> updating a single tuple in a single transaction, each update is preceded
> by a sequential scan. All pages but the last are completely full and
> marked prunable, so HOT would try to (unsuccessfully) prune every page
> (except maybe the last).

Hmm ... the problem really is that heap_page_prune turns the hint back
on when it sees a DELETE_IN_PROGRESS tuple. Maybe that's a bad idea.

I don't much like the idea of adding an xid to the page header --- for
one thing, *which* xid would you put there, and what would you test it
against?

			regards, tom lane
Tom Lane wrote:
> I don't much like the idea of adding an xid to the page header --- for
> one thing, *which* xid would you put there, and what would you test it
> against?

I was thinking that you would put the smallest in-progress xmax on the
page there, and you would test it against OldestXmin. If all
transactions commit, there surely isn't anything to prune until that
xid falls beyond the OldestXmin horizon. If an inserting transaction
aborts, we could prune the aborted tuple earlier, but optimizing for
aborts doesn't seem that important.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
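[ For illustration: the heuristic Heikki describes can be sketched in a
few lines. The names below (StubPageHeader, stub_mark_prunable,
stub_should_prune) are invented for this sketch and are not the actual
page-header code; the comparison is wraparound-naive, whereas real
TransactionId comparisons must use modulo-2^32 logic. ]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId 0

/* Simplified page header carrying only the proposed field. */
typedef struct StubPageHeader
{
    TransactionId prune_xid;    /* smallest in-progress xmax seen on the
                                 * page, or InvalidTransactionId */
} StubPageHeader;

/* Record that a deletion with xmax 'xid' happened on the page. */
static void
stub_mark_prunable(StubPageHeader *ph, TransactionId xid)
{
    if (ph->prune_xid == InvalidTransactionId || xid < ph->prune_xid)
        ph->prune_xid = xid;
}

/*
 * Worth trying to prune?  Only once some recorded deleter has fallen
 * behind the OldestXmin horizon; otherwise the attempt is fruitless.
 */
static bool
stub_should_prune(const StubPageHeader *ph, TransactionId oldest_xmin)
{
    return ph->prune_xid != InvalidTransactionId &&
           ph->prune_xid < oldest_xmin;
}
```

The point of the design is exactly the case profiled earlier in the
thread: a reader visiting a full page whose deleter is still running can
skip the HeapTupleSatisfiesVacuum pass entirely with one comparison.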
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> If you look at the call graph, you'll see that those
> LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
> TransactionIdIsInProgress, which keeps thrashing the ProcArrayLock. An
> "if (TransactionIdIsCurrentTransactionId(xid)) return true;" check in
> TransactionIdIsInProgress would speed that up, but I wonder if there's a
> more general solution to make HeapTupleSatisfiesVacuum cheaper. For
> example, we could cache the in-progress status of tuples.

Dunno about "more general", but your idea reduces the runtime of this
example by about 50% (22.2s to 10.5s) for me. I'm worried though that
it would be a net negative in more typical situations, especially if
you've got a lot of open subtransactions.

			regards, tom lane


*** src/backend/storage/ipc/procarray.c.orig	Sat Sep  8 16:31:15 2007
--- src/backend/storage/ipc/procarray.c	Fri Sep 21 11:08:34 2007
***************
*** 341,346 ****
--- 341,353 ----
  		return false;
  	}

+ 	/*
+ 	 * Also, we can detect our own transaction without any access to shared
+ 	 * memory.
+ 	 */
+ 	if (TransactionIdIsCurrentTransactionId(xid))
+ 		return true;
+
  	/* Get workspace to remember main XIDs in */
  	xids = (TransactionId *) palloc(sizeof(TransactionId) * arrayP->maxProcs);
Tom Lane wrote:
> Dunno about "more general", but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me. I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Yeah. I played with this a bit more, and came up with a couple of other
micro-optimizations:

1. Instead of pallocing and pfreeing a new array in
TransactionIdIsInProgress, we could just malloc the array once and reuse
it. That palloc/pfree alone was consuming around 8% of CPU time in the
test case.

2. About ~30% of CPU time is spent in HeapTupleSatisfiesMVCC ->
TransactionIdDidAbort. This is in fact not related to HOT, but it caught
my eye while looking at the profile. We have this piece of code in
HeapTupleSatisfiesMVCC:

>			/* deleting subtransaction aborted? */
>			/* FIXME -- is this correct w.r.t. the cmax of the tuple? */
>			if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
>			{
>				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
>							InvalidTransactionId);
>				return true;
>			}
>
>			Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));

We've already checked that the xmin is our own transaction id, so we
check if the xmax is an aborted subtransaction of our own transaction. A
TransactionIdDidAbort call seems like an awfully expensive way to check
that. We could call TransactionIdIsCurrentTransactionId instead, which
doesn't need to access any shared memory structures (but might be
expensive if you have a lot of active subxacts, as you pointed out). Or
we could keep a list of aborted subtransactions in backend private
memory, and just check against that. In this particular case, a simple
"if (xmin == xmax)" check would be effective as well, since we know that
the xmin is not aborted at that point.

3. One more general alternative to the little patch I sent earlier would
be to build an array of in-progress xids in TransactionIdIsInProgress,
like we do in GetSnapshotData, and use that for the in-progress checks
in HeapTupleSatisfiesVacuum. That should work, if we build the cached
array when we lock the page for pruning, and use it until we unlock. If
a transaction commits/aborts after we build the cached array, we might
return DELETE/INSERT_IN_PROGRESS for a tuple that is in fact already
DEAD, but that's OK. Since we're holding a lock on the page, it's not
possible for a new transaction to start and subsequently put its xid on
the page where we would bump into it.

I believe this would have roughly the same performance effect in this
test case as calling TransactionIdIsCurrentTransactionId from
TransactionIdIsInProgress.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
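[ For illustration of point 1: allocate-once-and-reuse is a generic
pattern. The sketch below uses invented names (get_xid_workspace etc.)
and plain malloc/realloc rather than the backend's palloc machinery; it
only shows the shape of the optimization, not the actual procarray.c
change. ]

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * One-time-allocated workspace, grown as needed and deliberately never
 * freed, standing in for the per-call palloc/pfree pair inside
 * TransactionIdIsInProgress.
 */
static TransactionId *xid_workspace = NULL;
static size_t xid_workspace_size = 0;

static TransactionId *
get_xid_workspace(size_t nxids)
{
    if (nxids > xid_workspace_size)
    {
        TransactionId *newbuf =
            realloc(xid_workspace, nxids * sizeof(TransactionId));

        if (newbuf == NULL)
            return NULL;        /* caller must cope with OOM */
        xid_workspace = newbuf;
        xid_workspace_size = nxids;
    }
    /* A request no larger than a previous one reuses the same buffer. */
    return xid_workspace;
}
```

On the hot path the common case becomes a single size comparison instead
of an allocator round trip, which is where the reported ~8% went.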
I wrote:
> Dunno about "more general", but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me. I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Actually ... the only way that TransactionIdIsCurrentTransactionId can
take a meaningful amount of time is if you've got lots of
subtransactions, and in that case your own subxids cache has certainly
overflowed, which is likely to force TransactionIdIsInProgress into the
"slow answer" path. But if we use TransactionIdIsCurrentTransactionId
to handle our own xids then we can just ignore MyProc altogether inside
the loop, thus very possibly (if we are the only overflowed-subxids
proc) saving us from going through the slow answer path. So on
reflection I can't see how this isn't a win. I'll clean it up and
apply it.

I'm also starting to come around to liking the page-header-xid field a
bit more. I suggest that it could replace the "page is prunable" flag
bit altogether --- to mark the page prunable, you must store some
appropriate xid into the header field. This would avoid a useless prune
attempt immediately after a page is marked prunable.

A possible problem with it, if we treat it as a non-WAL-logged hint, is
that corruption of the value could lead to a page being marked with an
xid that's way in the future, keeping it from getting pruned. However,
if the page header gets corrupted this is the least of your worries, so
it doesn't seem like an enormous objection.

			regards, tom lane
On 9/21/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Merlin Moncure wrote:
>> pre hot:
>> run 1: 3617.641 ms
>> run 2: 5195.215 ms
>> run 3: 6760.449 ms
>> after vacuum:
>> run 1: 4171.362 ms
>> run 2: 5513.317 ms
>> run 3: 6884.125 ms
>> post hot:
>> run 1: 7286.292 ms
>> run 2: 7477.089 ms
>> run 3: 7701.229 ms
>>
>> those results aren't exactly terrible, and this case is highly artificial.
>
> Your runtimes seem to be increasing as you repeat the test. Did you
> remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
> ~2x as long as pre hot, even when repeated, which matches the results of
> your first runs.

correct. Well, my first round of results is so far not showing the big
gains I saw with HOT in some of the earlier patches... so far, it looks
approximately to be a wash, although with the reduced need to vacuum.
I'll test some more when things settle down.

merlin
Merlin Moncure wrote:
> Well, my first round of results is so far not showing the big gains I
> saw with HOT in some of the earlier patches... so far, it looks
> approximately to be a wash, although with the reduced need to vacuum.
> I'll test some more when things settle down.

Oh... Which version did you test earlier? What are you testing, pgbench?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Tom Lane wrote:
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into the
> "slow answer" path. But if we use TransactionIdIsCurrentTransactionId
> to handle our own xids then we can just ignore MyProc altogether inside
> the loop, thus very possibly (if we are the only overflowed-subxids
> proc) saving us from going through the slow answer path. So on
> reflection I can't see how this isn't a win. I'll clean it up and
> apply it.

Good point.

I noted that most callers of TransactionIdIsInProgress in tqual.c
already call TransactionIdIsCurrentTransactionId before
TransactionIdIsInProgress. In those cases we could just skip the test
for our own xids altogether, if it's worth the code mangling to tell
TransactionIdIsInProgress whether it's safe to skip it or not.

> I'm also starting to come around to liking the page-header-xid field
> a bit more. I suggest that it could replace the "page is prunable"
> flag bit altogether --- to mark the page prunable, you must store
> some appropriate xid into the header field. This would avoid a useless
> prune attempt immediately after a page is marked prunable.

Yeah, we don't need the flag if we have an xid.

> A possible problem with it, if we treat it as a non-WAL-logged hint,
> is that corruption of the value could lead to a page being marked with
> an xid that's way in the future, keeping it from getting pruned.
> However, if the page header gets corrupted this is the least of your
> worries, so it doesn't seem like an enormous objection.

Agreed. The next attempt to prune or vacuum the page will fix it anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Yeah. I played with this a bit more, and came up with a couple of other
> micro-optimizations:

> 1. Instead of pallocing and pfreeing a new array in
> TransactionIdIsInProgress, we could just malloc the array once and reuse
> it. That palloc/pfree alone was consuming around 8% of CPU time in the
> test case.

Good idea --- not only faster, but we can get rid of all the goto's and
the "locked" flag, if we're willing to have a couple more LWLockRelease
calls here. I'll incorporate this in the patch I'm working up.

> We've already checked that the xmin is our own transaction id, so we
> check if the xmax is an aborted subtransaction of our own transaction. A
> TransactionIdDidAbort call seems like an awfully expensive way to check
> that. We could call TransactionIdIsCurrentTransactionId instead, which
> doesn't need to access any shared memory structures (but might be
> expensive if you have a lot of active subxacts, as you pointed out).

I like that idea too ...

> 3. One more general alternative to the little patch I sent earlier would
> be to build an array of in-progress xids in TransactionIdIsInProgress,
> like we do in GetSnapshotData, and use that for the in-progress checks
> in HeapTupleSatisfiesVacuum. That should work, if we build the cached
> array when we lock the page for pruning, and use it until we unlock.

This seems like it uglifies a whole lot of APIs, though, for what's
probably not going to be a lot of gain.

			regards, tom lane
On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm also starting to come around to liking the page-header-xid field
> a bit more. I suggest that it could replace the "page is prunable"
> flag bit altogether --- to mark the page prunable, you must store
> some appropriate xid into the header field. This would avoid a useless
> prune attempt immediately after a page is marked prunable.
Agreed. I was thinking of taking the minimum of the following:

1. xmin of an INSERT_IN_PROGRESS tuple
2. xmax of a DELETE_IN_PROGRESS tuple
3. xmax of a RECENTLY_DEAD tuple
When we attempt a prune, we can set the xid to the minimum of the above
three. If no such tuples exist on the page, the xid can be set to FrozenXid.
A page need not be pruned if its xid is set to FrozenXid or is greater
than OldestXmin.
In addition, I guess we can even set the xid when we insert, delete or
update tuples on the page, unless it's already set to a lower value.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > We've already checked that the xmin is our own transaction id, so we > check if the xmax is an aborted subtransaction of our own transaction. A > TransactionIdDidAbort call seems like an awfully expensive way to check > that. We could call TransactionIdIsCurrentTransactionId instead, which > doesn't need to access any shared memory structures (but might be > expensive if you have a lot of active subxacts, as you pointed out). Applied --- seems to buy another 50% savings in the example with all the pruning activity. I'll get after the pruning activity itself in a bit, but it seems worth squeezing the tqual functions first while we have this example that stresses them so much. regards, tom lane
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > I noted that most callers of TransactionIdIsInProgress in tqual.c > already call TransactionIdIsCurrentTransactionId before > TransactionIdIsInProgress. In those cases we could just skip the test > for our own xids altogether, if it's worth code mangling to tell > TransactionIdIsInProgress whether it's safe to skip it or not. At least in this example, it seems it wouldn't buy anything. HeapTupleSatisfiesVacuum doesn't call TransactionIdIsCurrentTransactionId, and while HeapTupleSatisfiesMVCC does, it falls out at that point and doesn't get to TransactionIdIsInProgress. So I'm not inclined to contort the API unless we can come up with an example where it's important. I'm off to work on the page-header-xid idea ... regards, tom lane
On Friday 21 September 2007 13:02, Tom Lane wrote:
> I wrote:
> > Dunno about "more general", but your idea reduces the runtime of this
> > example by about 50% (22.2s to 10.5s) for me. I'm worried though that
> > it would be a net negative in more typical situations, especially if
> > you've got a lot of open subtransactions.
>
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into the
> "slow answer" path. But if we use TransactionIdIsCurrentTransactionId
> to handle our own xids then we can just ignore MyProc altogether inside
> the loop, thus very possibly (if we are the only overflowed-subxids proc)
> saving us from going through the slow answer path. So on reflection
> I can't see how this isn't a win. I'll clean it up and apply it.

Just curious, but does this apply to subtransactions that are the result
of plpgsql try...exception blocks? And how many is lots? Presumably
looping 50k times in a single procedure would be enough?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
> Just curious, but does this apply to subtransactions that are the result
> of plpgsql try...exception blocks?

Only if they changed the database; else they won't have XIDs.

			regards, tom lane
On 9/21/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> Well, my first round of results are so far not showing the big gains I
> saw with hot in some of the earlier patches...so far, it looks
> approximately to be a wash although with the reduced need to vacuum.

Let me correct myself here. I did some longer runs and came up with the
following results (these runs are still not really long enough, I need to
run some more tests). The smaller, CPU-bound cases were blowout victories
for HOT. The largest run (I/O-bound) was very close, but there were not
enough transactions to force a vacuum, after which HOT would probably pull
away by some degree.

One small aside: I am suspicious that the 8.3 improvements to stats
collector overhead are going to reap big benefits. This merits some extra
investigation, I think.

hardware:
2x Intel 5160 @ 3GHz, 4 cores total
8GB RAM
5x 15krpm SAS, RAID 0, data, xfs
5x 15krpm SAS, RAID 0, WAL, xfs
fsync on, async commit, partial page writes, autovac on

non-HOT build: CVS dated 9/14
HOT build: CVS dated 9/21

merlin

(results follow)

********* without hot: *********

scaling factor: 1
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 2345.225742 (including connections establishing)
tps = 2345.264846 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3038.119776 (including connections establishing)
tps = 3038.185492 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3671.987348 (including connections establishing)
tps = 3672.083077 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4240.424756 (including connections establishing)
tps = 4240.542851 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 100000
number of transactions actually processed: 800000/800000
tps = 889.890173 (including connections establishing)
tps = 889.905128 (excluding connections establishing)

********* with hot: *********

transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 3186.553423 (including connections establishing)
tps = 3186.622178 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 5123.153371 (including connections establishing)
tps = 5123.331343 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4994.897937 (including connections establishing)
tps = 4995.075480 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 250000
number of transactions actually processed: 2000000/2000000
tps = 4675.778153 (including connections establishing)
tps = 4675.936096 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 100000
number of transactions actually processed: 800000/800000
tps = 893.904762 (including connections establishing)
tps = 893.919032 (excluding connections establishing)