Discussion: _bt_check_unique checks every row in table when doing update??
Hi,

I'm running 7.2.1 on a sparc/solaris box. I get performance problems
when doing updates in one of my tables (inserts are faster), so I
tried to run it with profiling enabled. With reservations for my
non-existent knowledge of the code, the results do look like the bad
performance is because of some bug.

When doing ~1000 inserts into the testdata table, gprof says this
about _bt_check_unique, which looks reasonable:

-----------------------------------------------
                0.00    0.01    1002/1002        _bt_doinsert [86]
[264]    0.1    0.00    0.01    1002             _bt_check_unique [264]
                0.00    0.01    1006/1006        _bt_isequal [271]
                0.00    0.00    1002/67941       _bt_binsrch <cycle 1> [4417]
                0.00    0.00       6/429         heap_fetch [331]
                0.00    0.00       2/857310      ReleaseBuffer [50]
-----------------------------------------------

But when doing ~1000 updates (i.e. setting val0 and val1 with a where
on an existing key0/key1/key2 triplet), I get this which seems very
strange to me:

-----------------------------------------------
                0.10    6.43    1002/1002        _bt_doinsert [21]
[22]    17.3    0.10    6.43    1002             _bt_check_unique [22]
                0.40    3.21  505436/505436      _bt_isequal [26]
                0.57    2.22  500509/1000450     heap_fetch [23]
                0.00    0.01    4926/27777       _bt_getbuf [157]
                0.00    0.00    4926/25330       _bt_relbuf [262]
                0.00    0.00    1002/2213707     _bt_binsrch <cycle 1> [4425]
                0.00    0.00    1001/1878622     ReleaseBuffer [41]
-----------------------------------------------

The schema is as follows:

mats=# \d testdata
             Table "testdata"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 key0   | character varying(32) | not null
 key1   | character varying(32) | not null
 key2   | character varying(32) | not null
 val0   | character varying(64) | not null
 val1   | text                  | not null
Indexes: testdataval0index
Unique keys: testdataindex

mats=# \d testdataindex
    Index "testdataindex"
 Column |         Type
--------+-----------------------
 key0   | character varying(32)
 key1   | character varying(32)
 key2   | character varying(32)
unique btree

mats=# \d testdataval0index
 Index "testdataval0index"
 Column |         Type
--------+-----------------------
 val0   | character varying(64)
btree

mats=# select count(*) from testdata;
 count
--------
 435614
(1 row)

Note that the number of calls to _bt_isequal is very close to the
number of rows in the table plus the number of dead rows. Vacuum
analyze and/or recreating the unique index makes no difference as far
as I can tell.

(The fact that _bt_check_unique is called at all when doing an update
seems strange by itself, but shouldn't be a problem if it was just
done as efficiently as when doing an insert.)

_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> But when doing ~1000 updates (i.e. setting val0 and val1 with
> a where on an existing key0/key1/key2 triplet), I get this which
> seems very strange to me:

I suppose you repeatedly updated the same row 1000 times?  That creates
an O(N^2) behavior because the dead tuples have to be rechecked again
and again.

7.3 will be smarter about this.

			regards, tom lane
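[The O(N^2) behavior Tom describes can be illustrated with a toy model
(a sketch only, not the actual PostgreSQL code): every UPDATE of the
same row leaves behind a dead index entry with the same key, and the
uniqueness check has to walk all equal-key entries before inserting the
new one.]

```python
# Toy model: each update of the same row adds one dead version with the
# same key; the uniqueness check compares against every existing
# equal-key entry (one live version plus all the dead ones).

def repeated_updates(n_updates):
    dead_entries = 0          # dead versions sharing the key so far
    total_comparisons = 0
    for _ in range(n_updates):
        total_comparisons += dead_entries + 1   # scan all equal-key entries
        dead_entries += 1     # the old version becomes a dead tuple
    return total_comparisons

# Total work is sum(1..N) = N*(N+1)/2, i.e. quadratic in the number of
# updates: doubling the update count roughly quadruples the comparisons.
print(repeated_updates(1000))   # 500500
print(repeated_updates(2000))   # 2001000
```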
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Mats Lofkvist <mal@algonet.se> writes:
> > But when doing ~1000 updates (i.e. setting val0 and val1 with
> > a where on an existing key0/key1/key2 triplet), I get this which
> > seems very strange to me:
>
> I suppose you repeatedly updated the same row 1000 times?  That creates
> an O(N^2) behavior because the dead tuples have to be rechecked again
> and again.
>
> 7.3 will be smarter about this.
>
> 			regards, tom lane

Yes, I did update the same row, so your explanation is correct.
(Note that I also read the data wrong; for some reason I read it as
500k checks per 1 update, not per 1000, which made it look a lot
worse than it is.)

Thanks for the explanation, I'll try 7.3 beta to see how it works there.

_
Mats Lofkvist
mal@algonet.se
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Mats Lofkvist <mal@algonet.se> writes:
> > But when doing ~1000 updates (i.e. setting val0 and val1 with
> > a where on an existing key0/key1/key2 triplet), I get this which
> > seems very strange to me:
>
> I suppose you repeatedly updated the same row 1000 times?  That creates
> an O(N^2) behavior because the dead tuples have to be rechecked again
> and again.
>
> 7.3 will be smarter about this.

Seems like I get the same behaviour with 7.3 beta1; updating the same
row ~20k times and then 1k times more with profiling enabled (and with
no vacuum in between) gives:

-----------------------------------------------
                2.72  166.12    1002/1002        _bt_doinsert [17]
[18]    53.7    2.72  166.12    1002             _bt_check_unique [18]
               15.81  149.01 21721926/21721926   _bt_isequal [19]
                0.05    1.00  221414/412979      _bt_getbuf [40]
                0.01    0.21  221414/409772      _bt_relbuf [91]
                0.01    0.02    2709/6241        heap_fetch [187]
                0.00    0.00    5418/2726620     LockBuffer [50]
                0.00    0.00    1002/65406369    _bt_binsrch <cycle 1> [270]
                0.00    0.00    2709/1333460     ReleaseBuffer [76]
                0.00    0.00    2709/4901        HeapTupleSatisfiesVacuum [519]
                0.00    0.00    1707/4910        SetBufferCommitInfoNeedsSave [652]
-----------------------------------------------

(In my case, I think the call to _bt_check_unique could be avoided
altogether since the update isn't changing any of the columns present
in the unique key.  But maybe doing this optimization is much harder
than just trying to avoid checking the dead tuples over and over
again?)

_
Mats Lofkvist
mal@algonet.se
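[A quick back-of-the-envelope check (not from the original thread)
shows that these gprof counts are consistent with the test setup: the
average number of _bt_isequal calls per insert matches the roughly 21k
prior versions of the row.]

```python
# Sanity-check the 7.3beta1 gprof counts: 21,721,926 _bt_isequal calls
# across 1002 _bt_check_unique calls means each uniqueness check scanned
# on the order of one index entry per prior version of the row.

isequal_calls = 21_721_926
check_unique_calls = 1_002

per_insert = isequal_calls // check_unique_calls
print(per_insert)   # 21678

# That lines up with the setup described above: ~20k updates of the
# same row before profiling plus ~1k more during it, i.e. ~21k dead
# versions all sharing the same key0/key1/key2 value.
```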
Mats Lofkvist <mal@algonet.se> writes:
> tgl@sss.pgh.pa.us (Tom Lane) writes:
>> 7.3 will be smarter about this.

> Seems like I get the same behaviour with 7.3 beta1, updating
> the same row ~20k times and then 1k times more with profiling
> enabled (and with no vacuum in between) gives:

>                 2.72  166.12    1002/1002        _bt_doinsert [17]
> [18]    53.7    2.72  166.12    1002             _bt_check_unique [18]
>                15.81  149.01 21721926/21721926   _bt_isequal [19]
>                 0.05    1.00  221414/412979      _bt_getbuf [40]
>                 0.01    0.21  221414/409772      _bt_relbuf [91]
>                 0.01    0.02    2709/6241        heap_fetch [187]

Yeah, but you'll notice there is no heap_fetch for most of 'em, unlike
before...

The loop in _bt_check_unique tests _bt_isequal before checking
ItemIdDeleted, so the short-circuit for deleted items doesn't serve to
reduce the number of key comparisons, only the number of heap tuple
fetches.  I do not think that reversing this logic would be a net
improvement in typical cases: we want to fall out of the loop as soon as
we've run off the set of equal keys, whether the current index entry is
dead or alive.  If we switched the test order then we'd not get out of
the loop until we found a live entry that's bigger than the insertion
key.

> (In my case, I think the call to _bt_check_unique could be
> avoided altogether since the update isn't changing any of
> the columns present in the unique key.

It's fairly difficult for the index AM to know that; in general we don't
have access to the old tuple to check, at the time the index update is
running.

			regards, tom lane
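[The loop-order tradeoff Tom describes can be sketched with a toy model
over a sorted list of (key, is_dead) index entries. This is hypothetical
illustration code, not the real _bt_check_unique: comparing keys first
lets the scan stop at the first non-equal entry, dead or alive, while
skipping dead entries first saves comparisons but cannot terminate the
scan until a live entry with a bigger key turns up.]

```python
# Two orderings of the per-entry checks, counting key comparisons and
# total entries visited.  Entries are (key, is_dead), sorted by key.

def scan_isequal_first(entries, new_key):
    """Current order: compare keys first, so the scan ends at the
    first entry whose key differs, whether it is dead or alive."""
    comparisons = visited = 0
    for key, is_dead in entries:
        visited += 1
        comparisons += 1
        if key != new_key:
            break             # ran off the set of equal keys: done
    return comparisons, visited

def scan_dead_first(entries, new_key):
    """Reversed order: skip dead entries without comparing.  Fewer
    comparisons, but only a live entry can end the scan."""
    comparisons = visited = 0
    for key, is_dead in entries:
        visited += 1
        if is_dead:
            continue          # skipped with no key comparison...
        comparisons += 1
        if key != new_key:
            break             # ...so only a live entry stops the loop
    return comparisons, visited

# 3 equal-key entries (2 dead), then a long run of dead entries with
# larger keys before the next live one.
entries = [(5, True), (5, True), (5, False)] + [(6, True)] * 100 + [(7, False)]
print(scan_isequal_first(entries, 5))   # (4, 4): stops at first key > 5
print(scan_dead_first(entries, 5))      # (2, 104): fewer comparisons, but
                                        # it walked all 100 dead entries
```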
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Mats Lofkvist <mal@algonet.se> writes:
> > tgl@sss.pgh.pa.us (Tom Lane) writes:
> >> 7.3 will be smarter about this.
>
> > Seems like I get the same behaviour with 7.3 beta1, updating
> > the same row ~20k times and then 1k times more with profiling
> > enabled (and with no vacuum in between) gives:
>
> Yeah, but you'll notice there is no heap_fetch for most of 'em, unlike
> before...

Ah, missed that one.

> The loop in _bt_check_unique tests _bt_isequal before checking
> ItemIdDeleted, so the short-circuit for deleted items doesn't serve to
> reduce the number of key comparisons, only the number of heap tuple
> fetches.  I do not think that reversing this logic would be a net
> improvement in typical cases: we want to fall out of the loop as soon as
> we've run off the set of equal keys, whether the current index entry is
> dead or alive.  If we switched the test order then we'd not get out of
> the loop until we found a live entry that's bigger than the insertion
> key.

In my case the problem is that my unique index consists of three
varchar(32) columns, which I suppose makes the comparisons rather
expensive.  But I agree that it is not obvious that changing the
logic would be an improvement in general.

> > (In my case, I think the call to _bt_check_unique could be
> > avoided altogether since the update isn't changing any of
> > the columns present in the unique key.
>
> It's fairly difficult for the index AM to know that; in general we don't
> have access to the old tuple to check, at the time the index update is
> running.

But you really shouldn't need the old tuple to know this, since none
of the columns present in the unique index are 'set' by the update?
I.e. the 'not changing the unique key part' is not data dependent; it
is guaranteed by the form of the update statement.

Isn't it a very common case to do updates with sets only on non-key
columns (only selecting on the key columns)?

_
Mats Lofkvist
mal@algonet.se
Mats Lofkvist <mal@algonet.se> writes:
> tgl@sss.pgh.pa.us (Tom Lane) writes:
>>> (In my case, I think the call to _bt_check_unique could be
>>> avoided altogether since the update isn't changing any of
>>> the columns present in the unique key.
>>
>> It's fairly difficult for the index AM to know that; in general we don't
>> have access to the old tuple to check, at the time the index update is
>> running.

> But you really shouldn't need the old tuple to know this since none
> of the columns present in the unique index are 'set' by the update?
> I.e. the 'not changing the unique key part' is not data dependent,
> it is guaranteed by the form of the update statement.

(a) that's even further upstream from the index AM, and (b) what about
BEFORE triggers that change the tuple contents?

			regards, tom lane
tgl@sss.pgh.pa.us (Tom Lane) writes:
> > But you really shouldn't need the old tuple to know this since none
> > of the columns present in the unique index are 'set' by the update?
> > I.e. the 'not changing the unique key part' is not data dependent,
> > it is guaranteed by the form of the update statement.
>
> (a) that's even further upstream from the index AM, and (b) what about
> BEFORE triggers that change the tuple contents?
>
> 			regards, tom lane

Ok, I rest my case. I obviously don't know enough of the postgres
internals to suggest a usable solution for this problem.

Thanks for your quick answers.

_
Mats Lofkvist
mal@algonet.se
Tom Lane wrote:
> > (In my case, I think the call to _bt_check_unique could be
> > avoided altogether since the update isn't changing any of
> > the columns present in the unique key.
>
> It's fairly difficult for the index AM to know that; in general we don't
> have access to the old tuple to check, at the time the index update is
> running.

We do have a TODO item:

	* Prevent index uniqueness checks when UPDATE does not modify the column

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073