Обсуждение: Bogus "Non-functional update" notices

Поиск
Список
Период
Сортировка

Bogus "Non-functional update" notices

От
Tom Lane
Дата:
With fairly current sources (last cvs update on 7/20), I am seeing
occasional occurrences of
    NOTICE:  Non-functional update, only first update is performed
I think this is a bug.  The update commands that are triggering this
message *are* getting executed.  I looked at the sources and couldn't
even understand what condition was being tested to generate the message.
The source code looks like it's trying to disallow more than one update
to the same tuple within a transaction, which is so silly that I have to
be misreading it...

Here is an example trace of my application's interaction with the
server:

// Tuple 134537 is created here:

QUERY: BEGIN TRANSACTION; LOCK marketorderhistory
RESULT: DELETE 0
// several other tuples inserted or updated in this transaction
QUERY: INSERT INTO marketorderhistory (accountID, instrumentID, orderType, numContracts, orderTime, simStatus,
realStatus,sequenceNo, orderPrice, orderDivisor, ifDonePrice) VALUES(5, 62, 'S', 5, '1998-05-20 15:20:00 GMT', 'P',
'-',nextval('marketorderhistory_Seq'), 11969, 100, 11849) 
RESULT: INSERT 134537 1
QUERY: END TRANSACTION; NOTIFY marketorderhistory
RESULT: NOTIFY

//  many transactions later, the app wants to update this tuple:

QUERY: BEGIN TRANSACTION; LOCK marketorderhistory
RESULT: DELETE 0
QUERY: UPDATE marketorderhistory SET completionTime = '1998-05-21 15:20:00 GMT' WHERE oid = 134537::oid AND
completionTimeIS NULL; UPDATE marketorderhistory SET simStatus = 'X', sequenceNo = nextval('marketorderhistory_Seq')
WHEREoid = 134537::oid 
NOTICE:  Non-functional update, only first update is performed
RESULT: UPDATE 1
// a couple other tuples inserted or updated
QUERY: END TRANSACTION; NOTIFY marketorderhistory
RESULT: NOTIFY

External inspection verifies that both updates did take effect.

The thing that's weird is that this only happens occasionally, say about
twice out of every thousand essentially identical updates.  I don't
know enough about the backend innards to have much chance of figuring
out what's going on.  Any ideas?  Is anyone else even seeing this?

            regards, tom lane

Re: [HACKERS] Bogus "Non-functional update" notices

От
"Thomas G. Lockhart"
Дата:
> With fairly current sources (last cvs update on 7/20), I am seeing
> occasional occurrences of
>         NOTICE:  Non-functional update, only first update is performed
> I think this is a bug.  The update commands that are triggering this
> message *are* getting executed.  I looked at the sources and couldn't
> even understand what condition was being tested to generate the
> message.
> The source code looks like it's trying to disallow more than one
> update to the same tuple within a transaction, which is so silly that
> I have to be misreading it...

I recall seeing this in the past when two conditions in an update are
such that the second condition will never be significant. Can't remember
an example, but the regression test has at least one case which provokes
this.

                            - Tom

Re: [HACKERS] Bogus "Non-functional update" notices

От
Tom Lane
Дата:
I wrote:
>> With fairly current sources (last cvs update on 7/20), I am seeing
>> occasional occurrences of
>>    NOTICE:  Non-functional update, only first update is performed

I have been digging into this some more, and I am getting more and more
convinced that there is a significant underlying bug.

What I've discovered is that in the cases where this message appears
(which, again, is only once every few hundred tries) the update scan
is *finding the same tuple twice*.  The second time through, the tuple
has already been marked as deleted by the current command, and it is
this marking that causes heap_replace to emit the "Non-functional
update" warning and return without processing the tuple.

An example trace is

QUERY: BEGIN TRANSACTION; LOCK marketorderhistory
RESULT: DELETE 0
QUERY: UPDATE marketorderhistory SET completionTime = '1998-05-11 20:00:00 GMT' WHERE oid = 34900::oid AND
completionTimeIS NULL 
NOTICE:  heap_replace OID 34900  t_xmin 20270 t_xmax 0  t_cmin 6 t_cmax 0
NOTICE:  heap_replace OID 34900  t_xmin 20270 t_xmax 20496  t_cmin 6 t_cmax 3
NOTICE:  Non-functional update, only first update is performed
NOTICE:  current trans ID 20496  cmd id 3 scan id 3
RESULT: UPDATE 1

(The "NOTICE:  heap_replace" lines are from debug code I added to print
ID info about the tuple found by heap_replace.  This is printed every
time through the routine, just before the non-functional-update test.
The "NOTICE: current trans" line is printed only if the test triggers.)

In this particular situation, the only bad consequence is the display
of a bogus notice message, but it seems to me that having a scan find
the same tuple multiple times is a Very Bad Thing.  (If the test in
heap_replace really is intended to clean up after this condition,
then it ought not be emitting a message.)

I have only seen this happen when the UPDATE was using an index scan to
find the tuples to update (the table in this example has a btree index
on oid).  So, somehow the index is returning the same tuple more than
once.

I have managed to construct a simple, if not quick, test case that
repeatably causes an instance of the bogus message --- it's attached in
the form of a pgTcl script.  The trace (from my backend with extra
printout) looks like

...
NOTICE:  heap_replace OID 87736  t_xmin 113200 t_xmax 0  t_cmin 0 t_cmax 0
NOTICE:  heap_replace OID 87735  t_xmin 113199 t_xmax 0  t_cmin 0 t_cmax 0
NOTICE:  heap_replace OID 87734  t_xmin 113198 t_xmax 0  t_cmin 0 t_cmax 0
NOTICE:  heap_replace OID 87734  t_xmin 113198 t_xmax 113601  t_cmin 0 t_cmax 0
NOTICE:  Non-functional update, only first update is performed
NOTICE:  current trans ID 113601  cmd id 0 scan id 0
NOTICE:  heap_replace OID 87733  t_xmin 113197 t_xmax 0  t_cmin 0 t_cmax 0
NOTICE:  heap_replace OID 87732  t_xmin 113196 t_xmax 0  t_cmin 0 t_cmax 0
...
where the failure occurs at the 200th UPDATE command.

            regards, tom lane

#!/usr/local/pgsql/bin/pgtclsh

set pgconn [pg_connect play]

set res [pg_exec $pgconn \
    "DROP TABLE updatebug"]
pg_result $res -clear

set res [pg_exec $pgconn \
    "CREATE TABLE updatebug (key int4 not null, val int4)"]
pg_result $res -clear

set res [pg_exec $pgconn \
    "CREATE UNIQUE INDEX updatebug_i ON updatebug USING btree(key)"]
pg_result $res -clear

for {set i 0} {$i <= 10000} {incr i} {
    set res [pg_exec $pgconn "INSERT INTO updatebug VALUES($i, NULL)"]
    pg_result $res -clear
}

# Vacuum to ensure that optimizer will decide to use index for updates...
set res [pg_exec $pgconn \
    "VACUUM VERBOSE ANALYZE updatebug"]
pg_result $res -clear

puts "table built..."

for {set i 10000} {$i >= 0} {incr i -1} {
    set res [pg_exec $pgconn \
        "UPDATE updatebug SET val = 1 WHERE key = $i"]
    pg_result $res -clear
}

Re: [HACKERS] Bogus "Non-functional update" notices

От
Vadim Mikheev
Дата:
Tom Lane wrote:
>
> I have only seen this happen when the UPDATE was using an index scan to
> find the tuples to update (the table in this example has a btree index
> on oid).  So, somehow the index is returning the same tuple more than
> once.

In UPDATE backend inserts index tuple for new version of heap tuple
and adjusts all index scans affected by this insertion.
Something is wrong in nbtscan.c:_bt_adjscans()...

Vadim

Re: [HACKERS] Bogus "Non-functional update" notices

От
Tom Lane
Дата:
Vadim Mikheev <vadim@krs.ru> writes:
> In UPDATE backend inserts index tuple for new version of heap tuple
> and adjusts all index scans affected by this insertion.
> Something is wrong in nbtscan.c:_bt_adjscans()...

Could be; maybe there's one boundary case that fails to advance the
index scan?  I hope there's someone who's looked at nbtree recently
who can take the time to debug this.

Another thing that struck me while looking at the update code is that
an update deletes the old tuple value, then inserts the new value,
but it doesn't bother to delete any old index entries pointing at the
old tuple.  ISTM that after a while, there are going to be a lot of old
index entries pointing at dead tuples ... or, perhaps, at *some other*
live tuple, if the space the dead tuple occupied has been reused for
something else.  This certainly seems to present a risk of returning
the wrong tuple.  I looked through the code to find out how such an
error is prevented, and didn't find anything.  But maybe I just don't
know where to look.

            regards, tom lane

Re: [HACKERS] Bogus "Non-functional update" notices

От
Vadim Mikheev
Дата:
Tom Lane wrote:
>
> Vadim Mikheev <vadim@krs.ru> writes:
> > In UPDATE backend inserts index tuple for new version of heap tuple
> > and adjusts all index scans affected by this insertion.
> > Something is wrong in nbtscan.c:_bt_adjscans()...
>
> Could be; maybe there's one boundary case that fails to advance the
> index scan?  I hope there's someone who's looked at nbtree recently
> who can take the time to debug this.

I'll try to look there...

> Another thing that struck me while looking at the update code is that
> an update deletes the old tuple value, then inserts the new value,
> but it doesn't bother to delete any old index entries pointing at the
> old tuple.  ISTM that after a while, there are going to be a lot of old
> index entries pointing at dead tuples ... or, perhaps, at *some other*
> live tuple, if the space the dead tuple occupied has been reused for
> something else.  This certainly seems to present a risk of returning
> the wrong tuple.  I looked through the code to find out how such an
> error is prevented, and didn't find anything.  But maybe I just don't
> know where to look.

Vacuum deletes index tuples before deleting heap ones...

Vadim

Re: [HACKERS] Bogus "Non-functional update" notices

От
Tom Lane
Дата:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> Another thing that struck me while looking at the update code is that
>> an update deletes the old tuple value, then inserts the new value,
>> but it doesn't bother to delete any old index entries pointing at the
>> old tuple.  ISTM that after a while, there are going to be a lot of old
>> index entries pointing at dead tuples ... or, perhaps, at *some other*
>> live tuple, if the space the dead tuple occupied has been reused for
>> something else.

> Vacuum deletes index tuples before deleting heap ones...

Right, but until you've done a vacuum, what's stopping the code from
returning wrong tuples?  I assume this stuff actually works, I just
couldn't see where the dead index entries get rejected.

            regards, tom lane

Re: [HACKERS] Bogus "Non-functional update" notices

От
dg@illustra.com (David Gould)
Дата:
> Vadim Mikheev <vadim@krs.ru> writes:
> > Tom Lane wrote:
> >> Another thing that struck me while looking at the update code is that
> >> an update deletes the old tuple value, then inserts the new value,
> >> but it doesn't bother to delete any old index entries pointing at the
> >> old tuple.  ISTM that after a while, there are going to be a lot of old
> >> index entries pointing at dead tuples ... or, perhaps, at *some other*
> >> live tuple, if the space the dead tuple occupied has been reused for
> >> something else.
>
> > Vacuum deletes index tuples before deleting heap ones...
>
> Right, but until you've done a vacuum, what's stopping the code from
> returning wrong tuples?  I assume this stuff actually works, I just
> couldn't see where the dead index entries get rejected.
>
>             regards, tom lane
>

Without checking the code, I suspect that dead rows are visible though the
index (they had to be to make time travel work), but do not match the time
qual so are not "seen".

-dg


David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - If simplicity worked, the world would be overrun with insects. -

Re: [HACKERS] Bogus "Non-functional update" notices

От
Vadim Mikheev
Дата:
David Gould wrote:
>
> >
> > > Vacuum deletes index tuples before deleting heap ones...
> >
> > Right, but until you've done a vacuum, what's stopping the code from
> > returning wrong tuples?  I assume this stuff actually works, I just
> > couldn't see where the dead index entries get rejected.
> >
>
> Without checking the code, I suspect that dead rows are visible though the
> index (they had to be to make time travel work), but do not match the time
> qual so are not "seen".

Yes. Backend sees that xmax of heap tuple is committed and
don't return tuple...

BTW, I've fixed SUBJ. Scan adjustment didn't work when
index page was splitted. I get rid of ON INSERT adjustment
at all: now backend uses heap tid of current index tuple to
restore current scan position before searching for the
next index tuple. (This will also allow us unlock index
page after we got index tuple and work in heap and so
index readers will not block writers ... when LLL
will be implemented -:).

The bug was more serious than "non-functional update"
when backend read index tuples twice: in some cases
scan didn't return good tuples at all!

drop table bt;
create table bt (x int);
copy bt from '/var/home/postgres/My/Btree/ADJ/UNIQ';
--            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- 1000 records with x in [1,1000]
--
create index bti on bt (x);
update bt set x = x where x <= 200;
update bt set x = x where x > 200 and x <= 210;
--
-- ONLY 4 tuples will be updated by last update!
--

I'll prepare patch for 6.3...

Vadim