Обсуждение: Arrays broken on temp tables

Поиск
Список
Период
Сортировка

Arrays broken on temp tables

От
Kristofer Munn
Дата:
Greetings.  I recently encountered a problem updating an array element on
a temp table.  Instead of updating just the specified array element, it
copies the array values onto all the tuples.  The command works as
expected with regular tables.


create temp table tmpArray (       id      int4,       val     int4[]
);
CREATE
insert into tmpArray values (1, '{1,2,3,4}');
INSERT 24630506 1
insert into tmpArray values (2, '{4,3,2,1}');
INSERT 24630507 1
insert into tmpArray values (3, '{9,10,11,12}');
INSERT 24630508 1
select * from tmpArray;
id|val
--+------------1|{1,2,3,4}2|{4,3,2,1}3|{9,10,11,12}
(3 rows)

update tmpArray set val[3] = 7;
UPDATE 3
select * from tmpArray;
id|val
--+---------1|{1,2,7,4}2|{1,2,7,4}3|{1,2,7,4}
(3 rows)

drop table tmpArray;
DROP
EOF

- K

Kristofer Munn * KMI * 973-509-9414 * AIM KrMunn * ICQ 352499 * www.munn.com



Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> Greetings.  I recently encountered a problem updating an array element on
> a temp table.  Instead of updating just the specified array element, it
> copies the array values onto all the tuples.  The command works as
> expected with regular tables.
> 
> 
> create temp table tmpArray (
>         id      int4,
>         val     int4[]
> );
> CREATE
> insert into tmpArray values (1, '{1,2,3,4}');
> INSERT 24630506 1
> insert into tmpArray values (2, '{4,3,2,1}');
> INSERT 24630507 1
> insert into tmpArray values (3, '{9,10,11,12}');
> INSERT 24630508 1
> select * from tmpArray;
> id|val
> --+------------
>  1|{1,2,3,4}
>  2|{4,3,2,1}
>  3|{9,10,11,12}
> (3 rows)
> 
> update tmpArray set val[3] = 7;
> UPDATE 3
> select * from tmpArray;
> id|val
> --+---------
>  1|{1,2,7,4}
>  2|{1,2,7,4}
>  3|{1,2,7,4}
> (3 rows)
> 
> drop table tmpArray;
> DROP
> EOF
> 
> - K

Bug confirmed.  Wow, that is strange.  There isn't anything about temp
table that would suggest this would happen. I will keep the bug report
and try to figure it out in the future.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Bug confirmed.  Wow, that is strange.  There isn't anything about temp
> table that would suggest this would happen.

I see it too.  explain shows something pretty fishy:

regression=> explain update tmpArray set val[3] = 7;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=43043.00 rows=1000000 width=22) ->  Seq Scan on pg_temp.2904.0  (cost=43.00 rows=1000 width=12) ->
SeqScan on tmparray  (cost=43.00 rows=1000 width=10)
 

EXPLAIN

I'm betting that something in the array code is somehow bypassing the
normal table lookup mechanism, and is managing to see the underlying
temp-table name that should be hidden from it.  Will look further...
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
> I'm betting that something in the array code is somehow bypassing the
> normal table lookup mechanism, and is managing to see the underlying
> temp-table name that should be hidden from it.  Will look further...

Yup, here it is, in parse_target.c:
   /*    * If there are subscripts on the target column, prepare an    * array assignment expression.  This will
generatean array value    * that the source value has been inserted into, which can then    * be placed in the new
tupleconstructed by INSERT or UPDATE.    * Note that transformArraySubscripts takes care of type coercion.    */   if
(indirection)  {       Attr       *att = makeNode(Attr);       Node       *arrayBase;       ArrayRef   *aref;
 
       att->relname = pstrdup(RelationGetRelationName(rd)->data);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next question is what to do about it --- the original table name
doesn't seem to be conveniently available in this routine.  A quick
search for other uses of RelationGetRelationName shows other places
that may have related bugs.  Possibly, temprel.c needs to provide
a reverse-lookup routine that will give back the user name of a table
that might be a temp table?
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> > I'm betting that something in the array code is somehow bypassing the
> > normal table lookup mechanism, and is managing to see the underlying
> > temp-table name that should be hidden from it.  Will look further...
> 
> Yup, here it is, in parse_target.c:
> 
>     /*
>      * If there are subscripts on the target column, prepare an
>      * array assignment expression.  This will generate an array value
>      * that the source value has been inserted into, which can then
>      * be placed in the new tuple constructed by INSERT or UPDATE.
>      * Note that transformArraySubscripts takes care of type coercion.
>      */
>     if (indirection)
>     {
>         Attr       *att = makeNode(Attr);
>         Node       *arrayBase;
>         ArrayRef   *aref;
> 
>         att->relname = pstrdup(RelationGetRelationName(rd)->data);
>                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Next question is what to do about it --- the original table name
> doesn't seem to be conveniently available in this routine.  A quick
> search for other uses of RelationGetRelationName shows other places
> that may have related bugs.  Possibly, temprel.c needs to provide
> a reverse-lookup routine that will give back the user name of a table
> that might be a temp table?

Well, I now wonder whether I did the right thing in adding temp tables
the way I did.  Is there a better way.  The current code maps to
original name to temp name on opens using the relcache.  That way, the
original name is passed all through the code.  When we print an error
message, we use the user-supplied name, not the temp name.

However, if the code reaches directly into the pg_class tuple and pulls
out the name, it will see the temp name.

Comments?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Well, I now wonder whether I did the right thing in adding temp tables
> the way I did.  Is there a better way.

I don't think there's anything wrong with the basic temp table design.
We've just discovered an oversight: given a Relation entry, there's no
way to get back the original table name, and sometimes you need to.

I'm inclined to think that RelationGetRelationName should be replaced
by two access macros: one to give back the "physical" rel name (same
as the current macro) and one to give back the "logical" name, which'd
be different in the case of a temp table.  We'd need to extend relcache
entries to include the logical name as an additional field.  Then we'd
need to look at all the uses of RelationGetRelationName to see which
ones should be which.  There might be some direct accesses to
rel->rd_rel->relname as well :-( which need to be found and fixed.
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Well, I now wonder whether I did the right thing in adding temp tables
> > the way I did.  Is there a better way.
> 
> I don't think there's anything wrong with the basic temp table design.
> We've just discovered an oversight: given a Relation entry, there's no
> way to get back the original table name, and sometimes you need to.
> 
> I'm inclined to think that RelationGetRelationName should be replaced
> by two access macros: one to give back the "physical" rel name (same
> as the current macro) and one to give back the "logical" name, which'd
> be different in the case of a temp table.  We'd need to extend relcache
> entries to include the logical name as an additional field.  Then we'd
> need to look at all the uses of RelationGetRelationName to see which
> ones should be which.  There might be some direct accesses to
> rel->rd_rel->relname as well :-( which need to be found and fixed.

OK, one more comment.

Because both physical and logical names map to the same oid, in _most_
cases it doesn't matter if RelationGetRelationName returns the physical
name.

Any idea why the physical name causes a problem in this area of the
code?

Also, I believe I replaced most cases of rd_rel->relname with
RelationGetRelationName during one of my cleanups long ago.  Seems I had
not done this case because I see lots of them.  Adding macro call now.

BTW, it is quite easy to add reverse lookup in cache if that will fix
things.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> I don't think there's anything wrong with the basic temp table design.
>> We've just discovered an oversight: given a Relation entry, there's no
>> way to get back the original table name, and sometimes you need to.

> OK, one more comment.
> Because both physical and logical names map to the same oid, in _most_
> cases it doesn't matter if RelationGetRelationName returns the physical
> name.
> Any idea why the physical name causes a problem in this area of the
> code?

The problem is that the rangetable code doesn't realize that the logical
and physical names refer to the same table, so when the
subscript-processing code generates a reference to
<physicaltablename>.<attribute> the parser generates a second RTE for
the physical table name, in addition to the already-existing RTE for the
logical table name.  This causes the planner to generate a join, because
it can see no difference between this situation andFROM tablename, tablename aliasname
which *should* cause a join.  But the join causes each tuple to be
processed multiple times, which is the wrong thing for this case.

There is more than one way we could attack this, but I think the
cleanest answer will be to make it possible to extract a logical
table name from a relcache entry.
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> The problem is that the rangetable code doesn't realize that the logical
> and physical names refer to the same table, so when the
> subscript-processing code generates a reference to
> <physicaltablename>.<attribute> the parser generates a second RTE for
> the physical table name, in addition to the already-existing RTE for the
> logical table name.  This causes the planner to generate a join, because
> it can see no difference between this situation and
>     FROM tablename, tablename aliasname
> which *should* cause a join.  But the join causes each tuple to be
> processed multiple times, which is the wrong thing for this case.
> 
> There is more than one way we could attack this, but I think the
> cleanest answer will be to make it possible to extract a logical
> table name from a relcache entry.

Well, as I remember, the good news is that our code was fine, and the
original poster just missed the WHERE clause on the update.  So I guess
that gets us off the hook for a while.

However, now looking at the posting again:
http://www.postgresql.org/mhonarc/pgsql-hackers/1999-11/msg00213.html

I am confused again.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> There is more than one way we could attack this, but I think the
>> cleanest answer will be to make it possible to extract a logical
>> table name from a relcache entry.

> Well, as I remember, the good news is that our code was fine, and the
> original poster just missed the WHERE clause on the update.  So I guess
> that gets us off the hook for a while.
> However, now looking at the posting again:
>     http://www.postgresql.org/mhonarc/pgsql-hackers/1999-11/msg00213.html
> I am confused again.

No, our code is *not* OK.  It's true that the original example was given
without a WHERE clause, whereas a practical UPDATE would usually have a
WHERE clause; but that has nothing to do with whether the planner will
generate a join or not.  If a join is done then the wrong things will
happen, WHERE or no WHERE.

The bottom line here is that we mustn't generate separate RTEs for the
logical and physical table names.
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> No, our code is *not* OK.  It's true that the original example was given
> without a WHERE clause, whereas a practical UPDATE would usually have a
> WHERE clause; but that has nothing to do with whether the planner will
> generate a join or not.  If a join is done then the wrong things will
> happen, WHERE or no WHERE.
> 
> The bottom line here is that we mustn't generate separate RTEs for the
> logical and physical table names.

Are you saying a join on a temp table will not work?  Can you give an
example?


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> The bottom line here is that we mustn't generate separate RTEs for the
>> logical and physical table names.

> Are you saying a join on a temp table will not work?

Not at all; I'm saying that it's incorrect to generate a join for a
simple UPDATE.  What we had was
UPDATE table SET arrayfield[sub] = val;

which is really implemented as (more or less)
UPDATE table SET arrayfield = ARRAYINSERT(arrayfield, sub, val);

which works fine as long as you apply the computation and update once
per tuple in the table (or once per tuple selected by WHERE, if there
is one).  But for a temp table, what really gets emitted from the
parser is effectively like
UPDATE logtable SET arrayfield = arrayinsert(phytable.field,                                             sub, val)FROM
logtablephytable;
 

This is a Cartesian join, meaning that each tuple in
logtable-as-destination will be processed in combination with each tuple
in logtable-as-phytable.  The particular case Kristofer reported
implements the join as a nested loop with logtable-as-destination as the
inner side of the join.  So, each target tuple gets updated once with
an arrayfield value computed off each available source tuple --- and
when the dust settles, they've all got the value computed from the last
source tuple.  That's why they're all the same in his bug report.

Adding a WHERE clause limits the damage, but the target tuples will all
still get the same value, if I'm visualizing the behavior correctly.
It's the wrong thing in any case; the very best you could hope for is 
that the tuples all manage to get the right values after far more
processing than necessary.  There should be no join for a simple UPDATE.
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> The bottom line here is that we mustn't generate separate RTEs for the
> >> logical and physical table names.
> 
> > Are you saying a join on a temp table will not work?
> 
> Not at all; I'm saying that it's incorrect to generate a join for a
> simple UPDATE.  What we had was
> 
>     UPDATE table SET arrayfield[sub] = val;
> 
> which is really implemented as (more or less)
> 
>     UPDATE table SET arrayfield = ARRAYINSERT(arrayfield, sub, val);
> 
> which works fine as long as you apply the computation and update once
> per tuple in the table (or once per tuple selected by WHERE, if there
> is one).  But for a temp table, what really gets emitted from the
> parser is effectively like
> 
>     UPDATE logtable SET arrayfield = arrayinsert(phytable.field,
>                                                  sub, val)
>     FROM logtable phytable;
> 
> This is a Cartesian join, meaning that each tuple in
> logtable-as-destination will be processed in combination with each tuple
> in logtable-as-phytable.  The particular case Kristofer reported
> implements the join as a nested loop with logtable-as-destination as the
> inner side of the join.  So, each target tuple gets updated once with
> an arrayfield value computed off each available source tuple --- and
> when the dust settles, they've all got the value computed from the last
> source tuple.  That's why they're all the same in his bug report.
> 
> Adding a WHERE clause limits the damage, but the target tuples will all
> still get the same value, if I'm visualizing the behavior correctly.
> It's the wrong thing in any case; the very best you could hope for is 
> that the tuples all manage to get the right values after far more
> processing than necessary.  There should be no join for a simple UPDATE.

OK, I see it now.  They are assigning the relname at this point using
the in-tuple relname, which is the physical name, not the logical name.

If I look at all calls to RelationGetRelationName(), I can see several
problem cases where the code it assigning the rel/refname based on the
in-tuple name.

Ideas?  Should i add reverse-lookup code in temprel.c, and make the
lookups happen for those cases?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Arrays broken on temp tables

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> But for a temp table, what really gets emitted from the
>> parser is effectively like
>> 
>> UPDATE logtable SET arrayfield = arrayinsert(phytable.field,
>> sub, val)
>> FROM logtable phytable;
>> 
> OK, I see it now.  They are assigning the relname at this point using
> the in-tuple relname, which is the physical name, not the logical name.

Right, the array-element-update code needs to generate a reference
using the logical name.

> If I look at all calls to RelationGetRelationName(), I can see several
> problem cases where the code it assigning the rel/refname based on the
> in-tuple name.

I suspected as much, but I haven't grovelled through the calls in
detail.  Some of them probably really do want the physical name,
while others need the logical name.

> Ideas?  Should i add reverse-lookup code in temprel.c, and make the
> lookups happen for those cases?

We could do it that way, but as the code stands, relcache.c is
responsible for the forward lookup (you just pass a rel name to
heap_openr without worrying if it is a temp rel name or not).
So I think relcache.c ought to provide a function or macro to
go the other way: produce a logical relname from a Relation pointer.

Whether that's implemented by copying the originally given relname
into the relcache entry, or by asking temprel.c each time, is purely
a local optimization inside relcache.c --- it's a straight speed-for-
space tradeoff.  Before choosing, we should look at the uses of
RelationGetRelationName() to see if any of them that need to be
fetching the logical name are in performance-critical paths.
        regards, tom lane


Re: [HACKERS] Arrays broken on temp tables

От
Bruce Momjian
Дата:
I believe this is fixed was fixed by my RelationGetRelationName and
RelationGetPhysicalRelationName.


> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> The bottom line here is that we mustn't generate separate RTEs for the
> >> logical and physical table names.
> 
> > Are you saying a join on a temp table will not work?
> 
> Not at all; I'm saying that it's incorrect to generate a join for a
> simple UPDATE.  What we had was
> 
>     UPDATE table SET arrayfield[sub] = val;
> 
> which is really implemented as (more or less)
> 
>     UPDATE table SET arrayfield = ARRAYINSERT(arrayfield, sub, val);
> 
> which works fine as long as you apply the computation and update once
> per tuple in the table (or once per tuple selected by WHERE, if there
> is one).  But for a temp table, what really gets emitted from the
> parser is effectively like
> 
>     UPDATE logtable SET arrayfield = arrayinsert(phytable.field,
>                                                  sub, val)
>     FROM logtable phytable;
> 
> This is a Cartesian join, meaning that each tuple in
> logtable-as-destination will be processed in combination with each tuple
> in logtable-as-phytable.  The particular case Kristofer reported
> implements the join as a nested loop with logtable-as-destination as the
> inner side of the join.  So, each target tuple gets updated once with
> an arrayfield value computed off each available source tuple --- and
> when the dust settles, they've all got the value computed from the last
> source tuple.  That's why they're all the same in his bug report.
> 
> Adding a WHERE clause limits the damage, but the target tuples will all
> still get the same value, if I'm visualizing the behavior correctly.
> It's the wrong thing in any case; the very best you could hope for is 
> that the tuples all manage to get the right values after far more
> processing than necessary.  There should be no join for a simple UPDATE.
> 
>             regards, tom lane
> 
> ************
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026