Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server
Дата
Msg-id 5BA4CFE5.5040502@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Problem while updating a foreign table pointing to apartitioned table on foreign server  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers
(2018/09/18 21:14), Kyotaro HORIGUCHI wrote:
> At Fri, 14 Sep 2018 22:01:39 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5B9BB133.1060107@lab.ntt.co.jp>
>> @@ -126,8 +173,18 @@ get_relation_info(PlannerInfo *root, Oid
>> relationObjectId,\
>>   bool inhparent,
>>                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>                   errmsg("cannot access temporary or unlogged relations during
>>                   r\
>> ecovery")));
>>
>> +   max_attrnum = RelationGetNumberOfAttributes(relation);
>> +
>> + /* Foreign table may have exanded this relation with junk columns */
>> + if (root->simple_rte_array[varno]->relkind == RELKIND_FOREIGN_TABLE)
>> +   {
>> + AttrNumber maxattno = max_varattno(root->parse->targetList, varno);
>> +       if (max_attrnum<  maxattno)
>> +           max_attrnum = maxattno;
>> +   }
>> +
>>      rel->min_attr = FirstLowInvalidHeapAttributeNumber + 1;
>> -   rel->max_attr = RelationGetNumberOfAttributes(relation);
>> +   rel->max_attr = max_attrnum;
>>      rel->reltablespace = RelationGetForm(relation)->reltablespace;
>>
>> This breaks the fundamental assumption that rel->max_attr is equal to
>> RelationGetNumberOfAttributes of that table.  My concern is: this
>> change would probably be a wart, so it would be bug-prone in future
>> versions.
>
> Hmm. I believe that once RelOptInfo is created all attributes
> defined in it is safely accessed. Is it a wrong assumption?

The patch you proposed seems to fix the issue well for the current 
version of PG, but I'm a bit scared to have such an assumption (ie, to 
include columns in a rel's tlist that are not defined anywhere in the 
system catalogs).  In future we might add eg, a lsyscache.c routine for 
some planning use that are given the attr number of a column as an 
argument, like get_attavgwidth, and if so, it would be easily 
conceivable that that routine would error out for such an undefined 
column.  (get_attavgwidth would return 0, not erroring out, though.)

> Actually RelationGetNumberOfAttributes is used in few distinct
> places while planning.

> build_physical_tlist is not used for
> foreign relations.

For UPDATE/DELETE, that function would not be called for a foreign 
target in the posetgres_fdw case, as CTID is requested (see 
use_physical_tlist), but otherwise that function may be called if 
possible.  No?

> If we don't accept the expanded tupdesc for base relations, the
> another way I can find is transforming the foreign relation into
> something another like a subquery, or allowing expansion of
> attribute list of a base relation...

Sorry, I don't understand this fully, but there seems to be the same 
concern as mentioned above.

>> Another thing on the new version:
>>
>> @@ -1575,6 +1632,19 @@ build_physical_tlist(PlannerInfo *root,
>> RelOptInfo *rel)
>>              relation = heap_open(rte->relid, NoLock);
>>
>>              numattrs = RelationGetNumberOfAttributes(relation);
>> +
>> +           /*
>> + * Foreign tables may have expanded with some junk columns. Punt
>> +            * in the case.
> ...
>> I think this would disable the optimization on projection in foreign
>> scans, causing performance regression.
>
> Well, in update/delete cases, create_plan_recurse on foreign scan
> is called with CP_EXACT_TLIST in create_modifytable_plan

That's not necessarily true; consider UPDATE/DELETE on a local join; in 
that case the topmost plan node for a subplan of a ModifyTable would be 
a join, and if that's a NestLoop, create_plan_recurse would call 
create_nestloop_plan, which would recursively call create_plan_recurse 
for its inner/outer subplans with flag=0, not CP_EXACT_TLIST.

> so the
> code path is not actually used.

I think this is true for the postgres_fdw case; because 
use_physical_tlist would decide not to do build_physical_tlist for the 
reason mentioned above.  BUT my question here is: why do we need the 
change to build_physical_tlist?

>>> Since this uses fdw_scan_tlist so it is theoretically
>>> back-patchable back to 9.6.
>>
>> IIRC, the fdw_scan_tlist stuff was introduced in PG9.5 as part of join
>> pushdown infrastructure, so I think your patch can be back-patched to
>> PG9.5, but I don't think that's enough; IIRC, this issue was
>> introduced in PG9.3, so a solution for this should be back-patch-able
>> to PG9.3, I think.
>
> In the previous version, fdw_scan_tlist is used to hold only
> additional (junk) columns. I think that we can get rid of the
> variable by scanning the full tlist for junk columns. Apparently
> it's differnt patch for such versions. I'm not sure how much it
> is invasive for now but will consider.

Sorry, I don't fully understand this.  Could you elaborate a bit more on 
this?

>>> 0001-Add-test-for-postgres_fdw-foreign-parition-update.patch
>>>
>>>    This should fail for unpatched postgres_fdw. (Just for demonstration)
>>
>> +CREATE TABLE p1 (a int, b int);
>> +CREATE TABLE c1 (LIKE p1) INHERITS (p1);
>> +CREATE TABLE c2 (LIKE p1) INHERITS (p1);
>> +CREATE FOREIGN TABLE fp1 (a int, b int)
>> + SERVER loopback OPTIONS (table_name 'p1');
>> +INSERT INTO c1 VALUES (0, 1);
>> +INSERT INTO c2 VALUES (1, 1);
>> +SELECT tableoid::int - (SELECT min(tableoid) FROM fp1)::int AS
>> toiddiff, ctid, * FROM fp1;
>>
>> Does it make sense to evaluate toiddiff?  I think that should always
>> be 0.
>
> Right. it is checking that the values are not those of remote
> table oids.

Sorry, my explanation was not enough, but that seems to me more 
complicated than necessary.  How about just evaluating 
tableoid::regclass, instead of toiddiff?

> =======
>>> 0003-Fix-of-foreign-update-bug-of-PgFDW.patch
>>>
>>>    Fix of postgres_fdw for this problem.
>>
>> Sorry, I have not looked at it closely yet, but before that I'd like
>> to discuss the direction we go in.  I'm not convinced that your
>> approach is the right direction, so as promised, I wrote a patch using
>> the Param-based approach, and compared the two approaches.  Attached
>> is a WIP patch for that, which includes the 0003 patch.  I don't think
>> there would be any warts as discussed above in the Param-based
>> approach for now.  (That approach modifies the planner so that the
>> targetrel's tlist would contain Params as well as Vars/PHVs, so
>> actually, it breaks the planner assumption that a rel's tlist would
>> only include Vars/PHVs, but I don't find any issues on that at least
>> for now.  Will look into that in more detail.)  And I don't think
>> there would be any concern about performance regression, either.
>> Maybe I'm missing something, though.
>>
>> What do you think about that?
>
> Hmm. It is beyond my understanding. Great work (for me)!

I just implemented Tom's idea.  I hope I did that correctly.

> I confirmed that a FOREIGN_PARAM_EXEC is evaluated and stored
> into the parent node. For the mentioned Merge/Sort/ForeignScan
> case, Sort node takes the parameter value via projection. I
> didn't know PARAM_EXEC works that way. I consulted nodeNestLoop
> but not fully understood.
>
> So I think it works.  I still don't think expanded tupledesc is
> not wart but this is smarter than that.  Addition to that, it
> seems back-patchable. I must admit that yours is better.

As mentioned above, I'm a bit scared of the idea that we include columns 
not defined anywhere in the system catalogs in a rel's tlist.  For the 
reason mentioned above, I think we should avoid such a thing, IMO.

>> Note about the attached: I tried to invent a utility for
>> generate_new_param like SS_make_initplan_output_param as mentioned in
>> [1], but since the FDW API doesn't pass PlannerInfo to the FDW, I
>> think the FDW can't call the utility the same way.  Instead, I
>> modified the planner so that 1) the FDW adds Params without setting
>> PARAM_EXEC Param IDs using a new function, and then 2) the core fixes
>> the IDs.
>
> Agreed on not having PlannerInfo. I'll re-study this.  Some
> comments on this right now are the follows.

Thanks for the comments!

> It seems reserving the name remotetableoid, which doen't seem to
> be used by users but not never.

This has also been suggested by Tom [2].

> Maybe paramid space of FOREIGN_PARAM is not necessarily be the
> same with ordinary params that needs signalling aid.

Yeah, but I modified the planner so that it can distinguish one from the 
other; because I think it's better to avoid unneeded SS_finalize_plan 
processing when only generating foreign Params, and/or minimize the cost 
in set_plan_references by only converting foreign Params into simple 
Vars using search_indexed_tlist_for_non_var, which are both expensive.

One thing I noticed is: in any approach, I think use_physical_tlist 
needs to be modified so that it disables doing build_physical_tlist for 
a foreign scan in the case where the FDW added resjunk columns for 
UPDATE/DELETE that are different from user/system columns of the foreign 
table; else such columns would not be emitted from the foreign scan.

Best regards,
Etsuro Fujita

[2] https://www.postgresql.org/message-id/8627.1526591849%40sss.pgh.pa.us


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chris Travers
Дата:
Сообщение: Re: proposal: prefix function
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)