Re: Oddity in handling of cached plans for FDW queries

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Oddity in handling of cached plans for FDW queries
Дата
Msg-id CAFjFpRf94+DJOr8pOLUQYgmmOxjonGPhrU4uvo_00BKCZ-fGug@mail.gmail.com
обсуждение исходный текст
Ответ на Oddity in handling of cached plans for FDW queries  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: Oddity in handling of cached plans for FDW queries  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers




Seems odd to me.  Both relations use the same user mapping as before, so the join should be pushed down.

Another thing I noticed is that the code in plancache.c would invalidate a cached plan with a foreign join due to user mapping changes even in the case where user mappings are meaningless to the FDW.

To fix the first, I'd like to propose (1) replacing the existing has_foreign_join flag in the CachedPlan data structure with a new flag, say uses_user_mapping, that indicates whether a cached plan uses any user mapping regardless of whether the cached plan has foreign joins or not (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2) invalidating the cached plan if the uses_user_mapping flag is true.  I thought that we invalidate the cached plan if the flag is true and there is a possibility of performing foreign joins, but it seems to me that updates on the corresponding catalog are infrequent enough that such a detailed tracking is not worth the effort.

That way we will have plan cache invalidations even when simple foreign tables scans (not join) are involved, which means all the plans involving any reference to a foreign table with valid user mapping associated with it. That can be a huge cost as compared to the current solution where sub-optimal plan will be used only when a user mapping is changed while a statement has been prepared. That's a rare scenario and somebody can work around that by preparing the statement again. IIRC, we had discussed this situation when implementing the cache invalidation logic. But there's no workaround for your solution.
 
One benefit of using the proposed approach is that we could make the FDW's handling of user mappings in BeginForeignScan more efficient; currently, there is additional overhead caused by catalog re-lookups to obtain the user mapping information for the simple-foreign-table-scan case where user mappings mean something to the FDW as in postgres_fdw (probably, updates on the catalogs are infrequent, though), but we could improve the efficiency by using the validated user mapping information created at planning time for that case as well as for the foreign-join case.  For that, I'd like to propose storing the validated user mapping information into ForeignScan, not fdw_private.

postgres_fdw to fetches user mapping in some cases but never remembers it. If what you are describing is a better way, it should have been implemented before join pushdown was implemented. Refetching a user mapping is not that expensive given that there is a high chance that it will be found in the syscache, because it was accessed at the time of planning. Effect of plan cache invalidation is probably worse than fetching the value from a sys cache again.
 
There is a discussion about using an ExtensibleNode [1] for that, but the proposed approach would make the FDW's job much simpler.

I don't think the above change is sufficient to fix the second.  The root reason for that is that since currently, we allow the user mapping OID (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something to the FDW but it can't get any user mapping at planning time and (2) user mappings are meaningless to the FDW, we cannot distinguish these two cases.

The way to differentiate between these two is to look at the serverid. If server id is invalid it's the case 1, otherwise 2. For a simple table, there will always be a serverid associated with it. A user mapping will always be associated with a simple table if there is one i.e. FDW requires it. Albeit, there will be a case when FDW requires a user mapping but it's not created, in which case the table will be useless for fetching remote data anyway. I don't think we should be programming for that case.
 
So, I'd like to introduce a new callback routine to specify that user mappings mean something to the FDW as proposed by Tom [2], and use that to reject the former case, which allows us to set the above uses_user_mapping flag appropriately, ie, set the flag to true only if user mapping changes require forcing a replan.  This would change the postgres_fdw's behavior that it allows to prepare statements involving foreign tables without any user mappings as long as those aren't required at planning time, but I'm not sure that it's a good idea to keep that behavior because ISTM that such a behavior would make people sloppy about creating user mappings, which could lead to latent security problems [2].

Attached is a proposed patch for that.

 
This routine is meaningless unless the core (or FDW) does not allow a user mapping to be created for such FDWs. Without that, core code would get confused as to what it should do when it sees a user mapping for an FDW which says user mappings are meaningless. If we do that, we might not set hasForeignJoin flag in create_foreignscan_plan() when the user mapping for pushed down join is invalid. That will keep FDWs which do not use user mappings out of plan cache invalidation.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench - compute & show latency consistently
Следующее
От: Amit Langote
Дата:
Сообщение: Constraint merge and not valid status