Re: [HACKERS] Surjective functional indexes

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] Surjective functional indexes
Дата
Msg-id 20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [HACKERS] Surjective functional indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] Surjective functional indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, May 11, 2018 at 11:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>>> I have no problem if you want to replace this with an even better
> >>>> design in a later release.
> 
> >>> Meh. The author / committer should get a patch into the right shape
> 
> >> They have done, at length. Claiming otherwise is just trash talk.
> 
> > Some people might call it "honest disagreement".
> 
> So where we are today is that this patch was lobotomized by commits
> 77366d90f/d06fe6ce2 as a result of this bug report:
> 
> https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql
> 
> We need to move forward, either by undertaking a more extensive
> clean-out, or by finding a path to a version of the code that is
> satisfactory.  I wanted to enumerate my concerns while yesterday's
> events are still fresh in mind.  (Andres or Robert might have more.)
> 
> * I do not understand why this feature is on-by-default in the first
> place.  It can only be a win for expression indexes that are many-to-one
> mappings; for indexes that are one-to-one or few-to-one, it's a pretty
> big loss.  I see no reason to assume that most expression indexes fall
> into the first category.  I suggest that the design ought to be to use
> this optimization only for indexes for which the user has explicitly
> enabled recheck_on_update.  That would allow getting rid of the cost check
> in IsProjectionFunctionalIndex, about which we'd otherwise have to have
> an additional fight: I do not like its ad-hoc-ness, nor the modularity
> violation (and potential circularity) involved in having the relcache call
> cost_qual_eval.
> 
> * Having heapam.c calling the executor also seems like a
> modularity/layering violation that will bite us in the future.
> 
> * The implementation seems incredibly inefficient.  ProjIndexIsUnchanged
> is doing creation/destruction of an EState, index_open/index_close
> (including acquisition of a lock we should already have), BuildIndexInfo,
> and expression compilation, all of which are completely redundant with
> work done elsewhere in the executor.  And it's doing that *over again for
> every tuple*, which totally aside from wasted cycles probably means there
> are large query-lifespan memory leaks in an UPDATE affecting many rows.
> I think a minimum expectation should be that one-time work is done only
> one time; but ideally none of those things would happen at all because
> we could share work with the regular code path.  Perhaps it's too much
> to hope that we could also avoid duplicate computation of the new index
> expression values, but as long as I'm listing complaints ...
> 
> * As noted in the bug thread, the implementation of the new reloption
> is broken because (a) it fails to work for some built-in index AMs
> and (b) by design, it can't work for add-on AMs.  I agree with Andres
> that that's not acceptable.
> 
> * This seems like bad data structure design:
> 
> +   Bitmapset  *rd_projidx;     /* Oids of projection indexes */
> 
> That comment is a lie, although IMO it'd be better if it were true;
> a list of target index OIDs would be a far better design here.  The use
> of rd_projidx as a set of indexes into the relation's indexlist is
> inefficient and overcomplicated, plus it creates an unnecessary and not
> very safe (even if it were documented) coupling between rd_indexlist and
> rd_projidx.
> 
> * Having ProjIndexIsUnchanged examine relation->rd_projidx directly is
> broken by design anyway, both from a modularity standpoint and because
> its inner loop involves catalog accesses that could result in relcache
> flushes.  I'm somewhat amazed that the regression tests passed on
> CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is
> explained by the fact that they're only testing cases with a single
> expression index, so that the bitmap isn't checked again after the cache
> flush happens.  Again, this could be fixed if what was returned was just
> a list of relevant index OIDs.
> 
> * This bit of coding is unsafe, for the reason explained in the existing
> comment:
> 
>      /*
>       * Now save copies of the bitmaps in the relcache entry.  We intentionally
>       * set rd_indexattr last, because that's the one that signals validity of
>       * the values; if we run out of memory before making that copy, we won't
>       * leave the relcache entry looking like the other ones are valid but
>       * empty.
>       */
>      oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>      relation->rd_keyattr = bms_copy(uindexattrs);
>      relation->rd_pkattr = bms_copy(pkindexattrs);
>      relation->rd_idattr = bms_copy(idindexattrs);
>      relation->rd_indexattr = bms_copy(indexattrs);
> +    relation->rd_projindexattr = bms_copy(projindexattrs);
> +    relation->rd_projidx = bms_copy(projindexes);
>      MemoryContextSwitchTo(oldcxt);
> 
> * There's a lot of other inattention to comments.  For example, I noticed
> that this patch added new responsibilities to RelationGetIndexList without
> updating its header comment to mention them.
> 
> * There's a lack of attention to terminology, too.  I do not think that
> "projection index" is an appropriate term at all, nor do I think that
> "recheck_on_update" is a particularly helpful option name, though we
> may be stuck with the latter at this point.
> 
> * I also got annoyed by minor sloppiness like not adding the new
> regression test in the same place in serial_schedule and
> parallel_schedule.  The whole thing needed more careful review
> than it got.
> 
> In short, it seems likely to me that large parts of this patch need to
> be pulled out, rewritten, and then put back in different places than
> they are today.  I'm not sure if a complete revert is the best next
> step, or if we can make progress without that.

We've not really made progress on this. I continue to think that we
ought to revert this feature, and then work to re-merge it an
architecturally correct way afterwards.  Other opinions?

Greetings,

Andres Freund


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: unique, partitioned index fails to distinguish index key fromINCLUDEd columns
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives