Re: [HACKERS] Surjective functional indexes
От | Andres Freund |
---|---|
Тема | Re: [HACKERS] Surjective functional indexes |
Дата | |
Msg-id | 20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: [HACKERS] Surjective functional indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] Surjective functional indexes
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
On 2018-11-07 14:25:54 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Fri, May 11, 2018 at 11:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >>>> I have no problem if you want to replace this with an even better > >>>> design in a later release. > > >>> Meh. The author / committer should get a patch into the right shape > > >> They have done, at length. Claiming otherwise is just trash talk. > > > Some people might call it "honest disagreement". > > So where we are today is that this patch was lobotomized by commits > 77366d90f/d06fe6ce2 as a result of this bug report: > > https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql > > We need to move forward, either by undertaking a more extensive > clean-out, or by finding a path to a version of the code that is > satisfactory. I wanted to enumerate my concerns while yesterday's > events are still fresh in mind. (Andres or Robert might have more.) > > * I do not understand why this feature is on-by-default in the first > place. It can only be a win for expression indexes that are many-to-one > mappings; for indexes that are one-to-one or few-to-one, it's a pretty > big loss. I see no reason to assume that most expression indexes fall > into the first category. I suggest that the design ought to be to use > this optimization only for indexes for which the user has explicitly > enabled recheck_on_update. That would allow getting rid of the cost check > in IsProjectionFunctionalIndex, about which we'd otherwise have to have > an additional fight: I do not like its ad-hoc-ness, nor the modularity > violation (and potential circularity) involved in having the relcache call > cost_qual_eval. > > * Having heapam.c calling the executor also seems like a > modularity/layering violation that will bite us in the future. > > * The implementation seems incredibly inefficient. ProjIndexIsUnchanged > is doing creation/destruction of an EState, index_open/index_close > (including acquisition of a lock we should already have), BuildIndexInfo, > and expression compilation, all of which are completely redundant with > work done elsewhere in the executor. And it's doing that *over again for > every tuple*, which totally aside from wasted cycles probably means there > are large query-lifespan memory leaks in an UPDATE affecting many rows. > I think a minimum expectation should be that one-time work is done only > one time; but ideally none of those things would happen at all because > we could share work with the regular code path. Perhaps it's too much > to hope that we could also avoid duplicate computation of the new index > expression values, but as long as I'm listing complaints ... > > * As noted in the bug thread, the implementation of the new reloption > is broken because (a) it fails to work for some built-in index AMs > and (b) by design, it can't work for add-on AMs. I agree with Andres > that that's not acceptable. > > * This seems like bad data structure design: > > + Bitmapset *rd_projidx; /* Oids of projection indexes */ > > That comment is a lie, although IMO it'd be better if it were true; > a list of target index OIDs would be a far better design here. The use > of rd_projidx as a set of indexes into the relation's indexlist is > inefficient and overcomplicated, plus it creates an unnecessary and not > very safe (even if it were documented) coupling between rd_indexlist and > rd_projidx. > > * Having ProjIndexIsUnchanged examine relation->rd_projidx directly is > broken by design anyway, both from a modularity standpoint and because > its inner loop involves catalog accesses that could result in relcache > flushes. I'm somewhat amazed that the regression tests passed on > CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is > explained by the fact that they're only testing cases with a single > expression index, so that the bitmap isn't checked again after the cache > flush happens. Again, this could be fixed if what was returned was just > a list of relevant index OIDs. > > * This bit of coding is unsafe, for the reason explained in the existing > comment: > > /* > * Now save copies of the bitmaps in the relcache entry. We intentionally > * set rd_indexattr last, because that's the one that signals validity of > * the values; if we run out of memory before making that copy, we won't > * leave the relcache entry looking like the other ones are valid but > * empty. > */ > oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > relation->rd_keyattr = bms_copy(uindexattrs); > relation->rd_pkattr = bms_copy(pkindexattrs); > relation->rd_idattr = bms_copy(idindexattrs); > relation->rd_indexattr = bms_copy(indexattrs); > + relation->rd_projindexattr = bms_copy(projindexattrs); > + relation->rd_projidx = bms_copy(projindexes); > MemoryContextSwitchTo(oldcxt); > > * There's a lot of other inattention to comments. For example, I noticed > that this patch added new responsibilities to RelationGetIndexList without > updating its header comment to mention them. > > * There's a lack of attention to terminology, too. I do not think that > "projection index" is an appropriate term at all, nor do I think that > "recheck_on_update" is a particularly helpful option name, though we > may be stuck with the latter at this point. > > * I also got annoyed by minor sloppiness like not adding the new > regression test in the same place in serial_schedule and > parallel_schedule. The whole thing needed more careful review > than it got. > > In short, it seems likely to me that large parts of this patch need to > be pulled out, rewritten, and then put back in different places than > they are today. I'm not sure if a complete revert is the best next > step, or if we can make progress without that. We've not really made progress on this. I continue to think that we ought to revert this feature, and then work to re-merge it an architecturally correct way afterwards. Other opinions? Greetings, Andres Freund
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Alvaro HerreraДата:
Сообщение: Re: unique, partitioned index fails to distinguish index key fromINCLUDEd columns
Следующее
От: Peter GeogheganДата:
Сообщение: Re: Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives