Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

Поиск
Список
Период
Сортировка
От	Peter Geoghegan
Тема	Re: INSERT ... ON CONFLICT {UPDATE \| IGNORE}
Дата	26 октября 2014 г. 20:39:18
Msg-id	CAM3SWZQnKLC9iMdx1FkOLC6ZWHhANnMnWNtGUPJd0naUDGGpYg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: INSERT ... ON CONFLICT {UPDATE \| IGNORE} (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: INSERT ... ON CONFLICT {UPDATE \| IGNORE}
Список	pgsql-hackers
Дерево обсуждения
On Sat, Oct 25, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Generating index paths for the UPDATE is a waste of cycles.
>> Theoretically, there could be an (a, b, c) unique index and a (c,b,a)
>> unique index, and those two might have a non-equal cost to scan. But
>> that almost certainly isn't going to happen in practice, since that's
>> a rather questionable indexing strategy, and even when it does, you're
>> going to have to insert into all the unique indexes a good proportion
>> of the time anyway, making the benefits of that approach pale in
>> comparison to the costs.
>
> I don't care whether you actually generate index-paths or not, and in
> fact I suspect it makes no sense to do so.  But I do care that you do
> a cost comparison between the available indexes and pick the one that
> looks cheapest.  If somebody's got a bloated index and builds a new
> one, they will want it to get used even before they drop the old one.

That seems extremely narrow. I am about ready to just do what you say,
by costing the index based on something like relpages, but for the
record I see no point. If I see no point, and you're sure that there
is a point, then there is a danger that I'll miss the point, which
contributes to my giving push back. I know I said that already, but I
reiterate it once more for emphasis.

> Even if that weren't an issue, though, the fact that you can't
> re-parse but you can re-plan is a killer point AFAICS. It means you
> are borked if the statement gets re-executed after the index you
> picked is dropped.

That simply isn't the case. As specifically noted in comments in the
patch, relcache invalidation works in such a way as to invalidate the
query proper, even when only an index has been dropped. For a time
during development, the semantic dependence was more directly
represented by actually having extract_query_dependencies() extract
the arbitrating unique index pg_class Oid as a dependency for the
benefit of the plan cache, but I ultimately decided against doing
anything more than noting the potential future problem (the problem
that arises in a world where relcache invalidation is more selective).

But, I digress: I'll have inference occur during planning during the
next revision since you think that's important.

>> From my point of view, I spent a significant amount of time making the
>> patch more or less match your proposed design for unique index
>> inference. It is discouraging to hear that you think I'm not
>> cooperating with community process. I'm doing my best. I think it
>> would be a bad idea for me to not engage with the community for an
>> extended period at this point. There were plenty of other issues
>> address by V1.3 that were not the CONFLICTING()/EXCLUDING thing that
>> you highlighted (or the other thing you highlighted around where to do
>> unique index inference).
>
> I think this gets at a point that has been bugging me and, perhaps,
> other people here.  You often post a new patch with some fixes but
> without fixes for the issues that reviewers have indicated are
> top-of-mind for them.  Sometimes, but not always, you say something
> like "I know this doesn't fix X but I'd like comments on other aspects
> of the patch".  Even when you do, though, it can be difficult to
> overlook something that appears to be a fundamental structural problem
> to comment on details, and sometimes it feels like that's what we're
> being asked to do.

I think that it's accurate to say that I asked the community to do
that once, last year. I was even very explicit about it at the time.
I'm surprised that you think that's the case now, though. I learned my
lesson a little later: don't do that, because fellow contributors are
unwilling to go along with it, even for something as important as
this.

As recently as a few months ago, you wanted me to go a *completely*
different direction with this (i.e. attempt an UPDATE first). I'm
surprised that you're emphasizing the EXCLUDED()/CONFLICTING() thing
so much. Should I take it that I more or less have your confidence
that the way the patch fits together at a high level is sound?
Because, that's the only way I can imagine you'd think that the
EXCLUDED()/CONFLICTING() thing is of such fundamental importance at
this point. It is after all split out as a much smaller commit on top
of the main implementation (a commit which could easily be deferred).
While it's important, it is very clearly subsidiary to the overall
structure of my design, which I think that there has been precious
little discussion of on this thread (e.g. the whole way I use
EvalPlanQual(), the general idea of a special auxiliary plan that is
executed in a special way, etc). That concerns me, because I suspect
that the reason there has been next to no discussion of those aspects
is because they're particularly hard things to have an opinion on, and
yet are the things of immediate importance. Please correct me if I
have it wrong.

I am in no position to demand that you or anyone else discuss any
particular aspect of the patch, of course. I am just conveying my
concerns. Like all of us, I very much want to get this feature into
the next release of PostgreSQL.

> When I'm reviewing, I tend to find issues more or
> less in proportion to the time I spend on the patch.  If things that I
> complained about before haven't been fixed, I tend to find the same
> ones again, but not necessarily all that much faster than I found them
> the first time.  So that's not efficient for me.  I would not make an
> absolute rule that no patch should ever be re-posted without
> addressing every issue; I wouldn't be able to follow that rule in
> every case myself.  But I try to follow it as often as I can, and I
> would suggest that you try to lean quite a bit more firmly in that
> direction.  I think you will make more progress, and spend less time
> arguing with others.

I'll try. But let me point out that the *very first* thing you
complained about, in relation to version 1.0, was that the plan
structure was funny. V1.3 cleaned that up, making a "sequential scan"
always occur, and having EXPLAIN treat that as a strict implementation
detail, while still attributing some aspects of the hidden, unexecuted
"sequential scan" (e.g. the qual) to its parent where that makes
sense. This seems like something that squarely addressed your
*original* concern. A little later, you (rightly) complained about the
lack of support for inheritance, and to a lesser extent updatable
views. As of V1.3, I have reasonable support for both of those things.
And, of course, you wanted a unique index to be inferred from a set of
columns/expressions, which (most notably) v1.3 now does. Also, I
incorporated feedback from Kevin on some of those same items, as well
as the behavior of statement-level triggers. Furthermore, I
incorporated the feedback of Simon in having way more tests, in
particular, two new isolation tests, one of which illustrates the
qualitatively new "MVCC violation" in more detail.

In short, I think I've done a pretty good job of incorporating
feedback, and where I haven't I have been quite clear about it (it
certainly didn't take you long to figure it out in this most recent
instance). There is always room for improvement, but in my book V1.3
made a lot more than small, incremental progress. I am surprised by
your remarks suggesting that the progress of each revision is
excessively gradual in addressing your concerns. AFAICT, it's just
that V1.3 wasn't totally comprehensive in doing so. In reality, the
main reason for that is: getting this feature to a point where it
might plausibly be committed is bloody difficult, as evidenced by the
fact that it took this long for someone to produce a patch. Please
don't lose sight of that.

-- 
Peter Geoghegan
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}