Re: INSERT...ON DUPLICATE KEY IGNORE

From: Peter Geoghegan
Subject: Re: INSERT...ON DUPLICATE KEY IGNORE
Msg-id: CAM3SWZT=3nH=xVO9G-zcjW+muwhr9OiGU7Z-G5+D=fzWa9O42g@mail.gmail.com
In reply to: Re: INSERT...ON DUPLICATE KEY IGNORE  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: INSERT...ON DUPLICATE KEY IGNORE  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Sun, Sep 1, 2013 at 4:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> We're looking for the first duplicate. So it would probably be okay
>> for the IGNORE case to not bother retrying and re-locking if the other
>> transaction committed (which, at least very broadly speaking, is the
>> likely outcome).
>
> Hm. Could you explain this a bit more? Do you mean that we can stop
> trying to insert after the first violation (yes, sure if so)?

Yeah, that's all I mean. Nothing controversial there.

> I can't really agree with this part. First off, a heap_insert()
> certainly isn't lightweight operation. There's several things that can
> take extended time:
> * toasting of rows (writing a gigabyte worth of data...)
> * extension of the heap/toast table
> * writeout of a dirty victim buffer if the page isn't in s_b
> * reading in the target page if the page isn't in s_b
> * waiting for an exclusive lock on the target pages

These are all valid concerns. I'll need to better assess just how bad
this can get (hopefully with help from others). But some of this stuff
can surely be done separately for my case. For example, surely we
could make TOASTing occur after index insertion proper when locks are
released, without too much work.

> Secondly, you seem to be saying that such a lock would only block other
> sessions from finishing of tuple insertions - but it will block normal
> index searches as well? I have a hard time accepting the fact that one
> session using INSERT ... KEY IGNORE will block lookup of a range of values
> from the index in *other* sessions.

I'm sorry that I created the impression that I was denying an impact
on read queries as compared to regular insertion. Of course, regular
insertion also blocks read queries for very small intervals, and there
could be quite a bit of work to be done with that exclusive lock held
already. I am only proposing increasing that period for this case, and
usually by not terribly much.

> I'd expect that with the current implementation strategy if you
> e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
> actually hitting duplicates, you'd see a very, very noticeable hit in
> scalability. I wouldn't expect a single session to get much slower, but
> trying a dozen or two should show a very noticeable dip.

I do not accept the premise of your questions regarding the patch's
performance, which appears to be that it's fair to expect the same
level of performance of INSERT ... ON DUPLICATE KEY IGNORE as it is to
expect of regular INSERT. No scheme is going to get that, including
your scheme, which necessitates WAL logging everything related to
index tuple insertion twice for each unique index. We certainly didn't
require SSI to perform as well as the current REPEATABLE READ, and
certainly not as well as READ COMMITTED, and for good reasons.

That said, I'm not dismissive of these concerns. As I said already, by
all means, let us scrutinise the performance of the patch.

I'm not going to bother benchmarking the patch myself until I give it
another whack. There is some really obvious low-hanging performance
fruit. Obviously by this I mean doing as little as possible when
holding the exclusive lock.

> I think we could optimize away the second search with some
> cleverness. If we remember & pin the page from the first search we know
> that the promise will have to be either on that page or - after a page
> split - potentially to ones on the right. Which we should be able to find
> by repeatedly stepping right. Similar to logic of doing the
> _bt_moveright() in _bt_doinsert().

That may well be possible, but we'd still have to do a binary search
on the page. And under your proposal, that will happen twice - each
time with an exclusive buffer lock held. Since our intent would be to
update in-place, the entire binary search operation would need a full
write/exclusive lock the second time around as well. You might even
end up doing more work with an exclusive/write lock held than under my
proposal, before you even factor in the other increased costs
(granted, that's not that likely, but you're still doing much more
with exclusive locks held than the regular insertion case).

What you describe here is the kind of trick that could be very useful
for another, existing common case -- blocking on an xid in the event
of a possible duplicate (as you know, it's okay to hold pins pretty
much indefinitely) -- and yet it has remained unimplemented for many
years.

> I am pretty sure it would be better. We know that nbtree works pretty
> damn well under concurrency. And the "promise" logic wouldn't change
> anything of that.

It would have multiple extra steps: WAL-logging of the initial
insertion, clean-up for the IGNORE case, and WAL-logging of that
clean-up. Compare that to doing exactly nothing extra in the IGNORE
case under my proposal, where the additional duration of the exclusive
lock being held is zero (instead of the ereport() call we just
immediately release the lock and report the duplicate). The IGNORE
case matters too, and there is going to be considerably more exclusive
locking for that case with your proposal, before we even talk about
other overhead.

> Furthermore, I am pretty sure that locking an index page *and* related
> heap pages exclusively at the same time will be a deadlock
> nightmare. Even in the case of only having a single unique index.

It seems like what you're saying here is "I don't know, this
extended index locking thing seems pretty sketchy to me". Which is a
perfectly reasonable thing to say in the absence of better analysis
from either of us - it is sketchy. I think about the same of your idea
of creating a whole new category of bloat, and doing all that extra
work, though.

I grant you entirely that:

1) My analysis of the deadlock hazards generally, and particularly of
interactions with heap page locks, has to date not been terribly
extensive. I've thought about it, but not in enough detail. It made
more sense for me to test the waters on the basic approach here
first.

2) In order for us to commit this, or a patch implementing any other
similar scheme, there needs to be at the very least a highly
scrutinized, watertight analysis of this and other deadlock hazards.

I have taken a certain amount of reassurance from the fact that I was
not able to observe any deadlocks when pushing the patch hard -
perhaps that's naive. By that I mean hammering a table with inserts
using pgbench for 3 hours, with 64 INSERT...DUPLICATE KEY IGNORE
sessions. PK values were over a range from 1 to 15 million. This was
with shared_buffers set to 512MB and assertions enabled. It stood up
without any deadlocks or other manifestations of bugs. Granted, I did
that before fixing the bug you've highlighted (but after you pointed
it out), and so I haven't tested the multi-index case in the same way.

>> Now, I guess you probably could make all this work - somehow - but
>> that's even more complexity and even more cycles for vanishingly small
>> gain in one specific area. And frankly I don't think there was much
>> gain to begin with. You're glossing over too many details about what
>> your scheme would need to do to make a fair comparison with mine. It
>> seems like you're really concerned about scalability in the sense of
>> scaling the number of unique indexes on one table when there is lots
>> of concurrent activity on the same range of values across multiple
>> unique indexes.
>
> No, I am primarily concerned of the performance of concurrent inserts
> and concurrent lookups, even in the case of a single unique index. I
> don't think we can disregard multiple unique indexes - they are not all
> that uncommon - but I think we'll see problems before that.

I'm actually more worried about deadlock caused by the heap buffer
locks than anything else. I don't think that locking multiple indexes
in this way is really much of a special case at all. I was annoyed at
myself for having overlooked such a simple bug, but it happened
because my testing of multiple unique indexes was lacking.

> Also, unless we abort with a fatal error, we can relatively easily mark
> all those promises as dead when the insertion doesn't succeed due to
> other indexes failing the uniqueness check (that won't merge pages though).

I suppose that's true. But then, you're getting into more extra steps.
The first step, and the clean-up step. Plus WAL-logging each. Just to
do nothing.

>> It's a lot more complex.
>
> No denying there, although I think your scheme will also grow a good bit
> more complex ;).

I suspect you're right. :-)

> Please don't get me wrong, I'd be very happy if your scheme can be made
> to work, it certainly has advantages. I just don't see how, that's why I
> am explaining the problems I see and what solution *I* can see. But
> then, unsurprisingly, I have thought more about my idea of doing things,
> so I might completely miss some solutions to your idea.

Fair enough.

> Yes, I wondered whether we could sort the indexes to move the ones with
> the highest conflict ratio upfront. But I am afraid that doing so with a
> scheme that involves keeping exclusive locks across multiple other
> operations would result in deadlocks.
> We might want to keep unique violation stats per index to do this
> efficiently. I think keeping those would be a good idea anyway.

Why should unique index violation stats be a reliable proxy here? In
the IGNORE case, a violation doesn't occur, so you'd have to track
those separately.

I don't think that there'd be any deadlock hazards if the ordering was
a function of properties of relations that can only be changed by DDL.
Which is all I had in mind.

-- 
Peter Geoghegan



In pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Subject: Re: backup.sgml-cmd-v003.patch
Next
From: wangshuo@highgo.com.cn
Subject: Re: ENABLE/DISABLE CONSTRAINT NAME