Re: INSERT...ON DUPLICATE KEY IGNORE

From: Andres Freund
Subject: Re: INSERT...ON DUPLICATE KEY IGNORE
Msg-id: 20130903132654.GC5783@awork2.anarazel.de
In reply to: Re: INSERT...ON DUPLICATE KEY IGNORE  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 2013-09-02 21:59:42 -0700, Peter Geoghegan wrote:
> > I can't really agree with this part. First off, a heap_insert()
> > certainly isn't a lightweight operation. There are several things
> > that can take extended time:
> > * toasting of rows (writing a gigabyte worth of data...)
> > * extension of the heap/toast table
> > * writeout of a dirty victim buffer if the page isn't in s_b
> > * reading in the target page if the page isn't in s_b
> > * waiting for an exclusive lock on the target pages
>
> These are all valid concerns. I'll need to better assess just how bad
> this can get (hopefully with help from others). But some of this stuff
> can surely be done separately for my case. For example, surely we
> could make TOASTing occur after index insertion proper when locks are
> released, without too much work.

I don't think those are that easy to solve, but I'd like to be surprised.

E.g. the toasting cannot really happen after the main table insert since
the inserted row needs to refer to the toast id. And you don't even
know the final size of a row without doing the toasting, since how much
it will get compressed depends on the amount of free space on the
current target buffer.
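
To make the ordering concrete, a minimal example (table name and size
are made up) of an insert that forces out-of-line toast storage:

    -- All of the toast-table work has to finish before the main heap
    -- tuple, which only carries the toast pointer, can be written.
    CREATE TABLE big (id int PRIMARY KEY, payload text);
    INSERT INTO big VALUES (1, repeat('x', 100 * 1024 * 1024));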

> > I'd expect that with the current implementation strategy if you
> > e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
> > actually hitting duplicates, you'd see a very, very noticeable hit in
> > scalability. I wouldn't expect a single session to get much slower, but
> > trying a dozen or two should show a very noticeable dip.
>
> I do not accept the premise of your questions regarding the patch's
> performance, which appears to be that it's fair to expect the same
> level of performance of INSERT ... ON DUPLICATE IGNORE as it is to
> expect of regular INSERT. No scheme is going to get that, including
> your scheme, which necessitates WAL logging everything related to
> index tuple insertion twice for each unique index. We certainly didn't
> require SSI to perform as well as the current REPEATABLE READ, and
> certainly not as well as READ COMMITTED, and for good reasons.

I think it's ok that INSERT .. IGNORE itself is noticeably slower than a
normal INSERT. What I am concerned about is its impact on overall
scalability.
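
As a sketch of what I have in mind (the script and table names here are
made up; the syntax is the patch's), a custom pgbench script like this
should show the effect without ever actually hitting duplicates:

    -- upsert.sql: keys drawn from a wide range, so conflicts are rare
    -- and any dip is attributable to the extra locking, not to the
    -- conflict handling itself. Run with e.g.:
    --   pgbench -n -f upsert.sql -c 24 -j 24 -T 300
    \setrandom id 1 100000000
    INSERT INTO upsert_test (id, val)
        VALUES (:id, 'x') ON DUPLICATE KEY IGNORE;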

> I'm not going to bother benchmarking the patch myself until I give it
> another whack. There is some really obvious low-hanging performance
> fruit. Obviously by this I mean doing as little as possible when
> holding the exclusive lock.

Sounds fair.

> > I am pretty sure it would be better. We know that nbtree works pretty
> > damn well under concurrency. And the "promise" logic wouldn't change
> > any of that.

> It would have multiple extra steps. Including WAL-logging of the
> initial insertion, and clean-up for the IGNORE case. And WAL-logging
> of the cleanup. That's instead of doing exactly nothing extra in the
> IGNORE case, which, under my proposal, holds the exclusive lock for
> zero additional time (instead of the ereport() call we just
> immediately release the lock and report the duplicate). The
> IGNORE case matters too, and there is going to be considerably more
> exclusive locking for that case with your proposal, before we even
> talk about other overhead.

What I mean is that none of the individual proposed steps require
changing the locking semantics/granularity. I.e. we will never hold
locks on more pages at once than we do for a normal insert. Yes, one
page will get locked twice, but for a shorter time than in a normal
insert (all you need to do is rewrite the target of the item), and
without any other pages being locked.

> > Furthermore, I am pretty sure that locking an index page *and* related
> > heap pages exclusively at the same time will be a deadlock
> > nightmare. Even in the case of only having a single unique index.

> It seems like what you're saying here is "I don't know, this
> extended index locking thing seems pretty sketchy to me". Which is a
> perfectly reasonable thing to say in the absence of better analysis
> from either of us - it is sketchy.

Yes, that's basically what I am saying.

I'd really like some input from others on this. Perhaps you could write
up an email/thread just about the locking semantics after the next
version of the patch (since that presumably will change the semantics)?

> I have taken a certain amount of reassurance from the fact that I was
> not able to observe any deadlocks when pushing the patch hard -
> perhaps that's naive. I mean hammering a table with inserts using
> pgbench for 3 hours, with 64 INSERT ... ON DUPLICATE KEY IGNORE
> sessions. PK values ranged from 1 to 15 million. This was
> with shared_buffers set to 512MB and assertions enabled. It stood up
> without any deadlocks or other manifestations of bugs. Granted, I did
> that before fixing the bug you've highlighted (but after you pointed
> it out), and so I haven't tested the multi-index case in the same way.

I can readily believe that there are no deadlocks with rather
straightforward schemas/queries like in pgbench. I think it's more
likely to be problematic in more complex scenarios. Running pgbench with
--foreign-keys might be enough to make that scenario more interesting.
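
For reference, pgbench -i --foreign-keys adds constraints along these
lines; the row locks taken by the resulting RI checks then interleave
with the page locks taken for the inserts:

    ALTER TABLE pgbench_tellers
        ADD FOREIGN KEY (bid) REFERENCES pgbench_branches (bid);
    ALTER TABLE pgbench_accounts
        ADD FOREIGN KEY (bid) REFERENCES pgbench_branches (bid);
    ALTER TABLE pgbench_history
        ADD FOREIGN KEY (aid) REFERENCES pgbench_accounts (aid);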

> > No, I am primarily concerned about the performance of concurrent inserts
> > and concurrent lookups, even in the case of a single unique index. I
> > don't think we can disregard multiple unique indexes - they are not all
> > that uncommon - but I think we'll see problems before that.
>
> I'm actually more worried about deadlock caused by the heap buffer
> locks than anything else.

The above was meant in the context of performance/scalability.

But yes, I think the more fundamental issue is likely to be the locking
semantics.

> > Yes, I wondered whether we could sort the indexes to move the ones with
> > the highest conflict ratio upfront. But I am afraid that doing so with a
> > scheme that involves keeping exclusive locks across multiple other
> > operations would result in deadlocks.
> > We might want to keep unique violation stats per index to do this
> > efficiently. I think keeping those would be a good idea anyway.

> Why should unique index violation stats be a reliable proxy here? In
> the IGNORE case, a violation doesn't occur, so you'd have to track
> those separately.

I assumed we'd track violations on a per-index basis, and would also
count the conflicts detected by INSERT .. IGNORE and by UPSERTs that
find a preexisting row.
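
To illustrate (nothing like this exists today; the n_conflicts column
below is invented), choosing the index ordering could then be as cheap
as:

    -- Hypothetical: n_conflicts would count unique violations plus the
    -- conflicts detected by INSERT .. IGNORE/UPSERT on that index.
    SELECT indexrelid::regclass
    FROM pg_stat_user_indexes
    WHERE relid = 'some_table'::regclass
    ORDER BY n_conflicts DESC;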

> I don't think that there'd be any deadlock hazards if the ordering was
> a function of properties of relations that can only be changed by DDL.
> Which is all I had in mind.

Depends on the locking scheme. If it's possible that one backend does an
INSERT IGNORE with the old order and one with the new one, you certainly
have a deadlock hazard.
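
It's the classic lock ordering problem. With explicit table locks for
illustration (a and b standing in for two unique indexes):

    BEGIN;            -- session 1
    LOCK TABLE a;
    -- meanwhile, session 2: BEGIN; LOCK TABLE b;
    LOCK TABLE b;     -- session 1 blocks ...
    -- session 2: LOCK TABLE a;  => deadlock detected, one backend aborts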

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


