Обсуждение: Performance question

Поиск
Список
Период
Сортировка

Performance question

От
Jean-Christian Imbeault
Дата:
I'm trying to convince another open-source project (phpOpenTracker) to
modify their current INSERT sql queries. Currently they just do an
INSERT into a table without first checking if their might be a record
with the same primary key.

The reason for this that they need fast inserts and most user I assume
are using MySQL which silently drops INSERT queries that violate primary
key constraints. But postgres on the other hand (and rightly so) issues
and logs an error.

I have suggested that their current INSERT INTO t VALUES() be changed to:

INSERT INTO
   T
SELECT 'v1', 'v2'
WHERE
   NOT EXISTS (
     SELECT NULL FROM t WHERE pk='v1'
   )

However one of the developers is worried that this would cause a
performance drop for MySQL users b/c of the extra SELECT in my version
of the INSERT query.

I had thought that the 'extra' SELECT isn't really extra at all since
*any* DB still has to check that there is a record with the primary key
that we are trying to insert. So whereas in my query the SELECT is
explicitly stated in the regular version of a simple INSERT, the select
is still there but implicit since there was a primary key defined on the
table. So there really shouldn't be much, if any of a performance drop.

Is there any truth to my logic in the previous paragraph? :)

I'd like to test my theory on a real MySQL database but I don't have
access to one right now, and I am not sure how to go about testing a
MySQL db or even what kind of testing. If I can get a spare machine I
will give it a go though.

Thanks,

Jean-Christian Imbeault


Re: Performance question

От
Tom Lane
Дата:
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes:
> I have suggested that their current INSERT INTO t VALUES() be changed to:

> INSERT INTO
>    T
> SELECT 'v1', 'v2'
> WHERE
>    NOT EXISTS (
>      SELECT NULL FROM t WHERE pk='v1'
>    )

That doesn't really buy anything in safety terms: if two backends
execute this sort of command concurrently, it's perfectly likely
that both sub-SELECTS will find no row matching 'v1', and so they'll
both try the INSERT anyway.

IMO the best way to do this (assuming that you have a unique index
defined on the primary key column) is to just go ahead and try the
INSERT, but be prepared to roll back your transaction and retry
if you get a failure.

You might find it useful to read the slides from my talk at last
year's O'Reilly conference about this and related concurrency
problems:
http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681

            regards, tom lane

Re: Performance question

От
Jean-Christian Imbeault
Дата:
Tom Lane wrote:
>
> That doesn't really buy anything in safety terms: if two backends
> execute this sort of command concurrently, it's perfectly likely
> that both sub-SELECTS will find no row matching 'v1', and so they'll
> both try the INSERT anyway.

I wasn't looking for safety so much as avoiding the insert if possible.
I've just finished a long thread about this and the consensus was that
this query would be:

1- the insert would always happen if there was no record with the same
primary key
2- is concurrency safe. Even if this query is executed by multiple
backends at the same time one will always succeed.

The reason I had asked for a query that could check before insert was to
reduce the number of log errors the phpOpenTracker tool generates from
trying to insert duplicate records into a table.

The above would reduce (but not eliminate) the number attempts to insert
duplicates and as such reduce the number of log error entries.

At least that's what I am hoping for ...

Jean-Christian Imbeault


Re: Performance question

От
Tom Lane
Дата:
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes:
> I wasn't looking for safety so much as avoiding the insert if possible.
> ...
> The above would reduce (but not eliminate) the number attempts to insert
> duplicates and as such reduce the number of log error entries.

Fair enough, it would do that.  But you still need the surrounding retry
logic to cope when a collision does happen.

            regards, tom lane

Re: Performance question

От
Jean-Christian Imbeault
Дата:
Tom Lane wrote:
>
> Fair enough, it would do that.  But you still need the surrounding retry
> logic to cope when a collision does happen.

Why retry logic? I thought that if there was a collision one of the
insert attempts would necessarily succeed and all the others fails. So
no need for a retry since the record (pk) was inserted.

What I am trying to duplicate is if an inserts into a primary key field
is a duplicate it fails silently. If it isn't or there are multiple
attempts at the same time to insert the *same* new primary key, the new
pk is inserted.

The suggested method does guarantee an insert if it there is a collision
right? If not I need to find a new solution :(

Jean-Christian Imbeault




Re: Performance question

От
Tom Lane
Дата:
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes:
> Tom Lane wrote:
>> Fair enough, it would do that.  But you still need the surrounding retry
>> logic to cope when a collision does happen.

> Why retry logic? I thought that if there was a collision one of the
> insert attempts would necessarily succeed and all the others fails. So
> no need for a retry since the record (pk) was inserted.

Well, if that's how your application logic works then you're done.
I was envisioning a case where you'd prefer to generate a new key
value and try the insert again.

            regards, tom lane

Re: Performance question

От
Jean-Christian Imbeault
Дата:
Tom Lane wrote:
>
> Well, if that's how your application logic works then you're done.

Great! It's not _my_ application logic, someone else's. I'm trying to
find a way to make it more postgres friendly and generate less error
messages.

The application was first written for MySQL and the business _logic_
seems to have been to just throw inserts at the database and if they
failed it must be because the record was already there so no need to
worry about it.

Pretty loose logic but then again the app can be *very* write intensive
and the extra cycles needed to check whether something is  already in
the DB or not before inserting seems to be a real concern for the
application developers.

Jean-Christian Imbeault


Re: Performance question

От
Oleg Bartunov
Дата:
On Wed, 2 Jul 2003, Tom Lane wrote:

> Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes:
> > I have suggested that their current INSERT INTO t VALUES() be changed to:
>
> > INSERT INTO
> >    T
> > SELECT 'v1', 'v2'
> > WHERE
> >    NOT EXISTS (
> >      SELECT NULL FROM t WHERE pk='v1'
> >    )
>
> That doesn't really buy anything in safety terms: if two backends
> execute this sort of command concurrently, it's perfectly likely
> that both sub-SELECTS will find no row matching 'v1', and so they'll
> both try the INSERT anyway.
>
> IMO the best way to do this (assuming that you have a unique index
> defined on the primary key column) is to just go ahead and try the
> INSERT, but be prepared to roll back your transaction and retry
> if you get a failure.
>
> You might find it useful to read the slides from my talk at last
> year's O'Reilly conference about this and related concurrency
> problems:
> http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681
>

I'd like to see all presentations in one collections. It'd be nice
addition to documentation.


>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83