Thread: Inserting from multiple processes?

Inserting from multiple processes?

From: Dave Johansen
Date: Mon, Jun 8, 2015 at 10:15 AM
I'm using Postgres 8.4 on RHEL 6 and I need to support inserting data into a table from multiple processes, with the different processes occasionally attempting to insert duplicate rows. Here's a simple example table:
CREATE TABLE test (tutc TIMESTAMP WITHOUT TIME ZONE, id INTEGER, value INTEGER, PRIMARY KEY (tutc, id));
If I do the following query from 2 processes, then it's fine:
INSERT INTO test (tutc, id, value) SELECT '2015-01-01 01:02:03', 4, 5 WHERE NOT EXISTS (SELECT 1 FROM test WHERE tutc='2015-01-01 01:02:03' AND id=4);

But if I put the operation in a transaction, then the second process blocks until the transaction of the first is committed (which is fine), but then its insert fails with a "duplicate key value violation". I'm guessing this is because, inside the transaction, the SELECT only sees values from before the second process's transaction began.
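Roughly, the interleaving looks like this (a sketch; "test_pkey" is the default name Postgres gives the primary key constraint):

-- Process A: BEGIN;
-- Process A: INSERT ... WHERE NOT EXISTS (...);  -- row inserted, not yet committed
-- Process B: BEGIN;
-- Process B: INSERT ... WHERE NOT EXISTS (...);  -- the EXISTS check can't see A's
--                                                -- uncommitted row, so the INSERT
--                                                -- runs and blocks on the index entry
-- Process A: COMMIT;
-- Process B: ERROR:  duplicate key value violates unique constraint "test_pkey"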

Using an "upsert" type of function, like the one shown in the documentation ( see http://www.postgresql.org/docs/8.4/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE ) seems like it might work, but I don't need to support updating and was hoping to not have to use a custom function. So is there some way to catch the unique_violation exception without creating a function? Or some other solution to this?

Thanks,
Dave

Re: Inserting from multiple processes?

From: Dave Johansen
Date: Tue, Jun 9, 2015 at 8:38 AM
On Mon, Jun 8, 2015 at 10:15 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> I'm using Postgres 8.4 on RHEL 6 and I need to support inserting data into a table from multiple processes, with the different processes occasionally attempting to insert duplicate rows. Here's a simple example table:
> CREATE TABLE test (tutc TIMESTAMP WITHOUT TIME ZONE, id INTEGER, value INTEGER, PRIMARY KEY (tutc, id));
....
> So is there some way to catch the unique_violation exception without creating a function? Or some other solution to this?

For the sake of documentation, here's the function that I used to accomplish this:
CREATE FUNCTION insert_test_no_dup(tutc_ TIMESTAMP WITHOUT TIME ZONE, id_ INTEGER, value_ INTEGER) RETURNS VOID AS
$$
BEGIN
  BEGIN
    INSERT INTO test(tutc, id, value) VALUES (tutc_, id_, value_);
    RETURN;
  EXCEPTION WHEN unique_violation THEN
    -- do nothing because the record already exists
  END;
END;
$$
LANGUAGE plpgsql;
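It gets called like a normal function, e.g.:

SELECT insert_test_no_dup('2015-01-01 01:02:03', 4, 5);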

Both a rule and a trigger had the same issue with transactions. Here are the declarations, for documentation:
CREATE RULE ignore_duplicate_inserts AS
  ON INSERT TO test
  WHERE (EXISTS (SELECT 1 FROM test WHERE tutc=new.tutc AND id=new.id)) DO INSTEAD NOTHING;

CREATE FUNCTION tf_insert_test_ignore_duplicates() RETURNS trigger
AS $$
BEGIN
  -- suppress the insert when the row already exists
  IF EXISTS (SELECT 1 FROM test WHERE tutc = new.tutc AND id = new.id) THEN
    RETURN NULL;
  ELSE
    RETURN new;
  END IF;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_test_before BEFORE INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE tf_insert_test_ignore_duplicates();

Re: Inserting from multiple processes?

From: Dave Johansen
Date: Fri, Jun 26, 2015 at 2:59 AM
On Tue, Jun 9, 2015 at 8:38 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> For the sake of documentation, here's the function that I used to accomplish this:
> CREATE FUNCTION insert_test_no_dup(tutc_ TIMESTAMP WITHOUT TIME ZONE, id_ INTEGER, value_ INTEGER) RETURNS VOID AS
> $$
> BEGIN
>   BEGIN
>     INSERT INTO test(tutc, id, value) VALUES (tutc_, id_, value_);
>     RETURN;
>   EXCEPTION WHEN unique_violation THEN
>     -- do nothing because the record already exists
>   END;
> END;
> $$
> LANGUAGE plpgsql;

It appears that calling "SELECT insert_test_no_dup('2015-01-01', 1, 1)" causes the XID to increment. I'm not sure if it happens only when the exception is raised or on every call, but is there some way to prevent the XID from incrementing? It's causing problems with our system:
http://www.postgresql.org/message-id/CAAcYxUer3MA=enXvnOwe0oSAA8ComvxCF6OrHp-vUppr56twFg@mail.gmail.com
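A rough way to observe this from psql, assuming nothing else is running (note that txid_current() itself forces an XID to be assigned to its own transaction, hence the correction at the end):

SELECT txid_current();                          -- note the first value
SELECT insert_test_no_dup('2015-01-01', 1, 1);  -- the call being measured
SELECT txid_current();                          -- note the second value
-- (second value - first value - 1) = XIDs consumed by the function call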

Thanks,
Dave

Re: Inserting from multiple processes?

From: Francisco Olarte
Date: Sat, Jun 27, 2015 at 11:00 AM
Hi Dave:

On Fri, Jun 26, 2015 at 2:59 AM, Dave Johansen <davejohansen@gmail.com> wrote:

> It appears that calling "SELECT insert_test_no_dup('2015-01-01', 1, 1)"
> causes the XID to increment. I'm not sure if it happens only when the
> exception is raised or on every call, but is there some way to prevent
> the XID from incrementing? It's causing problems with our system:
> http://www.postgresql.org/message-id/CAAcYxUer3MA=enXvnOwe0oSAA8ComvxCF6OrHp-vUppr56twFg@mail.gmail.com

I, personally, would expect a START TRANSACTION to burn an XID: they
are serial, and they need to be allocated to preserve transaction
ordering, like what happens with sequences. I assume the server can
have some optimizations (like delaying XID acquisition until the first
appropriate statement, which I think depends on your isolation level),
but I would never expect it not to allocate one before an insert: the
XID needs to be written to the table in case the insert succeeds, and
it has to be acquired beforehand in case someone needs to acquire
another XID between the time the insert starts and the time it succeeds
or fails. Some internals expert may shed some light, but after reading
your link it seems your problem is just that you do too many
transactions without a vacuum (also, reading the thread you pointed to,
it seems you do VACUUM FULLs, which seems unneeded) while expecting
Postgres to have some kind of magic to avoid burning the XIDs.
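( BTW, a standard catalog query shows how close each database is to
wraparound, which is what the vacuuming is there to push back: )

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
-- vacuum has to freeze old rows before xid_age gets anywhere near 2^31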

Francisco Olarte.


Re: Inserting from multiple processes?

From: Dave Johansen
Date: Mon, Jun 29, 2015 at 6:32 AM
On Sat, Jun 27, 2015 at 11:00 AM, Francisco Olarte <folarte@peoplecall.com> wrote:
> Hi Dave:
>
> On Fri, Jun 26, 2015 at 2:59 AM, Dave Johansen <davejohansen@gmail.com> wrote:
>
> > It appears that calling "SELECT insert_test_no_dup('2015-01-01', 1, 1)"
> > causes the XID to increment.
....
> Some internals expert may shed some light, but after reading your link
> it seems your problem is just that you do too many transactions without
> a vacuum (also, reading the thread you pointed to, it seems you do
> VACUUM FULLs, which seems unneeded) while expecting Postgres to have
> some kind of magic to avoid burning the XIDs.

The issue is that the following uses 5 XIDs when I would only expect it to use 1:
BEGIN;
SELECT insert_test_no_dup('2015-01-01', 1, 1);
SELECT insert_test_no_dup('2015-01-02', 2, 2);
SELECT insert_test_no_dup('2015-01-01', 1, 1);
SELECT insert_test_no_dup('2015-01-02', 2, 2);
SELECT insert_test_no_dup('2015-01-01', 1, 1);
SELECT insert_test_no_dup('2015-01-02', 2, 2);
END;

It appears that the unique violation that is caught and ignored increments the XID even though I didn't expect that to happen. I agree that our software was burning XIDs needlessly and Postgres handled this situation as best as it could. It also sounds like Postgres 9.5 adds features to support this sort of use more efficiently, but the XID incrementing on the unique violation seems like it could/should be fixed, if it hasn't been already.
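For anyone finding this later: the 9.5 feature is INSERT ... ON CONFLICT, which should reduce the whole exercise to something like the following (untested here, since this system is stuck on 8.4):

INSERT INTO test (tutc, id, value)
VALUES ('2015-01-01 01:02:03', 4, 5)
ON CONFLICT (tutc, id) DO NOTHING;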

Re: Inserting from multiple processes?

From: Francisco Olarte
Date:
Hi Dave:

On Mon, Jun 29, 2015 at 6:32 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> The issue is that the following uses 5 XIDs when I would only expect it
> to use 1:
> BEGIN;
> SELECT insert_test_no_dup('2015-01-01', 1, 1);
....
> END;

I see.

> It appears that the unique violation that is caught and ignored increments
> the XID even though I didn't expect that to happen. I agree that our
> software was burning XIDs needlessly and Postgres handled this situation as
> best as it could. It also sounds like Postgres 9.5 adds features to support
> this sort of use more efficiently, but the XID incrementing on the unique
> violation seems like it could/should be fixed, if it hasn't been already.

IIRC you were using BEGIN/EXCEPTION, which I think uses a savepoint
internally; that may be what is burning an XID on every execution (it
probably needs one to implement ROLLBACK TO SAVEPOINT properly). I've
done a simple test which burns one every time the exception is raised
(using a division by zero).

If this is your case, you may be able to work around it by using a
conditional insert instead of an exception handler, and as you are using
a function the potential ugliness will remain encapsulated (it may even
be faster, as the docs explicitly say exception blocks are expensive,
but as usual YMMV depending on the exact query and the collision ratio).
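Something along these lines (a sketch against your test table; the function name is just for illustration, and the NOT EXISTS check still has the original race between concurrent transactions, so the primary key remains the real guard):

CREATE FUNCTION insert_test_no_dup_cond(tutc_ TIMESTAMP WITHOUT TIME ZONE, id_ INTEGER, value_ INTEGER) RETURNS VOID AS
$$
BEGIN
  -- no BEGIN/EXCEPTION sub-block => no subtransaction => no extra XID per call
  INSERT INTO test (tutc, id, value)
  SELECT tutc_, id_, value_
  WHERE NOT EXISTS (SELECT 1 FROM test WHERE tutc = tutc_ AND id = id_);
END;
$$
LANGUAGE plpgsql;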

Francisco Olarte.