configuring queries for concurrent updates

Поиск
Список
Период
Сортировка
От Robert Poor
Тема configuring queries for concurrent updates
Дата
Msg-id CAGHqdqWJdw0cG6+-HjpX+v2wE4-4+udm5aCMUMFS5rSaGpQ1uw@mail.gmail.com
обсуждение исходный текст
Ответы Re: configuring queries for concurrent updates  (Craig Ringer <ringerc@ringerc.id.au>)
Список pgsql-general
[std_disclaimer]I'm not a DBA and I'm running PostgreSQL on a
quad-core Intel laptop.  You may read on after you stop
laughing...[/std_disclaimer]

I've written a version of UPSERT designed to import a large number of
records.  It works in two steps: it UPDATEs incumbent records that
share common keys with the new records, then INSERTs new records that
do not share common keys with the incumbent records.  The gist of it
is:

SAVEPOINT ...;
-- update records from subquery that share common keys with tbl
UPDATE tbl
  SET f1 = X.f1, f2 = X.f2, ...
  FROM (subquery) AS X
 WHERE tbl.k1 = X.k1 AND tbl.k2 = X.k2 AND ...;
-- insert records from subquery that do NOT share common keys with tbl
    INSERT INTO tbl (f1, f2, ...)
         SELECT Y.f1, Y.f2, ...
           FROM (subquery) AS Y
LEFT OUTER JOIN tbl
             ON tbl.k1 = Y.k1 AND tbl.k2 = Y.k2 AND ...
          WHERE tbl.id IS NULL;
RELEASE SAVEPOINT ...;

The (subquery) frequently generates 5000 records for update/insert.
Three Unix processes are running the import process concurrently, all
updating/inserting records into a common table.

Most of the time this works, but I've observed examples where an
import process will
* get a "PG::Error: ERROR:  current transaction is aborted, commands
ignored until end of transaction block"
* get a "PG::Error: ERROR:  deadlock detected"
* go catatonic for several minutes

This leads me to believe that my SAVEPOINT / RELEASE SAVEPOINT is not
the correct way to protect the system from multiprocessing mischief.
I've read Tom Lane's presentation on concurrency
(http://www.postgresql.org/files/developer/concurrency.pdf), but I
haven't been able to figure out the right solution.

[For what it's worth, with the data I'm importing, INSERTs will far
outnumber the UPDATES.]

Since I someday plan to spin this code up on a big system with >> 3
import processes, I'd like to get this right.

Hints and solutions warmly accepted.  Thanks.

- rdp

В списке pgsql-general по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: retrieving function raise messages in ecpg embedded sql code
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [ADMIN] Issue in save and retreive file in postgres