Обсуждение: perform 1 check vs exception when unique_violation

Поиск
Список
Период
Сортировка

perform 1 check vs exception when unique_violation

От
Anton Bogdanovitch
Дата:
I have to insert rows to table with 95% primary key unique_violation.

I've tested 2 examples below:

1)
BEGIN
    INSERT INTO main (name, created) VALUES (i_name, CURRENT_TIMESTAMP
AT TIME ZONE 'GMT');
    EXCEPTION WHEN UNIQUE_VIOLATION THEN
       RETURN 'error: already exists';
END;
RETURN 'ok: store';

2)
PERFORM 1 FROM main WHERE name = i_name;
IF NOT FOUND THEN
    INSERT INTO main (name, created) VALUES (i_name, CURRENT_TIMESTAMP
AT TIME ZONE 'GMT');
    RETURN 'ok: stored';
ELSE
    RETURN 'error: already exists';
END IF;

The first one performs about 20% slower, have 5 times more disk i/o
write operations.
The second one uses 20% more cpu.
Is it because of raid1 and slow writes?
What is the better solution to fit best performance?
Pg version 8.3, table size will probably grow to 100M rows


Re: perform 1 check vs exception when unique_violation

От
"Robert Haas"
Дата:
On Tue, Dec 30, 2008 at 5:41 AM, Anton Bogdanovitch
<poison.box@gmail.com> wrote:
> I have to insert rows to table with 95% primary key unique_violation.

If you're inserting a lot of rows at once, I think you're probably
better off loading all of the data into a side table that does not
have a primary key, and then writing one statement to remove the
duplicates and do all the inserts at once.

INSERT INTO main (name, created)
SELECT
   s.name, CURRENT_TIMESTAMP
FROM
   (SELECT DISTINCT ON (name) FROM sidetable) s  -- nuke duplicate
names within sidetable
   LEFT JOIN main m ON s.name = m.name
   WHERE m.name IS NULL;   -- nuke names in sidetable that are already in main

I've usually found that any kind of loop in PL/pgsql is mind-numbingly
slow compared to doing the same thing as a single query.

...Robert