Re: Upsert performance considerations (~1 mil/hour)

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Upsert performance considerations (~1 mil/hour)
Дата
Msg-id CAMkU=1xkL1D0vvbOv=t_n7iL7f4SH2+fxU_BxCxD7MTv+1X-Kg@mail.gmail.com
обсуждение исходный текст
Ответ на Upsert performance considerations (~1 mil/hour)  (Fredrik Blomqvist <fredrik.blomqvist.95@gmail.com>)
Список pgsql-performance
On Wed, Sep 4, 2019 at 1:30 PM Fredrik Blomqvist <fredrik.blomqvist.95@gmail.com> wrote:
Hi,

I have tried doing some research for quite a while on the performance implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE...) when a lot of upserts are made. The scale is something like 1 million records/hour, that is split up in groups of around 300 records each.

How is that done?  300 single-valued insert statements, grouped into on transaction?  one 300-valued insert statement?
 
So far, one hypothesis is that this project seems to be suffering from the large amount of writes that happen constantly since even if the upsert results in no inserts/updates, the "failed" inserts from the upsert will still get written somewhere (according to our knowledge).

You can suppress redundant updates with a trigger, as described https://www.postgresql.org/docs/current/functions-trigger.html.  This works even for updates that are the result of insert..on conflict..update.  There is still some writing, as each tuple does get locked, but it is much less (at least from a WAL perspective).   You can also put  a WHERE clause on the DO UPDATE so it only updates is a field has changed, but you have to list each field connected with OR.
 
Therefore, the idea is to utilize old-fashioned upserts (writeable CTEs) and do more granular operations that can make sure to only insert data that doesn't already exist, and only update data that has actually changed. Naturally, however, this will put more read-load on the DB and increase query complexity.

It shouldn't put a meaningful additional read load on the database, as the ON CONFLICT code still needs to do the read as well.  Yes, it makes the code slightly more complex.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Fredrik Blomqvist
Дата:
Сообщение: Upsert performance considerations (~1 mil/hour)
Следующее
От: Fredrik Blomqvist
Дата:
Сообщение: Re: Upsert performance considerations (~1 mil/hour)