Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts

Поиск
Список
Период
Сортировка
От Paul
Тема Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts
Дата
Msg-id 568d8060.291cb60a.e8a06.ffff9d00@mx.google.com
обсуждение исходный текст
Ответ на Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts  (Andres Freund <andres@anarazel.de>)
Ответы Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts  (Greg Stark <stark@mit.edu>)
Список pgsql-bugs
To be clear, the column defaulted to a sequencer is not at all involved in =
determining conflicts, and therefore it should not matter if it is used in =
a unique index, nor would there be any problems with a BEFORE trigger.

Gaps aren=E2=80=99t any kind of issue, and it=E2=80=99s not about how popul=
ar a related table is to the table upon which the UPSERT would be applied.

For example, I might be receiving event data from a shop floor where everyt=
hing is instrumented; tank levels, step counters, etc. There could be milli=
ons of events over a relatively short period of time. I might have a field =
in the event data which indicates the =E2=80=98class=E2=80=99 of device tha=
t generated the event; ex: =E2=80=98pump=E2=80=99, =E2=80=98actuator=E2=80=
=99, =E2=80=98regulator=E2=80=99, =E2=80=98thermometer=E2=80=99. There may =
only be less than a few hundred classes of devices that rarely change. So I=
 have a table, with a surrogate SMALLINT key defaulted to a sequence, and a=
 =E2=80=98name=E2=80=99 column which is a device class name.

As the raw event data is processed, I want to do an UPSERT into the device =
class table, using only the class name to resolve conflicts, to get it=E2=
=80=99s surrogate key, which is then used while inserting the related event=
 record linking that event record to the particular device class.  Since de=
vice classes are rarely added, it will be the common case that an actual IN=
SERT is almost never performed.

Using INSERT ON CONFLICT in this case, however, would mean the device class=
 table, which would only ever hold a few hundred records and would only req=
uire a SMALLINT for its surrogate key, would have to have a BIGINT for no g=
ood reason, which really isn=E2=80=99t a big deal. What IS the big deal is =
that the event table would also require a BIGINT for the device class colum=
n to relate the event to the device class. In most cases, there are many mo=
re =E2=80=98organizing=E2=80=99 columns, similar to the device class in thi=
s example, meaning BIGINTs would have to be used for no good reason within =
event tables, which can hold vast volumes of data. This can have a real and=
 measurable impact on memory, network, and storage requirements, which even=
tually map to dollars.

This basic scenario is very common, and one that greatly benefits from UPSE=
RT capability; i.e. linking event data to organizational data. It=E2=80=99s=
 much less common to have to UPSERT into some event table; i.e. the INSERT =
part of UPSERTs is usually the path less taken. I used the example of a sho=
p floor with device classes, but its the same issue for looking at orders b=
y product hierarchies, commissions by territories, stars by classification,=
 web-page-hits by marketing campaigns=E2=80=A6. I could list hundreds of pr=
ocesses within many problem domains that are of this same basic pattern.

My only point is, there=E2=80=99s another great capability in Postgres, for=
 doing efficient concurrent UPSERTS, but that one of the most common and pr=
ime use cases for it is going to cost more money from having to use 64bit n=
umbers everywhere for no good reason.




From: Andres Freund
Sent: Wednesday, January 6, 2016 11:19 AM
To: Paul Hester
Cc: Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers onco=
nflicts

On 2016-01-05 22:04:41 -0700, Paul Hester wrote:
> No peek-ahead necessary, when the sequenced column would not be used in t=
he
> conflict_target. UPSERTS to dimension tables in a star schema as part of =
an
> OLAP system are very handy, but typically the records in the dimension
> tables use a surrogate key based on sequences for maintenance & performan=
ce
> reasons, and the 'natural' key/value is the only column that would be use=
d
> to detect a conflict. In this case, the 'normal' path would be the
> conflict_action nearly all the time, and not the insert. If the typical
> path was the INSERT, I'd bet the data would be from some event source, an=
d
> I guess in that case if there were a key column defaulted with a sequence=
,
> like an order#, it would have to be used to resolve conflicts. But it's
> hard to imagine a case in a process flow where you wouldn't already know =
if
> the event existed or not, limiting the need for an UPSERT on event data.

It'd be very fragile. You could only do that if the value isn't involved
in any unique indexes, if there are no BEFORE triggers (since they need
to see the value).

> This limitation, of consuming sequencers used for defaults on columns not
> necessary to resolve conflict, diminishes the viability for using Postges
> UPSERTS for large data warehouses, is all I'm saying (or requires surroga=
te
> keys being 64 bit). Just caught me by surprise in comparison to other
> RDBMSs offering some form of UPSERTs that would not consume a sequencer i=
f
> its values weren't required to resolve conflicts.

If you use normal sequences you already need to cope with gaps in
sequences. And ids needing to be 64bits if you're a longer lived
business and it's for a a halfway "popular" table, isn't something new.


So I don't think addressing this is worth adding code for, especially
given the fragility of the situations where it'd be usable.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: eugeneymail@ymail.com
Дата:
Сообщение: BUG #13852: SQL Select Slow Issues
Следующее
От: Paul
Дата:
Сообщение: Re: BUG #13846: INSERT ON CONFLICT consumes sequencersonconflicts