Обсуждение: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

Поиск
Список
Период
Сортировка

BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
paul@salesintel.com
Дата:
The following bug has been logged on the website:

Bug reference:      13846
Logged by:          Paul Hester
Email address:      paul@salesintel.com
PostgreSQL version: 9.5rc1
Operating system:   Windows 10 Pro
Description:

When using an INSERT statement with an ON CONFLICT clause, if there is a
conflict, and the table being inserted into has a column defaulted to
nextval('seq'), the sequencer 'seq' is always incremented. This can quickly
and artificially consume all values of the sequencer; it behaves as if
conflict detection happens after inserts are attempted, rather than before.

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
Andres Freund
Дата:
On 2016-01-05 15:02:27 +0000, paul@salesintel.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13846
> Logged by:          Paul Hester
> Email address:      paul@salesintel.com
> PostgreSQL version: 9.5rc1
> Operating system:   Windows 10 Pro
> Description:
>
> When using an INSERT statement with an ON CONFLICT clause, if there is a
> conflict, and the table being inserted into has a column defaulted to
> nextval('seq'), the sequencer 'seq' is always incremented. This can quickly
> and artificially consume all values of the sequencer; it behaves as if
> conflict detection happens after inserts are attempted, rather than before.

Yes. That's by design. You can't reliably do conflict detection before
evaluating column default values.

Andres Freund

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
Peter Geoghegan
Дата:
On Tue, Jan 5, 2016 at 8:13 AM, Andres Freund <andres@anarazel.de> wrote:
>> When using an INSERT statement with an ON CONFLICT clause, if there is a
>> conflict, and the table being inserted into has a column defaulted to
>> nextval('seq'), the sequencer 'seq' is always incremented. This can quickly
>> and artificially consume all values of the sequencer; it behaves as if
>> conflict detection happens after inserts are attempted, rather than before.
>
> Yes. That's by design. You can't reliably do conflict detection before
> evaluating column default values.

Right. If you didn't consume a sequence value, but just did a
"peek-ahead", then several concurrently inserting sessions would all
"peek-ahead" and see the same value. There'd then be a race condition
that broke the useful guarantees that ON CONFLICT DO UPDATE makes.

--
Peter Geoghegan

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts

От
Paul
Дата:
If none of the columns with defaults are used to detect conflicts, why must=
 they be evaluated before conflict detection in order to reliably detect a =
conflict?

Lots of systems categorize information into =E2=80=98organizational=E2=80=
=99 data that may be hierarchical in nature and fairly static, in relation =
to the other =E2=80=98operational=E2=80=99 kind of data, which is usually g=
enerated from events; those events being organized by the former for batchi=
ng and reporting purposes, etc. It is often the case UPSERTS are most usefu=
l when deriving the organizational data from the operational data, meaning =
a very typical usage of an UPSERT would have many more updates than inserts=
, in direct correlation to the number of events being processed. Its not un=
reasonable over a relatively short period of time (within the overall lifet=
ime of a particular system) to process billions of events, meaning SERIAL t=
yped columns used as surrogate keys to the natural keys of organizational d=
ata could exhaust their sequencers for no good reason, and SMALLSERIAL just=
 wouldn=E2=80=99t be usable, BIGSERIAL could work, but they=E2=80=99re, uhm=
.. BIG.

There should be a way to not evaluate defaulted columns not used in the con=
flict_target, so that SERIAL typed columns not used for conflict detection =
don=E2=80=99t unnecessary exhaust their sequencers=E2=80=A6. (I think that =
was like a quadruple-negative sentence there=E2=80=A6, sorry about that)

Regards,
Paul

From: Andres Freund
Sent: Tuesday, January 5, 2016 9:13 AM
To: paul@salesintel.com
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers onco=
nflicts

On 2016-01-05 15:02:27 +0000, paul@salesintel.com wrote:
> The following bug has been logged on the website:
>=20
> Bug reference:      13846
> Logged by:          Paul Hester
> Email address:      paul@salesintel.com
> PostgreSQL version: 9.5rc1
> Operating system:   Windows 10 Pro
> Description:       =20
>=20
> When using an INSERT statement with an ON CONFLICT clause, if there is a
> conflict, and the table being inserted into has a column defaulted to
> nextval('seq'), the sequencer 'seq' is always incremented. This can quick=
ly
> and artificially consume all values of the sequencer; it behaves as if
> conflict detection happens after inserts are attempted, rather than befor=
e.

Yes. That's by design. You can't reliably do conflict detection before
evaluating column default values.

Andres Freund

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
Paul Hester
Дата:
No peek-ahead necessary, when the sequenced column would not be used in the
conflict_target. UPSERTS to dimension tables in a star schema as part of an
OLAP system are very handy, but typically the records in the dimension
tables use a surrogate key based on sequences for maintenance & performance
reasons, and the 'natural' key/value is the only column that would be used
to detect a conflict. In this case, the 'normal' path would be the
conflict_action nearly all the time, and not the insert. If the typical
path was the INSERT, I'd bet the data would be from some event source, and
I guess in that case if there were a key column defaulted with a sequence,
like an order#, it would have to be used to resolve conflicts. But it's
hard to imagine a case in a process flow where you wouldn't already know if
the event existed or not, limiting the need for an UPSERT on event data.

This limitation, of consuming sequencers used for defaults on columns not
necessary to resolve conflict, diminishes the viability for using Postges
UPSERTS for large data warehouses, is all I'm saying (or requires surrogate
keys being 64 bit). Just caught me by surprise in comparison to other
RDBMSs offering some form of UPSERTs that would not consume a sequencer if
its values weren't required to resolve conflicts.

On Tue, Jan 5, 2016 at 12:27 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Tue, Jan 5, 2016 at 8:13 AM, Andres Freund <andres@anarazel.de> wrote:
> >> When using an INSERT statement with an ON CONFLICT clause, if there is a
> >> conflict, and the table being inserted into has a column defaulted to
> >> nextval('seq'), the sequencer 'seq' is always incremented. This can
> quickly
> >> and artificially consume all values of the sequencer; it behaves as if
> >> conflict detection happens after inserts are attempted, rather than
> before.
> >
> > Yes. That's by design. You can't reliably do conflict detection before
> > evaluating column default values.
>
> Right. If you didn't consume a sequence value, but just did a
> "peek-ahead", then several concurrently inserting sessions would all
> "peek-ahead" and see the same value. There'd then be a race condition
> that broke the useful guarantees that ON CONFLICT DO UPDATE makes.
>
> --
> Peter Geoghegan
>



--
*Paul Hester*
Co-founder & Chief Architect
www.salesintel.com

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
Andres Freund
Дата:
On 2016-01-05 22:04:41 -0700, Paul Hester wrote:
> No peek-ahead necessary, when the sequenced column would not be used in the
> conflict_target. UPSERTS to dimension tables in a star schema as part of an
> OLAP system are very handy, but typically the records in the dimension
> tables use a surrogate key based on sequences for maintenance & performance
> reasons, and the 'natural' key/value is the only column that would be used
> to detect a conflict. In this case, the 'normal' path would be the
> conflict_action nearly all the time, and not the insert. If the typical
> path was the INSERT, I'd bet the data would be from some event source, and
> I guess in that case if there were a key column defaulted with a sequence,
> like an order#, it would have to be used to resolve conflicts. But it's
> hard to imagine a case in a process flow where you wouldn't already know if
> the event existed or not, limiting the need for an UPSERT on event data.

It'd be very fragile. You could only do that if the value isn't involved
in any unique indexes, if there are no BEFORE triggers (since they need
to see the value).

> This limitation, of consuming sequencers used for defaults on columns not
> necessary to resolve conflict, diminishes the viability for using Postges
> UPSERTS for large data warehouses, is all I'm saying (or requires surrogate
> keys being 64 bit). Just caught me by surprise in comparison to other
> RDBMSs offering some form of UPSERTs that would not consume a sequencer if
> its values weren't required to resolve conflicts.

If you use normal sequences you already need to cope with gaps in
sequences. And ids needing to be 64bits if you're a longer lived
business and it's for a a halfway "popular" table, isn't something new.


So I don't think addressing this is worth adding code for, especially
given the fragility of the situations where it'd be usable.

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers on conflicts

От
Thomas Kellerer
Дата:
Paul Hester schrieb am 06.01.2016 um 06:04:
> This limitation, of consuming sequencers used for defaults on columns
> not necessary to resolve conflict, diminishes the viability for using
> Postges UPSERTS for large data warehouses, is all I'm saying (or
> requires surrogate keys being 64 bit). Just caught me by surprise in
> comparison to other RDBMSs offering some form of UPSERTs that would
> not consume a sequencer if its values weren't required to resolve
> conflicts.

Why does this "diminish" the viability of a sequence?

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts

От
Peter Geoghegan
Дата:
T24gV2VkLCBKYW4gNiwgMjAxNiBhdCAxOjAwIFBNLCBQYXVsIDxwYXVsQHNhbGVzaW50ZWwuY29t
PiB3cm90ZToNCj4gTXkgb25seSBwb2ludCBpcywgdGhlcmXigJlzIGFub3RoZXIgZ3JlYXQgY2Fw
YWJpbGl0eSBpbiBQb3N0Z3JlcywgZm9yIGRvaW5nDQo+IGVmZmljaWVudCBjb25jdXJyZW50IFVQ
U0VSVFMsIGJ1dCB0aGF0IG9uZSBvZiB0aGUgbW9zdCBjb21tb24gYW5kIHByaW1lIHVzZQ0KPiBj
YXNlcyBmb3IgaXQgaXMgZ29pbmcgdG8gY29zdCBtb3JlIG1vbmV5IGZyb20gaGF2aW5nIHRvIHVz
ZSA2NGJpdCBudW1iZXJzDQo+IGV2ZXJ5d2hlcmUgZm9yIG5vIGdvb2QgcmVhc29uLg0KDQpOb3Qg
cmVhbGx5OyBhdCBsZWFzdCwgaXQgd29uJ3QgaW5jcmVhc2UgdGhlIHNpemUgb2YgYSBwbGFpbiBC
SUdTRVJJQUwNCnByaW1hcnkga2V5IGluZGV4ICh3aGljaCB3aWxsIGdlbmVyYWxseSBub3QgYmUg
Y29tcG9zaXRlKToNCg0KcG9zdGdyZXM9IyBjcmVhdGUgdGFibGUgZm9vIGFzIHNlbGVjdCAocmFu
ZG9tKCkgKiAxMDAwMDAwMDApOjppbnQ0DQpwa2V5IGZyb20gZ2VuZXJhdGVfc2VyaWVzKDEsIDEw
MDAwMCk7DQpTRUxFQ1QgMTAwMDAwDQpwb3N0Z3Jlcz0jIGNyZWF0ZSBpbmRleCBvbiBmb28gKHBr
ZXkpOw0KQ1JFQVRFIElOREVYDQoNCnBvc3RncmVzPSMgXGR0KyBmb28NCiAgICAgICAgICAgICAg
ICAgICBMaXN0IG9mIHJlbGF0aW9ucw0KIFNjaGVtYSDilIIgTmFtZSDilIIgVHlwZSAg4pSCIE93
bmVyIOKUgiAgU2l6ZSAgIOKUgiBEZXNjcmlwdGlvbg0K4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA
4pS84pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA
4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA
4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSADQogcHVibGljIOKUgiBmb28gIOKUgiB0YWJsZSDi
lIIgcGcgICAg4pSCIDM1NDQga0Ig4pSCDQooMSByb3cpDQoNCnBvc3RncmVzPSMgXGRpKyBmb29f
cGtleV9pZHgNCiAgICAgICAgICAgICAgICAgICAgICAgICAgIExpc3Qgb2YgcmVsYXRpb25zDQog
U2NoZW1hIOKUgiAgICAgTmFtZSAgICAg4pSCIFR5cGUgIOKUgiBPd25lciDilIIgVGFibGUg4pSC
ICBTaXplICAg4pSCIERlc2NyaXB0aW9uDQrilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDi
lIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDi
lIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilLzi
lIDilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilIDilIDi
lIDilIDilIDilIANCiBwdWJsaWMg4pSCIGZvb19wa2V5X2lkeCDilIIgaW5kZXgg4pSCIHBnICAg
IOKUgiBmb28gICDilIIgMjIwOCBrQiDilIINCigxIHJvdykNCg0KcG9zdGdyZXM9IyBhbHRlciB0
YWJsZSBmb28gYWx0ZXIgY29sdW1uIHBrZXkgdHlwZSBpbnQ4Ow0KQUxURVIgVEFCTEUNCg0KcG9z
dGdyZXM9IyBcZGkrIGZvb19wa2V5X2lkeA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgTGlz
dCBvZiByZWxhdGlvbnMNCiBTY2hlbWEg4pSCICAgICBOYW1lICAgICDilIIgVHlwZSAg4pSCIE93
bmVyIOKUgiBUYWJsZSDilIIgIFNpemUgICDilIIgRGVzY3JpcHRpb24NCuKUgOKUgOKUgOKUgOKU
gOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKU
vOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKU
gOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKU
gOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgA0KIHB1YmxpYyDilIIgZm9vX3BrZXlfaWR4IOKU
giBpbmRleCDilIIgcGcgICAg4pSCIGZvbyAgIOKUgiAyMjA4IGtCIOKUgg0KKDEgcm93KQ0KDQpU
aGUgdW5jaGFuZ2VkIHNpemUgb2YgdGhlIGluZGV4IGZvb19wa2V5X2lkeCBzZWVuIGhlcmUgZHVl
IHRvDQphbGlnbm1lbnQgY29uc2lkZXJhdGlvbnMuIEdyYW50ZWQsIHRoZSBoZWFwIG1pZ2h0IHN0
aWxsIGJlIGEgYml0DQpsYXJnZXIgdGhhbiBpdCB3b3VsZCBvdGhlcndpc2UgYmUsIGJlY2F1c2Ug
aXQgd2lsbCB1c3VhbGx5IGJlDQoiY29tcG9zaXRlIiwgYnV0IEkgdGhpbmsgdGhlICJjb3N0IiBv
ZiB5b3VyIHVzaW5nIGludDggd2lsbCBub3QgYnJlYWsNCnRoZSBiYW5rLg0KDQotLSANClBldGVy
IEdlb2doZWdhbg0K

Re: BUG #13846: INSERT ON CONFLICT consumes sequencersonconflicts

От
Andres Freund
Дата:
Hi,

On 2016-01-06 15:00:17 -0700, Paul wrote:
> I’m looking at math more like a single Fact table having  500 million
> records, with 10 dimension columns. If INTs were used for the
> dimension columns, that’s 20GB.

> If I had to change those dimension columns to BIGINTs, that’s
> 40GB. This can impact how much you can fit into server memory, where
> olaps like to live, and just slow down moving stuff around between
> memory and disk and over the network and backups, etc.

Row headers, padding & alignment makes that absolutely different in
reality. It's like a sub 20% difference.

And in pretty much all the cases with sufficient insertion rates you're
going to want bigints anyway. If there's few rows it doesn't matter
anyway.


> On a technical note, why is the following flow considered ‘fragile’?

The fragility comes from the fact that it'd only be used in a subset of
cases. E.g. if a BEFORE trigger is present the to-be-inserted rows needs
to be "materialized" to be presented to the trigger.


> 1) Evaluate only columns used in conflict_target
> a. Conflict-resolving Columns with default nextval() increment the corresponding sequencer
> i. And in this case, there were never be conflicts by definition, so
> ON CONFLICT can always be ignored

Wrong. Rows with sequences can very well conflict, there's nothing
forcing sequences to always be used.

Also note that sequence default values aren't in any way different from other
default values, and that relevant pieces of code currently don't know
whether a default value is a nextval or not.

> 2) If conflict, DO UPDATE
> a. If nextval()-defaulted column used in conflict_target, we never get here
> b. Defaults never evaluated
> 3) Else Evaluate remaining columns not used in conflict_target and INSERT
> a. Columns with nextval() increment their corresponding sequencer

Perhaps read the code, and prototype it? I can tell you that it'd be a
significant amount of work, and that I'm personally absolutely not
interested in investing significant amounts time into it. But that
doesn't have to stop *you*.


Anyway, EOD for me.

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts

От
Paul
Дата:
To be clear, the column defaulted to a sequencer is not at all involved in =
determining conflicts, and therefore it should not matter if it is used in =
a unique index, nor would there be any problems with a BEFORE trigger.

Gaps aren=E2=80=99t any kind of issue, and it=E2=80=99s not about how popul=
ar a related table is to the table upon which the UPSERT would be applied.

For example, I might be receiving event data from a shop floor where everyt=
hing is instrumented; tank levels, step counters, etc. There could be milli=
ons of events over a relatively short period of time. I might have a field =
in the event data which indicates the =E2=80=98class=E2=80=99 of device tha=
t generated the event; ex: =E2=80=98pump=E2=80=99, =E2=80=98actuator=E2=80=
=99, =E2=80=98regulator=E2=80=99, =E2=80=98thermometer=E2=80=99. There may =
only be less than a few hundred classes of devices that rarely change. So I=
 have a table, with a surrogate SMALLINT key defaulted to a sequence, and a=
 =E2=80=98name=E2=80=99 column which is a device class name.

As the raw event data is processed, I want to do an UPSERT into the device =
class table, using only the class name to resolve conflicts, to get it=E2=
=80=99s surrogate key, which is then used while inserting the related event=
 record linking that event record to the particular device class.  Since de=
vice classes are rarely added, it will be the common case that an actual IN=
SERT is almost never performed.

Using INSERT ON CONFLICT in this case, however, would mean the device class=
 table, which would only ever hold a few hundred records and would only req=
uire a SMALLINT for its surrogate key, would have to have a BIGINT for no g=
ood reason, which really isn=E2=80=99t a big deal. What IS the big deal is =
that the event table would also require a BIGINT for the device class colum=
n to relate the event to the device class. In most cases, there are many mo=
re =E2=80=98organizing=E2=80=99 columns, similar to the device class in thi=
s example, meaning BIGINTs would have to be used for no good reason within =
event tables, which can hold vast volumes of data. This can have a real and=
 measurable impact on memory, network, and storage requirements, which even=
tually map to dollars.

This basic scenario is very common, and one that greatly benefits from UPSE=
RT capability; i.e. linking event data to organizational data. It=E2=80=99s=
 much less common to have to UPSERT into some event table; i.e. the INSERT =
part of UPSERTs is usually the path less taken. I used the example of a sho=
p floor with device classes, but its the same issue for looking at orders b=
y product hierarchies, commissions by territories, stars by classification,=
 web-page-hits by marketing campaigns=E2=80=A6. I could list hundreds of pr=
ocesses within many problem domains that are of this same basic pattern.

My only point is, there=E2=80=99s another great capability in Postgres, for=
 doing efficient concurrent UPSERTS, but that one of the most common and pr=
ime use cases for it is going to cost more money from having to use 64bit n=
umbers everywhere for no good reason.




From: Andres Freund
Sent: Wednesday, January 6, 2016 11:19 AM
To: Paul Hester
Cc: Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumes sequencers onco=
nflicts

On 2016-01-05 22:04:41 -0700, Paul Hester wrote:
> No peek-ahead necessary, when the sequenced column would not be used in t=
he
> conflict_target. UPSERTS to dimension tables in a star schema as part of =
an
> OLAP system are very handy, but typically the records in the dimension
> tables use a surrogate key based on sequences for maintenance & performan=
ce
> reasons, and the 'natural' key/value is the only column that would be use=
d
> to detect a conflict. In this case, the 'normal' path would be the
> conflict_action nearly all the time, and not the insert. If the typical
> path was the INSERT, I'd bet the data would be from some event source, an=
d
> I guess in that case if there were a key column defaulted with a sequence=
,
> like an order#, it would have to be used to resolve conflicts. But it's
> hard to imagine a case in a process flow where you wouldn't already know =
if
> the event existed or not, limiting the need for an UPSERT on event data.

It'd be very fragile. You could only do that if the value isn't involved
in any unique indexes, if there are no BEFORE triggers (since they need
to see the value).

> This limitation, of consuming sequencers used for defaults on columns not
> necessary to resolve conflict, diminishes the viability for using Postges
> UPSERTS for large data warehouses, is all I'm saying (or requires surroga=
te
> keys being 64 bit). Just caught me by surprise in comparison to other
> RDBMSs offering some form of UPSERTs that would not consume a sequencer i=
f
> its values weren't required to resolve conflicts.

If you use normal sequences you already need to cope with gaps in
sequences. And ids needing to be 64bits if you're a longer lived
business and it's for a a halfway "popular" table, isn't something new.


So I don't think addressing this is worth adding code for, especially
given the fragility of the situations where it'd be usable.

Re: BUG #13846: INSERT ON CONFLICT consumes sequencersonconflicts

От
Paul
Дата:
QWxsIHRoaW5ncyBhcmUgcmVsYXRpdmUsIHRvIGhvdyBtdWNoIGRhdGEsIGFuZCBob3cgYmlnIHRo
ZSBiYW5rLiBBbmQgaGF2aW5nIGJ1aWx0IGdpZ2FudGljIChiaWxsaW9ucyBvZiBiaWxsaW9ucyBv
ZiBmYWN0IHJlY29yZHMsIGh1bmRyZWRzIG9mIGRpbWVuc2lvbnMsIG1pbGxpb25zIG9mIG1lbWJl
cnMpIGRhdGEgd2FyZWhvdXNlIGFuZCBvbGFwIHNvbHV0aW9ucywgeW91IHJlYWxseSBkbyB0cnkg
dG8ga2VlcCBzdHVmZiBzbWFsbCBiZWNhdXNlIGl0IGRvZXMgaGF2ZSBhbiBpbXBhY3Qgb24gY29z
dCwgc2l6ZSwgYW5kIHBlcmZvcm1hbmNlLg0KIA0KT24gYSBtb3JlIGNvbmNlcHR1YWwgbm90ZSwg
aWYgYSBzZXF1ZW5jZS1kZWZhdWx0ZWQgY29sdW1uIGlzIHVzZWQgdG8gcmVzb2x2ZSBhIGNvbmZs
aWN0LCBpLmUuIGl04oCZcyBpbiB0aGUgY29uZmxpY3RfdGFyZ2V0LCB0aGFuIGJ5IGRlZmluaXRp
b24gdGhlcmUgd2lsbCBuZXZlciBiZSBhIGNvbmZsaWN0IQ0KDQpTbyBhbnl0aGluZyBhYm91dCDi
gJhmcmFnaWxpdHnigJkgYW5kIOKAmHBlZWstYWhlYWTigJkgYW5kIOKAmEJFRk9SRSB0cmlnZ2Vy
c+KAmSBub3Qgd29ya2luZyBqdXN0IGRvZXNu4oCZdCBtYWtlIGFueSBzZW5zZS4gSWYgYSBzZXF1
ZW5jZSBpcyB1c2VkIHRvIHJlc29sdmUgYSBjb25mbGljdCwgdGhlcmUgd2lsbCBuZXZlciBiZSBh
IGNvbmZsaWN0LCBhbmQgaWYgaXQgaXNu4oCZdCwgdGhlcmXigJlzIG5vIG5lZWQgdG8gZ2V0IHRo
ZSBuZXh0IHNlcXVlbmNlZCB2YWx1ZSBiZWZvcmUgYmVpbmcgYWJsZSB0byBkZXRlcm1pbmUgY29u
ZmxpY3RpbmcgcmVjb3JkcyBtZWFuaW5nIG5vIGNvbmN1cnJlbmN5IGlzc3VlczsgeW91IHdvdWxk
IHN0aWxsIG9ubHkgbmVlZCB0byBnZXQgdGhlIG5leHQgc2VxdWVudGlhbCB2YWx1ZSBvbmx5IGZv
ciBhbiBhY3R1YWwgaW5zZXJ0Lg0KDQoNCg0KDQpGcm9tOiBQZXRlciBHZW9naGVnYW4NClNlbnQ6
IFdlZG5lc2RheSwgSmFudWFyeSA2LCAyMDE2IDI6MTMgUE0NClRvOiBQYXVsDQpDYzogQW5kcmVz
IEZyZXVuZDsgcGdzcWwtYnVncw0KU3ViamVjdDogUmU6IFtCVUdTXSBCVUcgIzEzODQ2OiBJTlNF
UlQgT04gQ09ORkxJQ1QgY29uc3VtZXMgc2VxdWVuY2Vyc29uY29uZmxpY3RzDQoNCk9uIFdlZCwg
SmFuIDYsIDIwMTYgYXQgMTowMCBQTSwgUGF1bCA8cGF1bEBzYWxlc2ludGVsLmNvbT4gd3JvdGU6
DQo+IE15IG9ubHkgcG9pbnQgaXMsIHRoZXJl4oCZcyBhbm90aGVyIGdyZWF0IGNhcGFiaWxpdHkg
aW4gUG9zdGdyZXMsIGZvciBkb2luZw0KPiBlZmZpY2llbnQgY29uY3VycmVudCBVUFNFUlRTLCBi
dXQgdGhhdCBvbmUgb2YgdGhlIG1vc3QgY29tbW9uIGFuZCBwcmltZSB1c2UNCj4gY2FzZXMgZm9y
IGl0IGlzIGdvaW5nIHRvIGNvc3QgbW9yZSBtb25leSBmcm9tIGhhdmluZyB0byB1c2UgNjRiaXQg
bnVtYmVycw0KPiBldmVyeXdoZXJlIGZvciBubyBnb29kIHJlYXNvbi4NCg0KTm90IHJlYWxseTsg
YXQgbGVhc3QsIGl0IHdvbid0IGluY3JlYXNlIHRoZSBzaXplIG9mIGEgcGxhaW4gQklHU0VSSUFM
DQpwcmltYXJ5IGtleSBpbmRleCAod2hpY2ggd2lsbCBnZW5lcmFsbHkgbm90IGJlIGNvbXBvc2l0
ZSk6DQoNCnBvc3RncmVzPSMgY3JlYXRlIHRhYmxlIGZvbyBhcyBzZWxlY3QgKHJhbmRvbSgpICog
MTAwMDAwMDAwKTo6aW50NA0KcGtleSBmcm9tIGdlbmVyYXRlX3NlcmllcygxLCAxMDAwMDApOw0K
U0VMRUNUIDEwMDAwMA0KcG9zdGdyZXM9IyBjcmVhdGUgaW5kZXggb24gZm9vIChwa2V5KTsNCkNS
RUFURSBJTkRFWA0KDQpwb3N0Z3Jlcz0jIFxkdCsgZm9vDQogICAgICAgICAgICAgICAgICAgTGlz
dCBvZiByZWxhdGlvbnMNCiBTY2hlbWEg4pSCIE5hbWUg4pSCIFR5cGUgIOKUgiBPd25lciDilIIg
IFNpemUgICDilIIgRGVzY3JpcHRpb24NCuKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKU
gOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKU
gOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKU
gOKUgOKUgOKUgOKUgOKUgOKUgA0KIHB1YmxpYyDilIIgZm9vICDilIIgdGFibGUg4pSCIHBnICAg
IOKUgiAzNTQ0IGtCIOKUgg0KKDEgcm93KQ0KDQpwb3N0Z3Jlcz0jIFxkaSsgZm9vX3BrZXlfaWR4
DQogICAgICAgICAgICAgICAgICAgICAgICAgICBMaXN0IG9mIHJlbGF0aW9ucw0KIFNjaGVtYSDi
lIIgICAgIE5hbWUgICAgIOKUgiBUeXBlICDilIIgT3duZXIg4pSCIFRhYmxlIOKUgiAgU2l6ZSAg
IOKUgiBEZXNjcmlwdGlvbg0K4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA
4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS8
4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA
4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA
4pSADQogcHVibGljIOKUgiBmb29fcGtleV9pZHgg4pSCIGluZGV4IOKUgiBwZyAgICDilIIgZm9v
ICAg4pSCIDIyMDgga0Ig4pSCDQooMSByb3cpDQoNCnBvc3RncmVzPSMgYWx0ZXIgdGFibGUgZm9v
IGFsdGVyIGNvbHVtbiBwa2V5IHR5cGUgaW50ODsNCkFMVEVSIFRBQkxFDQoNCnBvc3RncmVzPSMg
XGRpKyBmb29fcGtleV9pZHgNCiAgICAgICAgICAgICAgICAgICAgICAgICAgIExpc3Qgb2YgcmVs
YXRpb25zDQogU2NoZW1hIOKUgiAgICAgTmFtZSAgICAg4pSCIFR5cGUgIOKUgiBPd25lciDilIIg
VGFibGUg4pSCICBTaXplICAg4pSCIERlc2NyaXB0aW9uDQrilIDilIDilIDilIDilIDilIDilIDi
lIDilLzilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDi
lIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDi
lIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDi
lIDilIDilIDilIDilIDilIDilIANCiBwdWJsaWMg4pSCIGZvb19wa2V5X2lkeCDilIIgaW5kZXgg
4pSCIHBnICAgIOKUgiBmb28gICDilIIgMjIwOCBrQiDilIINCigxIHJvdykNCg0KVGhlIHVuY2hh
bmdlZCBzaXplIG9mIHRoZSBpbmRleCBmb29fcGtleV9pZHggc2VlbiBoZXJlIGR1ZSB0bw0KYWxp
Z25tZW50IGNvbnNpZGVyYXRpb25zLiBHcmFudGVkLCB0aGUgaGVhcCBtaWdodCBzdGlsbCBiZSBh
IGJpdA0KbGFyZ2VyIHRoYW4gaXQgd291bGQgb3RoZXJ3aXNlIGJlLCBiZWNhdXNlIGl0IHdpbGwg
dXN1YWxseSBiZQ0KImNvbXBvc2l0ZSIsIGJ1dCBJIHRoaW5rIHRoZSAiY29zdCIgb2YgeW91ciB1
c2luZyBpbnQ4IHdpbGwgbm90IGJyZWFrDQp0aGUgYmFuay4NCg0KLS0gDQpQZXRlciBHZW9naGVn
YW4NCg0K

Re: BUG #13846: INSERT ON CONFLICT consumes sequencersonconflicts

От
Paul
Дата:
SeKAmW0gbG9va2luZyBhdCBtYXRoIG1vcmUgbGlrZSBhIHNpbmdsZSBGYWN0IHRhYmxlIGhhdmlu
ZyAgNTAwIG1pbGxpb24gcmVjb3Jkcywgd2l0aCAxMCBkaW1lbnNpb24gY29sdW1ucy4gSWYgSU5U
cyB3ZXJlIHVzZWQgZm9yIHRoZSBkaW1lbnNpb24gY29sdW1ucywgdGhhdOKAmXMgMjBHQi4gSWYg
SSBoYWQgdG8gY2hhbmdlIHRob3NlIGRpbWVuc2lvbiBjb2x1bW5zIHRvIEJJR0lOVHMsIHRoYXTi
gJlzIDQwR0IuIFRoaXMgY2FuIGltcGFjdCBob3cgbXVjaCB5b3UgY2FuIGZpdCBpbnRvIHNlcnZl
ciBtZW1vcnksIHdoZXJlIG9sYXBzIGxpa2UgdG8gbGl2ZSwgYW5kIGp1c3Qgc2xvdyBkb3duIG1v
dmluZyBzdHVmZiBhcm91bmQgYmV0d2VlbiBtZW1vcnkgYW5kIGRpc2sgYW5kIG92ZXIgdGhlIG5l
dHdvcmsgYW5kIGJhY2t1cHMsIGV0Yy4NCg0KT24gYSB0ZWNobmljYWwgbm90ZSwgd2h5IGlzIHRo
ZSBmb2xsb3dpbmcgZmxvdyBjb25zaWRlcmVkIOKAmGZyYWdpbGXigJk/DQoxKSBFdmFsdWF0ZSBv
bmx5IGNvbHVtbnMgdXNlZCBpbiBjb25mbGljdF90YXJnZXQNCmEuIENvbmZsaWN0LXJlc29sdmlu
ZyBDb2x1bW5zIHdpdGggZGVmYXVsdCBuZXh0dmFsKCkgaW5jcmVtZW50IHRoZSBjb3JyZXNwb25k
aW5nIHNlcXVlbmNlcg0KaS4gQW5kIGluIHRoaXMgY2FzZSwgdGhlcmUgd2VyZSBuZXZlciBiZSBj
b25mbGljdHMgYnkgZGVmaW5pdGlvbiwgc28gT04gQ09ORkxJQ1QgY2FuIGFsd2F5cyBiZSBpZ25v
cmVkDQoyKSBJZiBjb25mbGljdCwgRE8gVVBEQVRFDQphLiBJZiBuZXh0dmFsKCktZGVmYXVsdGVk
IGNvbHVtbiB1c2VkIGluIGNvbmZsaWN0X3RhcmdldCwgd2UgbmV2ZXIgZ2V0IGhlcmUNCmIuIERl
ZmF1bHRzIG5ldmVyIGV2YWx1YXRlZA0KMykgRWxzZSBFdmFsdWF0ZSByZW1haW5pbmcgY29sdW1u
cyBub3QgdXNlZCBpbiBjb25mbGljdF90YXJnZXQgYW5kIElOU0VSVA0KYS4gQ29sdW1ucyB3aXRo
IG5leHR2YWwoKSBpbmNyZW1lbnQgdGhlaXIgY29ycmVzcG9uZGluZyBzZXF1ZW5jZXINCg0KDQoN
Cg0KDQpGcm9tOiBQZXRlciBHZW9naGVnYW4NClNlbnQ6IFdlZG5lc2RheSwgSmFudWFyeSA2LCAy
MDE2IDI6MTMgUE0NClRvOiBQYXVsDQpDYzogQW5kcmVzIEZyZXVuZDsgcGdzcWwtYnVncw0KU3Vi
amVjdDogUmU6IFtCVUdTXSBCVUcgIzEzODQ2OiBJTlNFUlQgT04gQ09ORkxJQ1QgY29uc3VtZXMg
c2VxdWVuY2Vyc29uY29uZmxpY3RzDQoNCk9uIFdlZCwgSmFuIDYsIDIwMTYgYXQgMTowMCBQTSwg
UGF1bCA8cGF1bEBzYWxlc2ludGVsLmNvbT4gd3JvdGU6DQo+IE15IG9ubHkgcG9pbnQgaXMsIHRo
ZXJl4oCZcyBhbm90aGVyIGdyZWF0IGNhcGFiaWxpdHkgaW4gUG9zdGdyZXMsIGZvciBkb2luZw0K
PiBlZmZpY2llbnQgY29uY3VycmVudCBVUFNFUlRTLCBidXQgdGhhdCBvbmUgb2YgdGhlIG1vc3Qg
Y29tbW9uIGFuZCBwcmltZSB1c2UNCj4gY2FzZXMgZm9yIGl0IGlzIGdvaW5nIHRvIGNvc3QgbW9y
ZSBtb25leSBmcm9tIGhhdmluZyB0byB1c2UgNjRiaXQgbnVtYmVycw0KPiBldmVyeXdoZXJlIGZv
ciBubyBnb29kIHJlYXNvbi4NCg0KTm90IHJlYWxseTsgYXQgbGVhc3QsIGl0IHdvbid0IGluY3Jl
YXNlIHRoZSBzaXplIG9mIGEgcGxhaW4gQklHU0VSSUFMDQpwcmltYXJ5IGtleSBpbmRleCAod2hp
Y2ggd2lsbCBnZW5lcmFsbHkgbm90IGJlIGNvbXBvc2l0ZSk6DQoNCnBvc3RncmVzPSMgY3JlYXRl
IHRhYmxlIGZvbyBhcyBzZWxlY3QgKHJhbmRvbSgpICogMTAwMDAwMDAwKTo6aW50NA0KcGtleSBm
cm9tIGdlbmVyYXRlX3NlcmllcygxLCAxMDAwMDApOw0KU0VMRUNUIDEwMDAwMA0KcG9zdGdyZXM9
IyBjcmVhdGUgaW5kZXggb24gZm9vIChwa2V5KTsNCkNSRUFURSBJTkRFWA0KDQpwb3N0Z3Jlcz0j
IFxkdCsgZm9vDQogICAgICAgICAgICAgICAgICAgTGlzdCBvZiByZWxhdGlvbnMNCiBTY2hlbWEg
4pSCIE5hbWUg4pSCIFR5cGUgIOKUgiBPd25lciDilIIgIFNpemUgICDilIIgRGVzY3JpcHRpb24N
CuKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKU
gOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKU
gOKUgOKUgOKUgOKUvOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgOKUgA0KIHB1
YmxpYyDilIIgZm9vICDilIIgdGFibGUg4pSCIHBnICAgIOKUgiAzNTQ0IGtCIOKUgg0KKDEgcm93
KQ0KDQpwb3N0Z3Jlcz0jIFxkaSsgZm9vX3BrZXlfaWR4DQogICAgICAgICAgICAgICAgICAgICAg
ICAgICBMaXN0IG9mIHJlbGF0aW9ucw0KIFNjaGVtYSDilIIgICAgIE5hbWUgICAgIOKUgiBUeXBl
ICDilIIgT3duZXIg4pSCIFRhYmxlIOKUgiAgU2l6ZSAgIOKUgiBEZXNjcmlwdGlvbg0K4pSA4pSA
4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA
4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS8
4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pS84pSA
4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSADQogcHVibGljIOKUgiBmb29fcGtl
eV9pZHgg4pSCIGluZGV4IOKUgiBwZyAgICDilIIgZm9vICAg4pSCIDIyMDgga0Ig4pSCDQooMSBy
b3cpDQoNCnBvc3RncmVzPSMgYWx0ZXIgdGFibGUgZm9vIGFsdGVyIGNvbHVtbiBwa2V5IHR5cGUg
aW50ODsNCkFMVEVSIFRBQkxFDQoNCnBvc3RncmVzPSMgXGRpKyBmb29fcGtleV9pZHgNCiAgICAg
ICAgICAgICAgICAgICAgICAgICAgIExpc3Qgb2YgcmVsYXRpb25zDQogU2NoZW1hIOKUgiAgICAg
TmFtZSAgICAg4pSCIFR5cGUgIOKUgiBPd25lciDilIIgVGFibGUg4pSCICBTaXplICAg4pSCIERl
c2NyaXB0aW9uDQrilIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDi
lIDilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDi
lIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilLzilIDilIDilIDilIDilIDi
lIDilIDilIDilIDilLzilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIANCiBw
dWJsaWMg4pSCIGZvb19wa2V5X2lkeCDilIIgaW5kZXgg4pSCIHBnICAgIOKUgiBmb28gICDilIIg
MjIwOCBrQiDilIINCigxIHJvdykNCg0KVGhlIHVuY2hhbmdlZCBzaXplIG9mIHRoZSBpbmRleCBm
b29fcGtleV9pZHggc2VlbiBoZXJlIGR1ZSB0bw0KYWxpZ25tZW50IGNvbnNpZGVyYXRpb25zLiBH
cmFudGVkLCB0aGUgaGVhcCBtaWdodCBzdGlsbCBiZSBhIGJpdA0KbGFyZ2VyIHRoYW4gaXQgd291
bGQgb3RoZXJ3aXNlIGJlLCBiZWNhdXNlIGl0IHdpbGwgdXN1YWxseSBiZQ0KImNvbXBvc2l0ZSIs
IGJ1dCBJIHRoaW5rIHRoZSAiY29zdCIgb2YgeW91ciB1c2luZyBpbnQ4IHdpbGwgbm90IGJyZWFr
DQp0aGUgYmFuay4NCg0KLS0gDQpQZXRlciBHZW9naGVnYW4NCg0K

Re: BUG #13846: INSERT ON CONFLICT consumes sequencers onconflicts

От
Greg Stark
Дата:
On Wed, Jan 6, 2016 at 9:00 PM, Paul <paul@salesintel.com> wrote:
> As the raw event data is processed, I want to do an UPSERT into the devic=
e
> class table, using only the class name to resolve conflicts, to get it=E2=
=80=99s
> surrogate key, which is then used while inserting the related event recor=
d
> linking that event record to the particular device class.  Since device
> classes are rarely added, it will be the common case that an actual INSER=
T
> is almost never performed.


Fwiw while the implementation of UPSERT is fairly efficient it's
probably still too high an overhead to handle this kind of case.
Ingesting high volume of sensor data like this you want to be very
fast with little extra work. It's perfectly sensible to use UPSERT to
insert sensor ids but you wouldn't want to do so on every single
sensor datum or you'll end up spending more time and I/O doing that
than just ingesting your raw data.

Luckily in a case like that you don't expect the sensor ids to be
updated or deleted so you aren't really concerned about concurrency
except for a race between multiple insertions. So simply keeping a
list of known sensors and using upsert whenever a new sensor id is
seen (and periodically refreshing the list) would work perfectly well.
And independently of how sequences work you would want to be doing
that anyways for performance.

I'm more sympathetic to your concern about storage density than Peter
and Andres seem to be but unfortunately the reality is that it's much
harder to implement what you describe than it might appear. That often
happens in Postgres because it's a general purpose flexible platform
and has to handle whatever setup users create. So often solutions that
make perfect sense for a typical setup actually don't work in general.
Implementing UPSERT correctly for general case was actually really
really hard already so avoiding extra complications for features like
this is an important strategy for getting things done. I'm sure you'll
agree that we're better off with UPSERT with this limitation than no
UPSERT at all...

Fwiw this is the kind of feedback for which there are beta releases.
We need users to try to develop applications using new features before
they're released to find exactly these types of mismatches with user
expectations. Even if someone wanted to work on this now it wouldn't
happen until 9.6 which means any application that it would have helped
would probably already have run into the problem and had to adjust
already.

--=20
greg

Re: BUG #13846: INSERT ON CONFLICT consumessequencersonconflicts

От
Paul
Дата:
Hi Andres, appreciate your replies. I hope you will still read this when yo=
u can, as I believe I can make the financial case, and that the solution is=
 relatively simple. I also believe there=E2=80=99s still some misunderstand=
ing to the problem. Maybe you could point me to the right place in the code=
.




>> Row headers, padding & alignment makes that absolutely different in
>> reality. It's like a sub 20% difference.

Regarding the space increase going from 10 INTs to 10 BIGINTS per row, usin=
g my example of 500 million fact records, and referring to http://www.postg=
resql.org/docs/9.5/static/storage-page-layout.html

Using 10 INT columns per record, with row-index, row-header, and 64 bit ali=
gnment overhead, requires 4B (row Idx) + 24B (row hdr) + 40B (row data) =3D=
 68B per row.  With a page size of 8192B, minus 24B for the page header, gi=
ves 8168 / 68 =3D ~120 rows per page.

Using 20 BIGINT columns per record, with row-index, row-header, and 64 bit =
alignment overhead, requires 4B (row Idx) + 24B (row hdr) + 80B (row data) =
=3D 108B per row.  With a page size of 8192B, minus 24B for the page header=
, gives 8168 / 108 =3D ~75 rows per page.

1 =E2=80=93 75 / 120 =3D ~38%

So its much nearer 40%, and not sub 20%. When dealing in 10s to 100=E2=80=
=99s of GBs (there are terabyte warehouses), 38% is significant, and can ha=
ve a real financial impact with respect to memory, network, primary and bac=
kup storage costs.=20




>> And in pretty much all the cases with sufficient insertion rates you're
>> going to want bigints anyway. If there's few rows it doesn't matter
>> anyway.

This is true for the surrogate key (generated by a sequencer default) of a =
Fact table (which comes from operational business transactions, like taking=
 orders, or web-page visits). As I stated, there are warehouses with billio=
ns of fact records, that are just about always sequentially inserted and ha=
ve a timestamp holding when they occurred. I have never come across a pract=
ical need to do UPSERTs on Fact tables (i.e. transaction tables, event tabl=
es, data generated by things occurring in the real world etc.).

But the table where the UPSERT is being applied is not the Fact table! It=
=E2=80=99s a related Dimension table, and in most cases, the INSERT path wi=
ll not be taken!! A Fact table will have columns representing Members along=
 Dimensions. For example, an Order Fact table holding records for individua=
l Order Line Items, would most likely have a Dimension column to represent =
the Status of the Line Item; i.e. Open, Shipped, Invoiced, Closed. There wo=
uld be a separate Status dimension table, with a SMALLINT column, that woul=
d be used in the Fact table to relate an Order Line Item to a particular St=
atus. It is the Status Dimension table upon which an UPSERT would be perfor=
med for every Order Line Item ETL=E2=80=99d from the OLTP system into the w=
arehouse.

So, I=E2=80=99m not clear how your above statement applies exactly.



>> The fragility comes from the fact that it'd only be used in a subset of
>> cases. E.g. if a BEFORE trigger is present the to-be-inserted rows needs
>> to be "materialized" to be presented to the trigger.
(I=E2=80=99m assuming you mean the BEFORE INSERT trigger, and I=E2=80=99m n=
oting you state =E2=80=98to-be-inserted=E2=80=99 records.)

I would then argue that the BEFORE trigger is being applied inappropriately=
 in the case of an INSERT with an ON CONFLICT clause. First, I think the pu=
rpose of the INSERT ON CONFLICT is to implement UPSERTs; would you agree? W=
e were always able to implement UPSERT logic before, it just was a pain in =
the butt because of concurrency. Before, I would have had a separate INSERT=
 statement and a separate UPDATE statement. If I had a BEFORE INSERT trigge=
r, it would only execute for records that were actually going to be inserte=
d. However, you=E2=80=99re now stating that the INSERT ON CONFLICT will exe=
cute a BEFORE INSERT trigger even for records that are not inserted?? That =
doesn=E2=80=99t seem quite logical, and contrary to =E2=80=98old fashioned=
=E2=80=99 upsert logic. Does this also mean that the BEFORE UPDATE trigger =
is always called as well, or is it never called?



>> Wrong. Rows with sequences can very well conflict, there's nothing
>> forcing sequences to always be used.

My oversight, you are correct here, should an explicit value be provided fo=
r a column normally defaulted from a sequencer. But that doesn=E2=80=99t in=
validate the basic premise of what I=E2=80=99ve been trying to get across.



>> Also note that sequence default values aren't in any way different from =
other
>> default values, and that relevant pieces of code currently don't know
>> whether a default value is a nextval or not.

The relevant pieces of code wouldn=E2=80=99t need to know anything about ho=
w the default value is computed for any column. The relevant code merely ne=
eds to only evaluate default values for only the columns used to resolve co=
nflicts, and not all of the columns. Then, for the records that don=E2=80=
=99t conflict, evaluate any remaining default expressions, and then call th=
e BEFORE INSERT trigger with only the records that are actually going to be=
 inserted!!


I greatly welcome your thoughts, and please point to the right place in the=
 code.

Regards,
paul


From: Andres Freund
Sent: Wednesday, January 6, 2016 3:11 PM
To: Paul
Cc: Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICT consumessequencersonconf=
licts

Hi,

On 2016-01-06 15:00:17 -0700, Paul wrote:
> I=E2=80=99m looking at math more like a single Fact table having  500 mil=
lion
> records, with 10 dimension columns. If INTs were used for the
> dimension columns, that=E2=80=99s 20GB.

> If I had to change those dimension columns to BIGINTs, that=E2=80=99s
> 40GB. This can impact how much you can fit into server memory, where
> olaps like to live, and just slow down moving stuff around between
> memory and disk and over the network and backups, etc.

Row headers, padding & alignment makes that absolutely different in
reality. It's like a sub 20% difference.

And in pretty much all the cases with sufficient insertion rates you're
going to want bigints anyway. If there's few rows it doesn't matter
anyway.


> On a technical note, why is the following flow considered =E2=80=98fragil=
e=E2=80=99?

The fragility comes from the fact that it'd only be used in a subset of
cases. E.g. if a BEFORE trigger is present the to-be-inserted rows needs
to be "materialized" to be presented to the trigger.


> 1) Evaluate only columns used in conflict_target
> a. Conflict-resolving Columns with default nextval() increment the corres=
ponding sequencer
> i. And in this case, there were never be conflicts by definition, so
> ON CONFLICT can always be ignored

Wrong. Rows with sequences can very well conflict, there's nothing
forcing sequences to always be used.

Also note that sequence default values aren't in any way different from oth=
er
default values, and that relevant pieces of code currently don't know
whether a default value is a nextval or not.

> 2) If conflict, DO UPDATE
> a. If nextval()-defaulted column used in conflict_target, we never get he=
re
> b. Defaults never evaluated
> 3) Else Evaluate remaining columns not used in conflict_target and INSERT
> a. Columns with nextval() increment their corresponding sequencer

Perhaps read the code, and prototype it? I can tell you that it'd be a
significant amount of work, and that I'm personally absolutely not
interested in investing significant amounts time into it. But that
doesn't have to stop *you*.


Anyway, EOD for me.

Re: BUG #13846: INSERT ON CONFLICT consumessequencersonconflicts

От
"David G. Johnston"
Дата:
On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote:

> If I had a BEFORE INSERT trigger, it would only execute for records that
> were actually going to be inserted. However, you=E2=80=99re now stating t=
hat the
> INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for records
> that are not inserted?? That doesn=E2=80=99t seem quite logical, and cont=
rary to
> =E2=80=98old fashioned=E2=80=99 upsert logic. Does this also mean that th=
e BEFORE UPDATE
> trigger is always called as well, or is it never called?
>

=E2=80=8BThis seems to boil down to the two possible ways of manually imple=
menting
UPSERT:

UPDATE, if not present, INSERT
INSERT, if failing, UPDATE

In the later the before insert trigger fires and influences whether the
insert=E2=80=8B

=E2=80=8Bfails.  In the former you are already pretty certain the insert wi=
ll
=E2=80=8Bsucceed because the UPDATE found no records.

We've implemented INSERT, if failing UPDATE.  The insert has to be
attempted and right now there is no concept of targeted partial
deferrability when constructing the record to be inserted.

To solve this situation it is likely that some form of "UPDATE ON MISSING
INSERT" would need to be designed.  The insert portion would specify
"DEFAULT" for sequence columns and would execute nextval() only if the ON
MISSING portion is executed.

Fundamentally, the difference is that ON MISSING is considerably less
complicated than ON CONFLICT.  What is wanted here is an ON MISSING
interpretation but what we've implemented is ON CONFLICT.  It seems that
the hackers are in agreement that our implementation of ON CONFLICT is
consistent with its definition.  That it doesn't efficiently solve problems
better handled by ON MISSING - while unfortunate - doesn't constitute a
bug: only an opportunity for future enhancement.

David J.


=E2=80=8B

Re: BUG #13846: INSERT ON CONFLICT consumessequencersonconflicts

От
Peter Geoghegan
Дата:
On Fri, May 6, 2016 at 12:02 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> To solve this situation it is likely that some form of "UPDATE ON MISSING
> INSERT" would need to be designed.  The insert portion would specify
> "DEFAULT" for sequence columns and would execute nextval() only if the ON
> MISSING portion is executed.

That's unworkable, at least without accepting a bunch of new
edge-cases, like having the insert then have a duplicate violation
involving a value that was determined to not exist in the first phase.
IOW, it's unworkable to do an insert on the basis of an *absence* of
something in an index or in a table (and not get those edge-cases).
Doing so on the basis of the *presence* of a value (i.e. INSERT ... ON
CONFLICT DO UPDATE as implemented) lets the implementation clamp down
on race conditions enough to provide those useful user-visible
guarantees about getting 1 of 2 possible outcomes.

There are multiple definitions of a value "existing" here that are in
tension here. It's rather complicated.

--
Peter Geoghegan

Re: BUG #13846: INSERT ON CONFLICT consumessequencersonconflicts

От
"David G. Johnston"
Дата:
On Fri, May 6, 2016 at 12:10 PM, Peter Geoghegan <pg@heroku.com> wrote:

> On Fri, May 6, 2016 at 12:02 PM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > To solve this situation it is likely that some form of "UPDATE ON MISSI=
NG
> > INSERT" would need to be designed.  The insert portion would specify
> > "DEFAULT" for sequence columns and would execute nextval() only if the =
ON
> > MISSING portion is executed.
>
> That's unworkable, at least without accepting a bunch of new
> edge-cases, like having the insert then have a duplicate violation
> involving a value that was determined to not exist in the first phase.
> IOW, it's unworkable to do an insert on the basis of an *absence* of
> something in an index or in a table (and not get those edge-cases).
> Doing so on the basis of the *presence* of a value (i.e. INSERT ... ON
> CONFLICT DO UPDATE as implemented) lets the implementation clamp down
> on race conditions enough to provide those useful user-visible
> guarantees about getting 1 of 2 possible outcomes.
>
> There are multiple definitions of a value "existing" here that are in
> tension here. It's rather complicated.
>

=E2=80=8BUPDATE ON MISSING INSERT ON CONFLICT DO THE ORIGINAL UPDATE ...

=E2=80=8BI'm nowhere near close enough to this to contribute deeply - the m=
ost I
hope for is to spark an idea in someone else.

=E2=80=8BDavid J.
=E2=80=8B

Re: BUG #13846: INSERT ON CONFLICTconsumessequencersonconflicts

От
Paul
Дата:
Hi Dave,=20
Thanks for the informative reply. We absolutely agree it=E2=80=99s not a bu=
g in implementation, but maybe a bug in conceptualization with regard to us=
ability. Not knowing the initial motive for the feature, we assumed it was =
meant for an intuitively simpler form of the rather common upsert pattern, =
but it seems INSERT ON CONFLICT was meant for some other purpose that we do=
n=E2=80=99t quite understand. We stopped using and went back to a manual ap=
proach, which also works great because PostgreSQL is just kinda great =F0=
=9F=98=8A.

If any related enhancements where to be done, our =E2=80=98holy grail=E2=80=
=99 would be an implementation of the MERGE statement, like this, or this, =
or this=E2=80=A6 We can at least ask for it right?? =F0=9F=98=89

Thanks for all your guys=E2=80=99 hard work

-p

From: David G. Johnston
Sent: Friday, May 6, 2016 1:02 PM
To: Paul
Cc: Andres Freund; Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICTconsumessequencersonconfl=
icts

On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote:
If I had a BEFORE INSERT trigger, it would only execute for records that we=
re actually going to be inserted. However, you=E2=80=99re now stating that =
the INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for record=
s that are not inserted?? That doesn=E2=80=99t seem quite logical, and cont=
rary to =E2=80=98old fashioned=E2=80=99 upsert logic. Does this also mean t=
hat the BEFORE UPDATE trigger is always called as well, or is it never call=
ed?

=E2=80=8BThis seems to boil down to the two possible ways of manually imple=
menting UPSERT:

UPDATE, if not present, INSERT
INSERT, if failing, UPDATE

In the later the before insert trigger fires and influences whether the ins=
ert=E2=80=8B
=C2=A0
=E2=80=8Bfails.=C2=A0 In the former you are already pretty certain the inse=
rt will =E2=80=8Bsucceed because the UPDATE found no records.

We've implemented INSERT, if failing UPDATE.=C2=A0 The insert has to be att=
empted and right now there is no concept of targeted partial deferrability =
when constructing the record to be inserted.

To solve this situation it is likely that some form of "UPDATE ON MISSING I=
NSERT" would need to be designed.=C2=A0 The insert portion would specify "D=
EFAULT" for sequence columns and would execute nextval() only if the ON MIS=
SING portion is executed.

Fundamentally, the difference is that ON MISSING is considerably less compl=
icated than ON CONFLICT.=C2=A0 What is wanted here is an ON MISSING interpr=
etation but what we've implemented is ON CONFLICT.=C2=A0 It seems that the =
hackers are in agreement that our implementation of ON CONFLICT is consiste=
nt with its definition.=C2=A0 That it doesn't efficiently solve problems be=
tter handled by ON MISSING - while unfortunate - doesn't constitute a bug: =
only an opportunity for future enhancement.

David J.


=E2=80=8B

Re: BUG #13846: INSERT ONCONFLICTconsumessequencersonconflicts

От
Paul
Дата:
Just for clarity on the real financial problem of usability,

In our situation, we have =E2=80=98event=E2=80=99 tables, and related =E2=
=80=98category=E2=80=99 tables; I=E2=80=99m simplifying and generalizing a =
bit. The event tables capture streams of activity. Each record has several =
columns that categorize each event. Sometimes new categories come along fro=
m the raw event data source, but several categories would never have more t=
han a couple hundred discreet values, if that. We also need to filter, grou=
p, and aggregate the events along some of those categories.

So structure is something like this (although with more category columns)

CREATE TABLE event (
  id BIGSERAL NOT NULL,
  cat_a_id SMALLINT NOT NULL,  /* would have to be BIGINT if using IOC */
  cat_b_id SMALLINT NOT NULL,  /* would have to be BIGINT if using IOC */
  measure FLOAT,
  CONSTRAINT pk_event PRIMARY KEY (id)
);
CREATE INDEX ix_event1 ON event USING BTREE(cat_a_id);
CREATE INDEX ix_event2 ON event USING BTREE(cat_b_id, cat_a_id);

CREATE TABLE cat_a (
  id SMALLSERIAL NOT NULL,  /* would have to be BIGSERIAL if using IOC */
  name TEXT NOT NULL,
  CONSTRAINT pk_cat_a PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ux_cat_a ON EVENT USING BTREE(name);

CREATE TABLE cat_b (
  id SMALLSERIAL NOT NULL,  /* would have to be BIGSERIAL if using IOC */
  name TEXT NOT NULL,
  CONSTRAINT pk_cat_b PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ux_cat_b ON EVENT USING BTREE(name);

Some =E2=80=98raw=E2=80=99 input source records might look like this
=E2=80=9Cso-reg=E2=80=9D, =E2=80=9Copen=E2=80=9D, 10
=E2=80=9Cso-reg=E2=80=9D, =E2=80=9Cstalled=E2=80=9D, 1
=E2=80=9Cno-reg=E2=80=9D, =E2=80=9Cstalled=E2=80=9D, 1
=E2=80=9Cunknown=E2=80=99, =E2=80=9Congoing=E2=80=9D, 1

As part of ingesting the raw event source into the event table, we do an up=
sert on the category tables, get the category=E2=80=99s id, then store the =
id in the respective category column in the event table.

Just in a few months, one of our event tables, having 9 category type colum=
ns, is already up over 200M records, and may hit a 1B in a few more months,=
 and maybe in a year or so over 4B.

If we had continued to use INSERT ON CONFLICT [IOC], the cat_a_id and cat_b=
_id columns in the event table would have to be changed from SMALLINT to BI=
GINT, just because the related id sequences from the cat_a and cat_b tables=
 are getting consumed on every insert into the event table, even though 99.=
9% of the time nothing is actually being inserted into the cat_a and cat_b =
tables.

We=E2=80=99re running our stuff in a cloud host, so we=E2=80=99re paying fo=
r memory, cpu, SSD storage, HDD storage, and network bandwidth (backups) on=
 a monthly basis.

So if using IOC, our one event table with 9 category columns at 200M record=
s goes from 9 * 2 =3D 18 bytes-per-record to  9 * 8 =3D 72 bytes-per-record=
. Or, from 3.6GB to 14.4GB! At a 1B events 18GB to 72GB!! And that doesn=E2=
=80=99t include the indexes on the event table for certain category columns=
 and combinations, and that=E2=80=99s just one table in one replica.

56GB extra means we can fit way less of the event rows into memory, it take=
s up way more expensive SSD space (and HDD for backup), uses way more bandw=
idth and just takes longer for backups and replication, and the CPU when do=
ing sorts and grouping etc has to compare lots more bits. This can add up t=
o thousands of dollars or more over time.

Also, if we had kept using IOC and made the category columns just INT, woul=
d we have had to convert 4B records from INT to BIGINT sometime next year. =
Ouch!!

So, are problem with using IOC isn=E2=80=99t at all technical, it=E2=80=99s=
 financial! Very expensive just to save a few lines of SQL=E2=80=A6

I=E2=80=99m kinda curious what the initial use cases for IOC were? If it ac=
tually was for UPSERTs, under what conditions? A huge amount of data is jus=
t events and categories; i.e. tweets, emails, sensor readings, purchase ord=
ers, etc., and region, person, device, product-brand, etc. Ingesting this k=
ind of data is a prime use case for UPSERTs.. Just curious what problem IOC=
 is solving?

Anyways, none of this is meant to be at all a knock on anyone contributing =
to PG.. It=E2=80=99s an absolutely amazing amount of high quality value for=
 free.. Thanks to all of you.

From: Paul
Sent: Friday, May 6, 2016 2:06 PM
To: David G. Johnston
Cc: Andres Freund; Peter Geoghegan; pgsql-bugs
Subject: RE: [BUGS] BUG #13846: INSERT ONCONFLICTconsumessequencersonconfli=
cts

Hi Dave,=20
Thanks for the informative reply. We absolutely agree it=E2=80=99s not a bu=
g in implementation, but maybe a bug in conceptualization with regard to us=
ability. Not knowing the initial motive for the feature, we assumed it was =
meant for an intuitively simpler form of the rather common upsert pattern, =
but it seems INSERT ON CONFLICT was meant for some other purpose that we do=
n=E2=80=99t quite understand. We stopped using and went back to a manual ap=
proach, which also works great because PostgreSQL is just kinda great =F0=
=9F=98=8A.

If any related enhancements where to be done, our =E2=80=98holy grail=E2=80=
=99 would be an implementation of the MERGE statement, like this, or this, =
or this=E2=80=A6 We can at least ask for it right?? =F0=9F=98=89

Thanks for all your guys=E2=80=99 hard work

-p

From: David G. Johnston
Sent: Friday, May 6, 2016 1:02 PM
To: Paul
Cc: Andres Freund; Peter Geoghegan; pgsql-bugs
Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICTconsumessequencersonconfl=
icts

On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote:
If I had a BEFORE INSERT trigger, it would only execute for records that we=
re actually going to be inserted. However, you=E2=80=99re now stating that =
the INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for record=
s that are not inserted?? That doesn=E2=80=99t seem quite logical, and cont=
rary to =E2=80=98old fashioned=E2=80=99 upsert logic. Does this also mean t=
hat the BEFORE UPDATE trigger is always called as well, or is it never call=
ed?

=E2=80=8BThis seems to boil down to the two possible ways of manually imple=
menting UPSERT:

UPDATE, if not present, INSERT
INSERT, if failing, UPDATE

In the later the before insert trigger fires and influences whether the ins=
ert=E2=80=8B
=C2=A0
=E2=80=8Bfails.=C2=A0 In the former you are already pretty certain the inse=
rt will =E2=80=8Bsucceed because the UPDATE found no records.

We've implemented INSERT, if failing UPDATE.=C2=A0 The insert has to be att=
empted and right now there is no concept of targeted partial deferrability =
when constructing the record to be inserted.

To solve this situation it is likely that some form of "UPDATE ON MISSING I=
NSERT" would need to be designed.=C2=A0 The insert portion would specify "D=
EFAULT" for sequence columns and would execute nextval() only if the ON MIS=
SING portion is executed.

Fundamentally, the difference is that ON MISSING is considerably less compl=
icated than ON CONFLICT.=C2=A0 What is wanted here is an ON MISSING interpr=
etation but what we've implemented is ON CONFLICT.=C2=A0 It seems that the =
hackers are in agreement that our implementation of ON CONFLICT is consiste=
nt with its definition.=C2=A0 That it doesn't efficiently solve problems be=
tter handled by ON MISSING - while unfortunate - doesn't constitute a bug: =
only an opportunity for future enhancement.

David J.


=E2=80=8B