Re: BUG #13846: INSERT ONCONFLICTconsumessequencersonconflicts
От | Paul |
---|---|
Тема | Re: BUG #13846: INSERT ONCONFLICTconsumessequencersonconflicts |
Дата | |
Msg-id | 572d0f00.e33a320a.57866.ffffdf07@mx.google.com обсуждение исходный текст |
Ответ на | Re: BUG #13846: INSERT ON CONFLICTconsumessequencersonconflicts (Paul <paul@salesintel.com>) |
Список | pgsql-bugs |
Just for clarity on the real financial problem of usability, In our situation, we have =E2=80=98event=E2=80=99 tables, and related =E2= =80=98category=E2=80=99 tables; I=E2=80=99m simplifying and generalizing a = bit. The event tables capture streams of activity. Each record has several = columns that categorize each event. Sometimes new categories come along fro= m the raw event data source, but several categories would never have more t= han a couple hundred discreet values, if that. We also need to filter, grou= p, and aggregate the events along some of those categories. So structure is something like this (although with more category columns) CREATE TABLE event ( id BIGSERAL NOT NULL, cat_a_id SMALLINT NOT NULL, /* would have to be BIGINT if using IOC */ cat_b_id SMALLINT NOT NULL, /* would have to be BIGINT if using IOC */ measure FLOAT, CONSTRAINT pk_event PRIMARY KEY (id) ); CREATE INDEX ix_event1 ON event USING BTREE(cat_a_id); CREATE INDEX ix_event2 ON event USING BTREE(cat_b_id, cat_a_id); CREATE TABLE cat_a ( id SMALLSERIAL NOT NULL, /* would have to be BIGSERIAL if using IOC */ name TEXT NOT NULL, CONSTRAINT pk_cat_a PRIMARY KEY (id) ); CREATE UNIQUE INDEX ux_cat_a ON EVENT USING BTREE(name); CREATE TABLE cat_b ( id SMALLSERIAL NOT NULL, /* would have to be BIGSERIAL if using IOC */ name TEXT NOT NULL, CONSTRAINT pk_cat_b PRIMARY KEY (id) ); CREATE UNIQUE INDEX ux_cat_b ON EVENT USING BTREE(name); Some =E2=80=98raw=E2=80=99 input source records might look like this =E2=80=9Cso-reg=E2=80=9D, =E2=80=9Copen=E2=80=9D, 10 =E2=80=9Cso-reg=E2=80=9D, =E2=80=9Cstalled=E2=80=9D, 1 =E2=80=9Cno-reg=E2=80=9D, =E2=80=9Cstalled=E2=80=9D, 1 =E2=80=9Cunknown=E2=80=99, =E2=80=9Congoing=E2=80=9D, 1 As part of ingesting the raw event source into the event table, we do an up= sert on the category tables, get the category=E2=80=99s id, then store the = id in the respective category column in the event table. Just in a few months, one of our event tables, having 9 category type colum= ns, is already up over 200M records, and may hit a 1B in a few more months,= and maybe in a year or so over 4B. If we had continued to use INSERT ON CONFLICT [IOC], the cat_a_id and cat_b= _id columns in the event table would have to be changed from SMALLINT to BI= GINT, just because the related id sequences from the cat_a and cat_b tables= are getting consumed on every insert into the event table, even though 99.= 9% of the time nothing is actually being inserted into the cat_a and cat_b = tables. We=E2=80=99re running our stuff in a cloud host, so we=E2=80=99re paying fo= r memory, cpu, SSD storage, HDD storage, and network bandwidth (backups) on= a monthly basis. So if using IOC, our one event table with 9 category columns at 200M record= s goes from 9 * 2 =3D 18 bytes-per-record to 9 * 8 =3D 72 bytes-per-record= . Or, from 3.6GB to 14.4GB! At a 1B events 18GB to 72GB!! And that doesn=E2= =80=99t include the indexes on the event table for certain category columns= and combinations, and that=E2=80=99s just one table in one replica. 56GB extra means we can fit way less of the event rows into memory, it take= s up way more expensive SSD space (and HDD for backup), uses way more bandw= idth and just takes longer for backups and replication, and the CPU when do= ing sorts and grouping etc has to compare lots more bits. This can add up t= o thousands of dollars or more over time. Also, if we had kept using IOC and made the category columns just INT, woul= d we have had to convert 4B records from INT to BIGINT sometime next year. = Ouch!! So, are problem with using IOC isn=E2=80=99t at all technical, it=E2=80=99s= financial! Very expensive just to save a few lines of SQL=E2=80=A6 I=E2=80=99m kinda curious what the initial use cases for IOC were? If it ac= tually was for UPSERTs, under what conditions? A huge amount of data is jus= t events and categories; i.e. tweets, emails, sensor readings, purchase ord= ers, etc., and region, person, device, product-brand, etc. Ingesting this k= ind of data is a prime use case for UPSERTs.. Just curious what problem IOC= is solving? Anyways, none of this is meant to be at all a knock on anyone contributing = to PG.. It=E2=80=99s an absolutely amazing amount of high quality value for= free.. Thanks to all of you. From: Paul Sent: Friday, May 6, 2016 2:06 PM To: David G. Johnston Cc: Andres Freund; Peter Geoghegan; pgsql-bugs Subject: RE: [BUGS] BUG #13846: INSERT ONCONFLICTconsumessequencersonconfli= cts Hi Dave,=20 Thanks for the informative reply. We absolutely agree it=E2=80=99s not a bu= g in implementation, but maybe a bug in conceptualization with regard to us= ability. Not knowing the initial motive for the feature, we assumed it was = meant for an intuitively simpler form of the rather common upsert pattern, = but it seems INSERT ON CONFLICT was meant for some other purpose that we do= n=E2=80=99t quite understand. We stopped using and went back to a manual ap= proach, which also works great because PostgreSQL is just kinda great =F0= =9F=98=8A. If any related enhancements where to be done, our =E2=80=98holy grail=E2=80= =99 would be an implementation of the MERGE statement, like this, or this, = or this=E2=80=A6 We can at least ask for it right?? =F0=9F=98=89 Thanks for all your guys=E2=80=99 hard work -p From: David G. Johnston Sent: Friday, May 6, 2016 1:02 PM To: Paul Cc: Andres Freund; Peter Geoghegan; pgsql-bugs Subject: Re: [BUGS] BUG #13846: INSERT ON CONFLICTconsumessequencersonconfl= icts On Wed, Jan 6, 2016 at 5:14 PM, Paul <paul@salesintel.com> wrote: If I had a BEFORE INSERT trigger, it would only execute for records that we= re actually going to be inserted. However, you=E2=80=99re now stating that = the INSERT ON CONFLICT will execute a BEFORE INSERT trigger even for record= s that are not inserted?? That doesn=E2=80=99t seem quite logical, and cont= rary to =E2=80=98old fashioned=E2=80=99 upsert logic. Does this also mean t= hat the BEFORE UPDATE trigger is always called as well, or is it never call= ed? =E2=80=8BThis seems to boil down to the two possible ways of manually imple= menting UPSERT: UPDATE, if not present, INSERT INSERT, if failing, UPDATE In the later the before insert trigger fires and influences whether the ins= ert=E2=80=8B =C2=A0 =E2=80=8Bfails.=C2=A0 In the former you are already pretty certain the inse= rt will =E2=80=8Bsucceed because the UPDATE found no records. We've implemented INSERT, if failing UPDATE.=C2=A0 The insert has to be att= empted and right now there is no concept of targeted partial deferrability = when constructing the record to be inserted. To solve this situation it is likely that some form of "UPDATE ON MISSING I= NSERT" would need to be designed.=C2=A0 The insert portion would specify "D= EFAULT" for sequence columns and would execute nextval() only if the ON MIS= SING portion is executed. Fundamentally, the difference is that ON MISSING is considerably less compl= icated than ON CONFLICT.=C2=A0 What is wanted here is an ON MISSING interpr= etation but what we've implemented is ON CONFLICT.=C2=A0 It seems that the = hackers are in agreement that our implementation of ON CONFLICT is consiste= nt with its definition.=C2=A0 That it doesn't efficiently solve problems be= tter handled by ON MISSING - while unfortunate - doesn't constitute a bug: = only an opportunity for future enhancement. David J. =E2=80=8B
В списке pgsql-bugs по дате отправления:
Предыдущее
От: PaulДата:
Сообщение: Re: BUG #13846: INSERT ON CONFLICTconsumessequencersonconflicts
Следующее
От: Casey WiremanДата:
Сообщение: Re: BUG #14126: INSERT ON CONFLICT DO NOTHING auto increments serial primary key when no insert happens.