Re: upsert doesn't seem to work..

Поиск
Список
Период
Сортировка
От Ben Morrow
Тема Re: upsert doesn't seem to work..
Дата
Msg-id 20130217094411.GA28188@anubis.morrow.me.uk
обсуждение исходный текст
Ответ на upsert doesn't seem to work..  (Bert <biertie@gmail.com>)
Список pgsql-sql
Quoth biertie@gmail.com (Bert):
> 
> We continuously load data from flat files in our database.
> We first insert the data into unlogged tables (in the loadoltp schema), and
> then we use the 'upsert' statement to transfer the data from the load table
> into the tables we are going to use.
> 
> The load tables are unlogged, and don't have indexes / pk's on them. All
> our 'real tables', which contains the data, always have a pk consisting out
> of 2 fields. In the example those are 'tick_server_id' and 'item_id'.
> 
> At first everything seems to run ok, however it seems that new fields
> aren't always inserted as desired.
> 
> 
> This is an example query which causes troubles:

That query is basically equivalent to something like
   create table "st_item" (       server_id   integer,       item_id     integer,       item_desc   text,       primary
key(server_id, item_id)   );   create table "st_item_insert" (       server_id   integer,       item_id     integer,
  item_desc   text   );
 
   with "upsert" as (       update "st_item" et       set "item_desc" = e.item_desc       from "st_item_insert" e
whereet.server_id = e.server_id           and et.item_id = e.item_id       returning et.server_id, et.item_id   )
insertinto "st_item"        ("server_id", "item_id", "item_desc")   select et.server_id, et.item_id, et.item_desc
from"st_item_insert" et   where et.server_id not in (           select et.server_id           from "upsert" b)
andet.item_id not in (           select et.item_id           from "upsert" b)
 

There are three problems here. The first is that the NOT IN subselect
selects from et instead of from b. In the context of this subselect "et"
is a table reference from outside the subselect, so it's treated as a
constant for each run of the subselect. That means that the subselect
will return the value you are testing against for every row in "upsert",
so if there were any updates at all you will make no insertions.

The second is that you are making two separate subselects. This means
that a row in st_item_insert will not be inserted if there is a row in
"upsert" with a matching server_id and a row in "upsert" with a matching
item_id, *even if they are different rows*. For instance, suppose
st_item_insert has
   2   1   foo   1   2   bar   2   2   baz

and the 'foo' and 'bar' entries get updated. The 'baz' entry will then
not get inserted, because the first subselect will find the 'foo' row
and the second will find the 'bar' row. What you need is a single row
subselect, like this:
   where (et.server_id, et.item_id) not in (       select "server_id", "item_id"       from "upsert")

The third is that upsert is not as simple as you think. It isn't
possible (at least, not in Postgres) to take a lock on a row which
doesn't exist, so it's possible that a concurrent transaction could
insert a row with a conflicting key between the time the UPDATE runs and
the time the INSERT runs. You need to either lock the whole table or use
the retry strategy documented in the 'Trapping Errors' section of the
PL/pgSQL documentation. Annoyingly, even 9's serializable transactions
don't seem to help here, at least not by my experiments.

Ben




В списке pgsql-sql по дате отправления:

Предыдущее
От: Ben Morrow
Дата:
Сообщение: Volatile functions in WITH
Следующее
От: Ben Morrow
Дата:
Сообщение: Re: Perform Function When The Rows Of A View Change