Discussion: what is the PostgreSQL idiom for "insert or update"?
In my application, I receive large blocks of external data that need
to be extracted / translated / loaded into the db, and much of that
data duplicates what's already there.
Consequently, I would like to do efficient "bulk loading" of tables
using multi-row INSERT commands, ignoring records that are already
present, where 'uniqueness' is defined by key constraints.
F'rinstance, assume:
CREATE TABLE "weather_observations" ("id" serial primary key,
"station_id" integer, "observation_time" timestamp, "temperature_c"
float)
CREATE UNIQUE INDEX "observation_index" ON "weather_observations"
("station_id", "observation_time")
Now I'd like to be able to do multi-row inserts while ignoring
duplicate entries (specifically, rows that would violate the unique
constraint enforced by the index):
INSERT INTO weather_observations (station_id, observation_time, temperature_c)
VALUES (2257, '2001-01-01', 22.5),
       (2257, '2001-01-02', 25.3);

INSERT INTO weather_observations (station_id, observation_time, temperature_c)
VALUES (2257, '2001-01-02', 25.5),  -- should be ignored: key already present
       (2257, '2001-01-03', 21.0);
What's the idiom for doing this in PostgreSQL?
[As an aside, in SQLite, you can modify an INSERT statement with "OR
IGNORE" to achieve this.]
Thanks!
On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
> Consequently, I would like to do efficient "bulk loading" of tables
> using multi-row INSERT commands, ignoring records that are already
> present, where 'uniqueness' is defined by key constraints.
>
> What's the idiom for doing this in PostgreSQL?
Here is the current documented method:
http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html
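The example at the bottom of that page wraps a retry loop around
UPDATE and INSERT in a PL/pgSQL function. Here is a sketch of that
pattern adapted to the weather_observations table (merge_observation
and its parameter names are invented for illustration, not taken from
the docs):

CREATE FUNCTION merge_observation(sid integer, obs timestamp, temp float)
RETURNS void AS
$$
BEGIN
    LOOP
        -- first try to update an existing row for this key
        UPDATE weather_observations
           SET temperature_c = temp
         WHERE station_id = sid
           AND observation_time = obs;
        IF found THEN
            RETURN;
        END IF;
        -- no such row, so try to insert; a concurrent insert of the
        -- same key raises unique_violation, in which case we loop
        -- back and retry the UPDATE
        BEGIN
            INSERT INTO weather_observations
                        (station_id, observation_time, temperature_c)
                 VALUES (sid, obs, temp);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing; loop around and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;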
Regards,
Ken
On Wed, Mar 16, 2011 at 7:32 AM, Robert Poor <rdpoor@gmail.com> wrote:
> INSERT INTO weather_observations (station_id, observation_time, temperature_c)
> VALUES (2257, '2001-01-01', 22.5),
>        (2257, '2001-01-02', 25.3);
>
> INSERT INTO weather_observations (station_id, observation_time, temperature_c)
> VALUES (2257, '2001-01-02', 25.5),  -- should be ignored: key already present
>        (2257, '2001-01-03', 21.0);
>
> What's the idiom for doing this in PostgreSQL?
How about:
INSERT INTO weather_observations (station_id, observation_time, temperature_c)
SELECT A.station_id, A.observation_time, A.temperature_c
  FROM ( VALUES (2257, '2001-01-01'::timestamp, 22.5),
                (2257, '2001-01-02', 25.3)
       ) AS A (station_id, observation_time, temperature_c)
  LEFT JOIN weather_observations AS B
         ON (A.station_id, A.observation_time)
          = (B.station_id, B.observation_time)
 WHERE B.station_id IS NULL;
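The same anti-join can also be written with NOT EXISTS, which some
find easier to read (a sketch under the same assumptions):

INSERT INTO weather_observations (station_id, observation_time, temperature_c)
SELECT A.station_id, A.observation_time, A.temperature_c
  FROM ( VALUES (2257, '2001-01-01'::timestamp, 22.5),
                (2257, '2001-01-02', 25.3)
       ) AS A (station_id, observation_time, temperature_c)
 WHERE NOT EXISTS (
         SELECT 1
           FROM weather_observations B
          WHERE B.station_id = A.station_id
            AND B.observation_time = A.observation_time
       );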
--
Regards,
Richard Broersma Jr.
Ken:

On Wed, Mar 16, 2011 at 07:48, Kenneth Marshall <ktm@rice.edu> wrote:
> On Wed, Mar 16, 2011 at 07:32:13AM -0700, Robert Poor wrote:
>> ...I would like to do efficient "bulk loading" of tables
>> using multi-row INSERT commands, ignoring records that are already
>> present, where 'uniqueness' is defined by key constraints.
>
> Here is the current documented method:
>
> http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html

The construct at the bottom of the page looks promising. I'm loading
about 500 records in a single transaction; I'm not clear on how to
recast the merge_db() method to handle multiple records. Can you
elucidate? Thanks...
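One way to drive a per-row function like merge_db() over a whole
batch (a sketch, assuming the merge_observation() adaptation shown
earlier) is to call it from a SELECT over a VALUES list, one call per
row, all within the same transaction:

SELECT merge_observation(v.station_id, v.observation_time, v.temperature_c)
  FROM ( VALUES (2257, '2001-01-02'::timestamp, 25.5),
                (2257, '2001-01-03', 21.0)
       ) AS v (station_id, observation_time, temperature_c);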
Richard:

On Wed, Mar 16, 2011 at 08:45, Richard Broersma
<richard.broersma@gmail.com> wrote:
> How about:
>
> INSERT INTO weather_observations (station_id, observation_time, temperature_c)
> SELECT A.station_id, A.observation_time, A.temperature_c
>   FROM ( VALUES (2257, '2001-01-01'::timestamp, 22.5),
>                 (2257, '2001-01-02', 25.3)
>        ) AS A (station_id, observation_time, temperature_c)
>   LEFT JOIN weather_observations AS B
>          ON (A.station_id, A.observation_time)
>           = (B.station_id, B.observation_time)
>  WHERE B.station_id IS NULL;

That would work, though I was hoping there'd be some mechanism that
uses the key itself to determine whether an incoming record is a
duplicate (rather than writing a custom query). Are there limits to
the size of the VALUES sub-query? I'm processing about 500 records at
a time, and each record is fairly wide (about 350 characters without
the field names).
On Wed, Mar 16, 2011 at 9:31 AM, Robert Poor <rdpoor@gmail.com> wrote:
> Are there limits to the size of the VALUES sub-query?

I'm not too sure. I wonder if this limitation *could* be related to
the maximum text field size, which is < 1 GB.

--
Regards,
Richard Broersma Jr.
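For batches much larger than this, a common pattern (not suggested in
the thread itself) is to COPY the incoming rows into a temporary
staging table and reuse the same anti-join. A minimal sketch, where
the staging table and CSV path are assumptions for illustration:

CREATE TEMP TABLE staging (
    station_id       integer,
    observation_time timestamp,
    temperature_c    float
);

-- server-side COPY shown here; from a client, psql's \copy is the
-- usual equivalent (the file path is illustrative)
COPY staging FROM '/path/to/batch.csv' WITH CSV;

-- DISTINCT ON also guards against duplicate keys within the batch itself
INSERT INTO weather_observations (station_id, observation_time, temperature_c)
SELECT DISTINCT ON (s.station_id, s.observation_time)
       s.station_id, s.observation_time, s.temperature_c
  FROM staging s
  LEFT JOIN weather_observations w
         ON (s.station_id, s.observation_time)
          = (w.station_id, w.observation_time)
 WHERE w.station_id IS NULL;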