Обсуждение: Postgres restore help
Hi,
I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.I want to restore non duplicate data only.
Please Help.
Thanks And Regards
Piyush
I would recommend restoring the table to a temporary table and then using something like:
INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN main_table USING (pk) WHERE main_table.pk IS NULL
Someone else here might have a more efficient way of phrasing that insert statement (I'm still fairly new to the concept of anti-joins). But I think something like this should work for you.
-Alessandro
On Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka.piyush@gmail.com> wrote:
Hi,I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.I want to restore non duplicate data only.Please Help.Thanks And RegardsPiyush
This is almost perfect :)
but won't work in that case in the way You expect, because of join type. This should be LEFT JOIN to select all from temp_table (left) and matching from main_table (right), then WHERE clausule will filter rows not existing in main_table. Without USING, which is shorthand (http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-FROM) it should look like this:
INSERT INTO main_table
SELECT temp_table.*
FROM temp_table
LEFT JOIN main_table ON (temp_table.pk = main_table.pk) /* instead of "ON (...)" you can use Alessandro's shorthand: "USING (pk)"*/
Regards,
Bartek
2012/2/28 Alessandro Gagliardi <alessandro@path.com>
I would recommend restoring the table to a temporary table and then using something like:INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN main_table USING (pk) WHERE main_table.pk IS NULLSomeone else here might have a more efficient way of phrasing that insert statement (I'm still fairly new to the concept of anti-joins). But I think something like this should work for you.-AlessandroOn Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka.piyush@gmail.com> wrote:Hi,I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.I want to restore non duplicate data only.Please Help.Thanks And RegardsPiyush
Argh! I can't believe I forgot the "LEFT". That's embarrassing. Anyway, I was wondering if a WHERE NOT EXISTS clause would be better. I'm still new to those, so didn't want to try to offer that as a solution, but I gather it can be more efficient than a JOIN in some cases.
On Tue, Feb 28, 2012 at 12:13 PM, Bartosz Dmytrak <bdmytrak@eranet.pl> wrote:
This is almost perfect :)but won't work in that case in the way You expect, because of join type. This should be LEFT JOIN to select all from temp_table (left) and matching from main_table (right), then WHERE clausule will filter rows not existing in main_table. Without USING, which is shorthand (http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-FROM) it should look like this:INSERT INTO main_tableSELECT temp_table.*FROM temp_tableLEFT JOIN main_table ON (temp_table.pk = main_table.pk) /* instead of "ON (...)" you can use Alessandro's shorthand: "USING (pk)"*/Regards,Bartek2012/2/28 Alessandro Gagliardi <alessandro@path.com>I would recommend restoring the table to a temporary table and then using something like:INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN main_table USING (pk) WHERE main_table.pk IS NULLSomeone else here might have a more efficient way of phrasing that insert statement (I'm still fairly new to the concept of anti-joins). But I think something like this should work for you.-AlessandroOn Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka.piyush@gmail.com> wrote:Hi,I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.I want to restore non duplicate data only.Please Help.Thanks And RegardsPiyush
Alessandro Gagliardi <alessandro@path.com> writes: > Argh! I can't believe I forgot the "LEFT". That's embarrassing. Anyway, I > was wondering if a WHERE NOT EXISTS clause would be better. I'm still new > to those, so didn't want to try to offer that as a solution, but I gather > it can be more efficient than a JOIN in some cases. Given a reasonably recent PG version, it should be exactly the same --- the planner will translate both of those into anti-joins. regards, tom lane
Let's have the tables like this:
CREATE TABLE "tblTest"
(
"RowId" serial NOT NULL,
"Value1" text,
"Value2" text,
"Value3" text,
"Value4" text,
"Value5" text,
"Value6" text,
CONSTRAINT "tblTest_pkey" PRIMARY KEY ("RowId" )
)
WITH (
OIDS=FALSE
);
and
CREATE TABLE "tblTest2"
(
"RowId" serial NOT NULL,
"Value1" text,
"Value2" text,
"Value3" text,
"Value4" text,
"Value5" text,
"Value6" text,
CONSTRAINT "tblTest2_pkey" PRIMARY KEY ("RowId" )
)
WITH (
OIDS=FALSE
);
I mean the same strcture, but with different number of rows and
tblTest - 999000
tblTest2 - 1000
let's try first
EXPLAIN
SELECT a.* FROM "tblTest" a
LEFT JOIN "tblTest2" b
ON (a."RowId" = b."RowId")
WHERE b."RowId" IS NULL;
'Hash Anti Join (cost=70.50..69551.31 rows=998058 width=339)'
' Hash Cond: (a."RowId" = b."RowId")'
' -> Seq Scan on "tblTest" a (cost=0.00..56876.58 rows=999058 width=339)'
' -> Hash (cost=58.00..58.00 rows=1000 width=4)'
' -> Seq Scan on "tblTest2" b (cost=0.00..58.00 rows=1000 width=4)'
and:
EXPLAIN
SELECT a.* FROM "tblTest" a
WHERE NOT EXISTS (SELECT "RowId" FROM "tblTest2" b WHERE b."RowId" = a."RowId")
'Hash Anti Join (cost=70.50..69551.31 rows=998058 width=339)'
' Hash Cond: (a."RowId" = b."RowId")'
' -> Seq Scan on "tblTest" a (cost=0.00..56876.58 rows=999058 width=339)'
' -> Hash (cost=58.00..58.00 rows=1000 width=4)'
' -> Seq Scan on "tblTest2" b (cost=0.00..58.00 rows=1000 width=4)'
for me looks the same :) but I personally prefer first.
There is also "...WHERE "RowId" NOT IN..." solution, but in that case it is a little faster when You are looking for rows from tblTest
EXPLAIN
SELECT * FROM "tblTest"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest2")
'Seq Scan on "tblTest" (cost=60.50..59434.72 rows=499529 width=339)'
' Filter: (NOT (hashed SubPlan 1))'
' SubPlan 1'
' -> Seq Scan on "tblTest2" (cost=0.00..58.00 rows=1000 width=4)'
but it is much, much slower in opposite way
EXPLAIN
SELECT * FROM "tblTest2"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest")
'Seq Scan on "tblTest2" (cost=0.00..34136318.00 rows=500 width=344)'
' Filter: (NOT (SubPlan 1))'
' SubPlan 1'
' -> Materialize (cost=0.00..65774.87 rows=999058 width=4)'
' -> Seq Scan on "tblTest" (cost=0.00..56876.58 rows=999058 width=4)'
Regards,
Bartek
2012/2/28 Alessandro Gagliardi <alessandro@path.com>
Argh! I can't believe I forgot the "LEFT". That's embarrassing. Anyway, I was wondering if a WHERE NOT EXISTS clause would be better. I'm still new to those, so didn't want to try to offer that as a solution, but I gather it can be more efficient than a JOIN in some cases.On Tue, Feb 28, 2012 at 12:13 PM, Bartosz Dmytrak <bdmytrak@eranet.pl> wrote:This is almost perfect :)but won't work in that case in the way You expect, because of join type. This should be LEFT JOIN to select all from temp_table (left) and matching from main_table (right), then WHERE clausule will filter rows not existing in main_table. Without USING, which is shorthand (http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-FROM) it should look like this:INSERT INTO main_tableSELECT temp_table.*FROM temp_tableLEFT JOIN main_table ON (temp_table.pk = main_table.pk) /* instead of "ON (...)" you can use Alessandro's shorthand: "USING (pk)"*/Regards,Bartek2012/2/28 Alessandro Gagliardi <alessandro@path.com>I would recommend restoring the table to a temporary table and then using something like:INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN main_table USING (pk) WHERE main_table.pk IS NULLSomeone else here might have a more efficient way of phrasing that insert statement (I'm still fairly new to the concept of anti-joins). But I think something like this should work for you.-AlessandroOn Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka.piyush@gmail.com> wrote:Hi,I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.I want to restore non duplicate data only.Please Help.Thanks And RegardsPiyush