Re: Postgres restore help

Поиск
Список
Период
Сортировка
От Bartosz Dmytrak
Тема Re: Postgres restore help
Дата
Msg-id CAD8_UcbQ0hOx8+Gab4NQK_oYPw-kfp_5740+JkgMdni5pD4CZQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Postgres restore help  (Alessandro Gagliardi <alessandro@path.com>)
Список pgsql-novice

Let's have the tables like this:
CREATE TABLE "tblTest"
(
  "RowId" serial NOT NULL,
  "Value1" text,
  "Value2" text,
  "Value3" text,
  "Value4" text,
  "Value5" text,
  "Value6" text,
  CONSTRAINT "tblTest_pkey" PRIMARY KEY ("RowId" )
)
WITH (
  OIDS=FALSE
);

and 
CREATE TABLE "tblTest2"
(
  "RowId" serial NOT NULL,
  "Value1" text,
  "Value2" text,
  "Value3" text,
  "Value4" text,
  "Value5" text,
  "Value6" text,
  CONSTRAINT "tblTest2_pkey" PRIMARY KEY ("RowId" )
)
WITH (
  OIDS=FALSE
);

I mean the same strcture, but with different number of rows and
tblTest - 999000
tblTest2 - 1000

let's try first 
EXPLAIN
SELECT a.* FROM "tblTest" a
LEFT JOIN "tblTest2" b
ON (a."RowId" = b."RowId")
WHERE b."RowId" IS NULL;


'Hash Anti Join  (cost=70.50..69551.31 rows=998058 width=339)'
'  Hash Cond: (a."RowId" = b."RowId")'
'  ->  Seq Scan on "tblTest" a  (cost=0.00..56876.58 rows=999058 width=339)'
'  ->  Hash  (cost=58.00..58.00 rows=1000 width=4)'
'        ->  Seq Scan on "tblTest2" b  (cost=0.00..58.00 rows=1000 width=4)'


and:
EXPLAIN 
SELECT a.* FROM "tblTest" a
WHERE NOT EXISTS (SELECT "RowId" FROM "tblTest2" b WHERE b."RowId" = a."RowId")

'Hash Anti Join  (cost=70.50..69551.31 rows=998058 width=339)'
'  Hash Cond: (a."RowId" = b."RowId")'
'  ->  Seq Scan on "tblTest" a  (cost=0.00..56876.58 rows=999058 width=339)'
'  ->  Hash  (cost=58.00..58.00 rows=1000 width=4)'
'        ->  Seq Scan on "tblTest2" b  (cost=0.00..58.00 rows=1000 width=4)'

for me looks the same :) but I personally prefer first.

There is also "...WHERE "RowId" NOT IN..." solution, but in that case it is a little faster when You are looking for rows from tblTest

EXPLAIN 
SELECT * FROM "tblTest"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest2")

'Seq Scan on "tblTest"  (cost=60.50..59434.72 rows=499529 width=339)'
'  Filter: (NOT (hashed SubPlan 1))'
'  SubPlan 1'
'    ->  Seq Scan on "tblTest2"  (cost=0.00..58.00 rows=1000 width=4)'

but it is much, much slower in opposite way
EXPLAIN 
SELECT * FROM "tblTest2"
WHERE "RowId" NOT IN (SELECT "RowId" FROM "tblTest")

'Seq Scan on "tblTest2"  (cost=0.00..34136318.00 rows=500 width=344)'
'  Filter: (NOT (SubPlan 1))'
'  SubPlan 1'
'    ->  Materialize  (cost=0.00..65774.87 rows=999058 width=4)'
'          ->  Seq Scan on "tblTest"  (cost=0.00..56876.58 rows=999058 width=4)'



Regards,
Bartek


2012/2/28 Alessandro Gagliardi <alessandro@path.com>
Argh! I can't believe I forgot the "LEFT". That's embarrassing. Anyway, I was wondering if a WHERE NOT EXISTS clause would be better. I'm still new to those, so didn't want to try to offer that as a solution, but I gather it can be more efficient than a JOIN in some cases.


On Tue, Feb 28, 2012 at 12:13 PM, Bartosz Dmytrak <bdmytrak@eranet.pl> wrote:
This is almost perfect :)
but won't work in that case in the way You expect, because of join type. This should be LEFT JOIN to select all from temp_table (left) and matching from main_table (right), then WHERE clausule will  filter rows not existing in main_table. Without USING, which is shorthand (http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-FROM) it should look like this:

INSERT INTO main_table 
SELECT temp_table.* 
FROM temp_table 
      LEFT JOIN main_table ON (temp_table.pkmain_table.pk) /* instead of "ON (...)" you can use Alessandro's shorthand: "USING (pk)"*/
WHERE main_table.pk IS NULL;


Regards,
Bartek

2012/2/28 Alessandro Gagliardi <alessandro@path.com>
I would recommend restoring the table to a temporary table and then using something like:
INSERT INTO main_table SELECT temp_table.* FROM temp_table JOIN main_table USING (pk) WHERE main_table.pk IS NULL
Someone else here might have a more efficient way of phrasing that insert statement (I'm still fairly new to the concept of anti-joins). But I think something like this should work for you.

-Alessandro

On Tue, Feb 28, 2012 at 7:34 AM, Piyush Lenka <lenka.piyush@gmail.com> wrote:
Hi,
I m trying to restore only data from a backup file.but i hav some primary key that already exists in the table.
I want to restore non duplicate data only.
Please Help.

Thanks And Regards 
Piyush




В списке pgsql-novice по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Postgres restore help
Следующее
От: tukuna patro
Дата:
Сообщение: a role without login priviledge