Simple machine-killing query!

Поиск
Список
Период
Сортировка
От Victor Ciurus
Тема Simple machine-killing query!
Дата
Msg-id e7a72f4004102107345063b467@mail.gmail.com
обсуждение исходный текст
Ответы Re: Simple machine-killing query!  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Re: Simple machine-killing query!  (Aaron Werman <aaron.werman@gmail.com>)
Re: Simple machine-killing query!  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Simple machine-killing query!  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-performance
Hi all,

I'm writing this because I've reached the limit of my imagination and
patience! So here is it...

2 tables:
1 containing 27 million variable lenght, alpha-numeric records
(strings) in 1 (one) field. (10 - 145 char lenght per record)
1 containing 2.5 million variable lenght, alpha-numeric records
(strings) in 1 (one) field.

table wehere created using:
CREATE TABLE "public"."BIGMA" ("string" VARCHAR(255) NOT NULL) WITH OIDS; +
CREATE INDEX "BIGMA_INDEX" ON "public"."BIGMA" USING btree ("string");
and
CREATE TABLE "public"."DIRTY" ("string" VARCHAR(128) NOT NULL) WITH OIDS; +
CREATE INDEX "DIRTY_INDEX" ON "public"."DIRTY" USING btree ("string");

What I am requested to do is to keep all records from 'BIGMA' that do
not apear in 'DIRTY'
So far I have tried solving this by going for:

[explain] select * from BIGMA where string not in (select * from DIRTY);
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
   Filter: (NOT (subplan))
   SubPlan
     ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
(4 rows)

AND

[explain] select * from bigma,dirty where bigma.email!=dirty.email;
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=20.00..56382092.13 rows=2491443185 width=227)
   Join Filter: (("inner".email)::text <> ("outer".email)::text)
   ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=145)
         ->  Seq Scan on bigma  (cost=0.00..20.00 rows=1000 width=145)
(5 rows)

Now the problem is that both of my previous tries seem to last
forever! I'm not a pqsql guru so that's why I'm asking you fellas to
guide mw right! I've tried this on mysql previosly but there seems to
be no way mysql can handle this large query.

QUESTIONS:
What can I do in order to make this work?
Where do I make mistakes? Is there a way I can improve the performance
in table design, query style, server setting so that I can get this
monster going and producing a result?

Thanks all for your preciuos time and answers!

Victor C.

В списке pgsql-performance по дате отправления:

Предыдущее
От: george young
Дата:
Сообщение: Re: create index with substr function
Следующее
От: Jan Dittmer
Дата:
Сообщение: Re: Anything to be gained from a 'Postgres Filesystem'?