Re: Query performance question on a large table

Поиск
Список
Период
Сортировка
От Sean Shanny
Тема Re: Query performance question on a large table
Дата
Msg-id 3FFBB0F2.6040006@earthlink.net
обсуждение исходный текст
Ответ на Re: Query performance question on a large table  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Query performance question on a large table  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Tom,

Sort of piggybacking on this thread but why the suggestion to drop the
use of DISTINCT in 7.4?  We use DISTINCT all over the place to eliminate
duplicates in sub select statements. Running 7.4.0 currently on
FreeBSD5.1 Dell 2650 4GB RAM 5 disk SCSI array hardware RAID 0

 Example:

explain analyze select t1.raw_agent_string from d_useragent t1 where
t1.id in (select distinct useragent_key from f_pageviews where date_key
between 356 and 362);

QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1020025.13..1020178.84 rows=51 width=79) (actual
time=954080.021..970268.457 rows=82207 loops=1)
   ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4)
(actual time=954049.317..954450.065 rows=82208 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00
rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
               ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4)
(actual time=856641.230..952939.539 rows=82208 loops=1)
                     ->  Sort  (cost=983429.20..1001726.84 rows=7319058
width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
                           Sort Key: useragent_key
                           ->  Index Scan using
idx_pageviews_date_dec_2003 on f_pageviews  (cost=0.00..136434.63
rows=7319058 width=4) (actual time=1.140..693400.464 rows=11067735 loops=1)
                                 Index Cond: ((date_key >= 356) AND
(date_key <= 362))
   ->  Index Scan using d_useragent_pkey on d_useragent t1
(cost=0.00..3.00 rows=1 width=83) (actual time=0.169..0.174 rows=1
loops=82208)
         Index Cond: (t1.id = "outer".useragent_key)
 Total runtime: 970657.888 ms
(11 rows)

t1.id is the primary key on d_useragent.  d_useragent  actually has
390751 rows.
useragent_key has an index.  f_pageviews has roughly 120 million rows.

Is there a better way of writing this sort of query that will accomplish
the same thing?

Thanks.

--sean

Tom Lane wrote:

>=?iso-8859-2?Q?Egy=FCd_Csaba?= <csegyud@vnet.hu> writes:
>
>
>>here is a sample query:
>>select  mertido, fomeazon, ertektipus, mertertek from t_me30 where fomeazon
>>in (select distinct fomeazon from t_fome where lower(inuse) = 'igen') and
>>mertido like '2003-12-17%' and ertektipus in ('+MW') order by mertido,
>>fomeazon, ertektipus;
>>
>>
>
>
>
>>Ohh, I nearly forgot the config:  Linux 7.1; Postgres 7.3.2;
>>
>>
>
>The first thing you ought to do is move to PG 7.4.  "foo IN (SELECT ...)"
>generally works a lot better under 7.4 than prior releases.  I'd suggest
>dropping the "DISTINCT" when using 7.4, too.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
>


В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: SPI question (or not): trying to read from Large Objects from within a function
Следующее
От: "Chris Travers"
Дата:
Сообщение: Re: problems with transaction blocks