Обсуждение: Strange DISTINCT !

Поиск
Список
Период
Сортировка

Strange DISTINCT !

От
Jean-Christophe Boggio
Дата:
Hello, can someone explain to me why this query is so slow :

select distinct t.idmembre,p.datecrea
from   tmp_stat t,prefs p
where  p.idmembre=t.idmembre
limit  5;

And this one is so fast :

select t.idmembre,p.datecrea
from   (select distinct idmembre from tmp_stat) as t,      prefs p
where  p.idmembre=t.idmembre
limit 5;
(I currently have idmembre as an index on tmp_stat and prefs)

How does DISTINCT work ? Is this a bug or a misconfigured index ?

--
Jean-Christophe Boggio                       
cat@thefreecat.org                           -o)
Independant Consultant and Developer         /\\
Delphi, Linux, Perl, PostgreSQL             _\_V



Re: Strange DISTINCT !

От
Oleg Lebedev
Дата:
I think this is because if you remove duplicates before joining the tables,
then you would join smaller tables, therefore cutting the cost of the join
(and later sorting and removing duplicates).
Say tmp_stat has the size of 1000 and 10 duplicates on the everage for each
distinct tuple. Also, if the size of prefs is 1000, then the cost of joining
(assuming you have no indexing and just use nested loop joins) these two
tables is 1000 x 1000 = 1,000, 000. After this result of the join needs to
be sorted and duplicates removed.
However, if you remove duplicates before joining, the result (cost) of join
is 100 x 1000 = 100, 000, which also reduces the time of sorting and
duplicates removal.
cheers,

Oleg

Jean-Christophe Boggio wrote:

> Hello, can someone explain to me why this query is so slow :
>
> select distinct t.idmembre,p.datecrea
> from   tmp_stat t,prefs p
> where  p.idmembre=t.idmembre
> limit  5;
>
> And this one is so fast :
>
> select t.idmembre,p.datecrea
> from   (select distinct idmembre from tmp_stat) as t,
>        prefs p
> where  p.idmembre=t.idmembre
> limit 5;
>
> (I currently have idmembre as an index on tmp_stat and prefs)
>
> How does DISTINCT work ? Is this a bug or a misconfigured index ?
>
> --
> Jean-Christophe Boggio
> cat@thefreecat.org                           -o)
> Independant Consultant and Developer         /\\
> Delphi, Linux, Perl, PostgreSQL             _\_V
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html



Re: Strange DISTINCT !

От
Stephan Szabo
Дата:
On Mon, 20 Aug 2001, Jean-Christophe Boggio wrote:

> Hello, can someone explain to me why this query is so slow :
> 
> select distinct t.idmembre,p.datecrea
> from   tmp_stat t,prefs p
> where  p.idmembre=t.idmembre
> limit  5;
> 
> And this one is so fast :
> 
> select t.idmembre,p.datecrea
> from   (select distinct idmembre from tmp_stat) as t,
>        prefs p
> where  p.idmembre=t.idmembre
> limit 5;
>  
> (I currently have idmembre as an index on tmp_stat and prefs)

I'd suggest looking at the explain output for the two queries,
however, I believe the first query is likely to result in a sort
and unique step and the second is going to probably use the index
to distinct on tmp_stat.  I think that *possibly* if you used
select distrinct p.idmembre, p.datecrea and made an index on (idmembre,
datecrea) on prefs you could possibly see an improvement but I really
don't know.

In addition, these two queries may not do the same thing.  The first
will unique over both membre and datecrea whereas I think the second 
will not, so if you had two prefs rows with the same idmembre and 
datecrea, I believe the first will give one row and the second two.