Обсуждение: Strange DISTINCT !
Hello, can someone explain to me why this query is so slow : select distinct t.idmembre,p.datecrea from tmp_stat t,prefs p where p.idmembre=t.idmembre limit 5; And this one is so fast : select t.idmembre,p.datecrea from (select distinct idmembre from tmp_stat) as t, prefs p where p.idmembre=t.idmembre limit 5; (I currently have idmembre as an index on tmp_stat and prefs) How does DISTINCT work ? Is this a bug or a misconfigured index ? -- Jean-Christophe Boggio cat@thefreecat.org -o) Independant Consultant and Developer /\\ Delphi, Linux, Perl, PostgreSQL _\_V
I think this is because if you remove duplicates before joining the tables, then you would join smaller tables, therefore cutting the cost of the join (and later sorting and removing duplicates). Say tmp_stat has the size of 1000 and 10 duplicates on the everage for each distinct tuple. Also, if the size of prefs is 1000, then the cost of joining (assuming you have no indexing and just use nested loop joins) these two tables is 1000 x 1000 = 1,000, 000. After this result of the join needs to be sorted and duplicates removed. However, if you remove duplicates before joining, the result (cost) of join is 100 x 1000 = 100, 000, which also reduces the time of sorting and duplicates removal. cheers, Oleg Jean-Christophe Boggio wrote: > Hello, can someone explain to me why this query is so slow : > > select distinct t.idmembre,p.datecrea > from tmp_stat t,prefs p > where p.idmembre=t.idmembre > limit 5; > > And this one is so fast : > > select t.idmembre,p.datecrea > from (select distinct idmembre from tmp_stat) as t, > prefs p > where p.idmembre=t.idmembre > limit 5; > > (I currently have idmembre as an index on tmp_stat and prefs) > > How does DISTINCT work ? Is this a bug or a misconfigured index ? > > -- > Jean-Christophe Boggio > cat@thefreecat.org -o) > Independant Consultant and Developer /\\ > Delphi, Linux, Perl, PostgreSQL _\_V > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html
On Mon, 20 Aug 2001, Jean-Christophe Boggio wrote: > Hello, can someone explain to me why this query is so slow : > > select distinct t.idmembre,p.datecrea > from tmp_stat t,prefs p > where p.idmembre=t.idmembre > limit 5; > > And this one is so fast : > > select t.idmembre,p.datecrea > from (select distinct idmembre from tmp_stat) as t, > prefs p > where p.idmembre=t.idmembre > limit 5; > > (I currently have idmembre as an index on tmp_stat and prefs) I'd suggest looking at the explain output for the two queries, however, I believe the first query is likely to result in a sort and unique step and the second is going to probably use the index to distinct on tmp_stat. I think that *possibly* if you used select distrinct p.idmembre, p.datecrea and made an index on (idmembre, datecrea) on prefs you could possibly see an improvement but I really don't know. In addition, these two queries may not do the same thing. The first will unique over both membre and datecrea whereas I think the second will not, so if you had two prefs rows with the same idmembre and datecrea, I believe the first will give one row and the second two.