Re: Question on explain

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Question on explain
Дата
Msg-id CAMkU=1wPOHktPNTz-k9bR7OGCBuM9ieyu=qd6p3n+LA7rj9mkQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Question on explain  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: Question on explain  (Enrico Pirozzi <sscotty71@gmail.com>)
Список pgsql-general
On Saturday, June 8, 2013, Jeff Janes wrote:
On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71@gmail.com> wrote:
Hi all,

I have 2 tables:

table1 (field1,field2,.....) 
table2 (field1,field2,field3,.....)

field1 is an uuid type

Is it a primary key?  Is there a foreign key constraint between them?
 
field2 is a timestamp with time zone type.

If I execute:

explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1  A INNER JOIN table2 B
using (field1,field2)

the query plan is


                                                QUERY PLAN                                                 
   
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
   Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
   ->  Seq Scan on table2 b  (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
   ->  Hash  (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 15699kB
         ->  Seq Scan on table1 a  (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
 Total runtime: 5055.118 ms
(7 rows)

My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?

Yes, that seems quite strange.

On further thought, that is not strange at all.  You have two very selective join conditions, and the planner assumes they are independent, so that it can multiply the selectivities.  But in reality they are completely (or almost completely) dependent.   If the planner knew about cross column correlations, that might not even help as you can have complete statistical dependence without having correlation.

It seems unlikely to me that the timestamp belongs in both tables, since it's value seems to be completely dependent on the value of the UUID.

 
In any event, it is unlikely the planner would pick a different plan were it to correctly understand the selectivities, so no harm is done.  Although it is easy to imagine similar queries where that would not be the case.

Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее
От: Rad Cirskis
Дата:
Сообщение: pgxs question - linking c-functions to external libraries
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Sample database with difficult SQL questions