Обсуждение: Join vs Subquery

Поиск
Список
Период
Сортировка

Join vs Subquery

От
Brian Herlihy
Дата:
Hi,

I am using postgres 8.1.3 for this.  If this has been dealt with later, please disregard.  And this is not a complaint or a request, I am just curious, so I know how to best construct my queries.

There is a unique index mapping domains to domain_ids.

views_ts specifies a partitioned table, where views_ts_2007_04_01 is the only partition matching the range given in the query.

My goal is to produce summaries of counts of rows for each day within a given range (can be days, months, years).

The issue: the second query results in a lower cost estimate.  I am wondering why the second query plan was not chosen for the first query.

Thanks!
Brian

live=> explain select ts::date,count(*) from views_ts join domains using (domain_id) where domain = '1234.com' and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=9040.97..9041.00 rows=2 width=8)
   ->  Hash Join  (cost=6.01..9040.96 rows=2 width=8)
         Hash Cond: ("outer".domain_id = "inner".domain_id)
         ->  Append  (cost=0.00..7738.01 rows=259383 width=16)
               ->  Seq Scan on views_ts  (cost=0.00..1138.50 rows=1 width=16)
                     Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
               ->  Seq Scan on views_ts_2007_04_01 views_ts  (cost=0.00..6599.51 rows=259382 width=16)
                     Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
         ->  Hash  (cost=6.01..6.01 rows=1 width=8)
               ->  Index Scan using domains_domain on domains  (cost=0.00..6.01 rows=1 width=8)
                     Index Cond: ("domain" = '1234.com'::text)
(11 rows)

live=> explain select ts::date,count(*) from views_ts where domain_id = (select domain_id from domains where domain = '1234.com') and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
                                                                                 QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1993.93..1995.99 rows=137 width=8)
   InitPlan
     ->  Index Scan using domains_domain on domains  (cost=0.00..6.01 rows=1 width=8)
           Index Cond: ("domain" = '1234.com'::text)
   ->  Result  (cost=0.00..1986.69 rows=247 width=8)
         ->  Append  (cost=0.00..1986.07 rows=247 width=8)
               ->  Seq Scan on views_ts  (cost=0.00..1245.75 rows=1 width=8)
                     Filter: ((domain_id = $0) AND (ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
               ->  Bitmap Heap Scan on views_ts_2007_04_01 views_ts  (cost=2.86..740.32 rows=246 width=8)
                     Recheck Cond: (domain_id = $0)
                     Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
                     ->  Bitmap Index Scan on views_ts_2007_04_01_domain_id  (cost=0.00..2.86 rows=246 width=0)
                           Index Cond: (domain_id = $0)

Re: Join vs Subquery

От
Gregory Stark
Дата:
"Brian Herlihy" <btherl@yahoo.com.au> writes:

> There is a unique index mapping domains to domain_ids.
...
> The issue: the second query results in a lower cost estimate. I am wondering
> why the second query plan was not chosen for the first query.

Well the unique index you mentioned is critical to being able to conclude the
queries are equivalent. Postgres in the past hasn't been able to use things
like unique indexes to make planning decisions because it had no
infrastructure to replan if you dropped the index. We do have such
infrastructure now so it may be possible to add features like this in the
future.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Join vs Subquery

От
Tom Lane
Дата:
Brian Herlihy <btherl@yahoo.com.au> writes:
> The issue: the second query results in a lower cost estimate.  I am wondering why the second query plan was not
chosenfor the first query. 

8.1 is incapable of pushing indexable join conditions down below an Append.
Try 8.2.

            regards, tom lane