Обсуждение: query from partitions

Поиск
Список
Период
Сортировка

query from partitions

От
Ключников А.С.
Дата:
Hi.

create table device(id int);

insert into device values(1);
insert into device values(2);
.....
insert into device values(250);

create table base (
                    id int,
                    data float,
                    datatime timestamp,
                    mode int,
                    status int);

create table base_1 (
                    check ( id = 1 and datatime >= DATE '2005-01-01'
                    and datatime < DATE '2006-01-01' )
                    ) INHERITS (base);

create table base_2 (
                    check ( id = 2 and datatime >= DATE '2005-01-01'
                    and datatime < DATE '2006-01-01' )
                    ) INHERITS (base);
....
create table base_250


And
select * from base
    where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
10 seconds

select * from base
    where id in (select id from device where id = 1 or id = 2) and
    datatime between '2005-05-15' and '2005-05-17';
10 minits

Why?

--
mailto: alexs@analytic.mv.ru

Re: query from partitions

От
Richard Huxton
Дата:
Ключников А.С. wrote:
> And
> select * from base
>     where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
> 10 seconds
>
> select * from base
>     where id in (select id from device where id = 1 or id = 2) and
>     datatime between '2005-05-15' and '2005-05-17';
> 10 minits
>
> Why?

Run EXPLAIN ANALYSE on both queries to see how the plan has changed.

My guess for why the plans are different is that in the first case your
query ends up as ...where (id=1 or id=2)...

In the second case, the planner doesn't know what it's going to get back
from the subquery until it's executed it, so can't tell it just needs to
scan base_1,base_2. Result: you'll scan all child tables of base.

I think the planner will occasionally evaluate constants before
planning, but I don't think it will ever execute a subquery and then
re-plan the outer query based on those results. Of course, someone might
pop up and tell me I'm wrong now...

--
   Richard Huxton
   Archonet Ltd


Re: query from partitions

От
"Steinar H. Gunderson"
Дата:
On Tue, Dec 13, 2005 at 06:18:19PM +0300, Ключников А.С. wrote:
> select * from base
>     where id in (select id from device where id = 1 or id = 2) and
>     datatime between '2005-05-15' and '2005-05-17';
> 10 minits

That's a really odd way of saying "1 or 2". It probably has to go through all
the records in device, not realizing it can just scan for two of them (using
two index scans). I'd guess an EXPLAIN ANALYZE would confirm something like
this happening (you'd want to run that and post the results here anyhow).

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: query from partitions

От
Ключников А.С.
Дата:
* Richard Huxton <dev@archonet.com> [2005-12-13 15:59:11 +0000]:

> Ключников А.С. wrote:
> >And
> >select * from base
> >    where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
> >10 seconds
> >
> >select * from base
> >    where id in (select id from device where id = 1 or id = 2) and
> >    datatime between '2005-05-15' and '2005-05-17';
> >10 minits
> >
> >Why?
>
> Run EXPLAIN ANALYSE on both queries to see how the plan has changed.
explain select distinct on(id) * from base where id in (1,2) and
data_type=2 and datatime < '2005-11-02' order by id, datatime desc;

Unique  (cost=10461.14..10527.30 rows=2342 width=38)
   ->  Sort  (cost=10461.14..10494.22 rows=13232 width=38)
         Sort Key: public.base.id, public.base.datatime
         ->  Result  (cost=0.00..9555.29 rows=13232 width=38)
               ->  Append  (cost=0.00..9555.29 rows=13232 width=38)
                     ->  Seq Scan on base  (cost=0.00..32.60 rows=1
width=38)
                           Filter: (((id = 1) OR (id = 2)) AND (data_type =
2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on base_batch base  (cost=0.00..32.60
rows=1 width=38)
.......................

->  Seq Scan on base_1_2004 base  (cost=0.00..32.60 rows=1 width=38)
                           Filter: (((id = 1) OR (id = 2)) AND (data_type =
2) AND (datatime < '2005-11-02 00:00:00'::timestamp without time zone))
(записей: 34)


and
explain select distinct on(id) * from base where id in (select id from
device where id = 1 or id = 2) and data_type=2 and datatime < '2005-11-02'
order by id, datatime desc;

Unique  (cost=369861.89..369872.52 rows=2126 width=38)
   ->  Sort  (cost=369861.89..369867.21 rows=2126 width=38)
         Sort Key: public.base.id, public.base.datatime
         ->  Hash IN Join  (cost=5.88..369744.39 rows=2126 width=38)
               Hash Cond: ("outer".id = "inner".id)
               ->  Append  (cost=0.00..368654.47 rows=212554 width=38)
                     ->  Seq Scan on base  (cost=0.00..26.95 rows=2
width=38)
                           Filter: ((data_type = 2) AND (datatime <
'2005-11-02 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on base_batch base  (cost=0.00..26.95
rows=2 width=38)
                           Filter: ((data_type = 2) AND (datatime <
'2005-11-02 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on base_lines_05_12 base
(cost=0.00..26.95 rows=2 width=38)
............................
 ->  Hash  (cost=5.88..5.88 rows=2 width=4)
                     ->  Seq Scan on device  (cost=0.00..5.88 rows=2
width=4)
                           Filter: ((id = 1) OR (id = 2))
(записей: 851)

>
> My guess for why the plans are different is that in the first case your
> query ends up as ...where (id=1 or id=2)...
>
> In the second case, the planner doesn't know what it's going to get back
> from the subquery until it's executed it, so can't tell it just needs to
> scan base_1,base_2. Result: you'll scan all child tables of base.
>
> I think the planner will occasionally evaluate constants before
> planning, but I don't think it will ever execute a subquery and then
> re-plan the outer query based on those results. Of course, someone might
> pop up and tell me I'm wrong now...
>
> --
>   Richard Huxton
>   Archonet Ltd
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match

--
С уважением,
Ключников А.С.
Ведущий инженер ПРП "Аналитприбор"
432030 г.Ульяновск, а/я 3117
тел./факс +7 (8422) 43-44-78
mailto: alexs@analytic.mv.ru

Re: query from partitions

От
Simon Riggs
Дата:
On Tue, 2005-12-13 at 15:59 +0000, Richard Huxton wrote:
> Ключников А.С. wrote:
> > And
> > select * from base
> >     where id in (1,2) and datatime between '2005-05-15' and '2005-05-17';
> > 10 seconds
> >
> > select * from base
> >     where id in (select id from device where id = 1 or id = 2) and
> >     datatime between '2005-05-15' and '2005-05-17';
> > 10 minits
> >
> > Why?
>
> Run EXPLAIN ANALYSE on both queries to see how the plan has changed.
>
> My guess for why the plans are different is that in the first case your
> query ends up as ...where (id=1 or id=2)...
>
> In the second case, the planner doesn't know what it's going to get back
> from the subquery until it's executed it, so can't tell it just needs to
> scan base_1,base_2. Result: you'll scan all child tables of base.
>
> I think the planner will occasionally evaluate constants before
> planning, but I don't think it will ever execute a subquery and then
> re-plan the outer query based on those results. Of course, someone might
> pop up and tell me I'm wrong now...

Thats right. Partitioning doesn't work for joins in 8.1.

Best Regards, Simon Riggs