Re: [PERFORM] select subquery versus join subquery

Поиск
Список
Период
Сортировка
От Gunnar \"Nick\" Bluth
Тема Re: [PERFORM] select subquery versus join subquery
Дата
Msg-id a7cc091e-b73e-389e-739d-c7d5b480d58d@elster.de
обсуждение исходный текст
Ответ на [PERFORM] select subquery versus join subquery  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: [PERFORM] select subquery versus join subquery
Список pgsql-performance
Am 05/22/2017 um 09:57 PM schrieb Jeff Janes:
> I need to make a view which decorates rows from a parent table with
> aggregated values from a child table.  I can think of two ways to write
> it, one which aggregates the child table to make a new query table and
> joins the parent to that, as shown in "view1" below.  Or does subselect
> in the select list to aggregate just the currently matching rows, and
> returns that value, as in "view2" below.
>
> While these two are semantically equivalent, the planner doesn't
> understand that, and always executes them pretty much the way you would
> naively do it based on the text of the query.
>
> But view1 is pretty slow if the WHERE clause is highly selective (like
> "WHERE cutoff<0.00001") because it has to summarize the entire child
> table just to pull out a few rows.  But view2 is pretty slow if the
> entire view or most of it (like "WHERE cutoff<0.9") is being returned.
>
> Is there some 3rd way to write the query which allows the planner to
> switch between strategies (summarize whole table vs summarize values on
> demand) depending on the known selectivity of the where clause?
>
> In this case, the planner is getting the relative cost estimates roughly
> correct.  It is not a problem of mis-estimation.
>
> I can always create two views, view_small and view_large, and swap
> between them based on my own knowledge of how restrictive a query is
> likely to be, but that is rather annoying.  Especially in the real-world
> situation, which is quite a bit more complex than this.
>
> create table thing as select x as id, random() as cutoff from
> generate_series(1,2000000) f(x);
>
> create table thing_alias as select
> floor(power(random()*power(2000000,5),0.2))::int thing_id, md5(x::text),
> random() as priority from generate_series(1,150000) f(x);
>
> create index on thing_alias (thing_id );
>
> create index on thing (cutoff );
>
> vacuum; analyze;
>
> create view view1 as select id, md5,cutoff from thing left join
>   (
>      select distinct on (thing_id) thing_id, md5 from thing_alias
>      order by thing_id, priority desc
>   ) as foo
>   on (thing_id=id);
>
> create view view2 as select id,
>   (
>      select md5 from thing_alias where thing_id=id
>         order by priority desc limit 1
>   ) as md5,
>   cutoff from thing;
>
> Cheers,
>
> Jeff

Hi Jeff,

how does something like

CREATE OR REPLACE VIEW public.view3 AS
 SELECT thing.id,
    foo.md5,
    thing.cutoff
   FROM thing,
    LATERAL ( SELECT DISTINCT ON (thing_alias.thing_id)
thing_alias.thing_id,
            thing_alias.md5
           FROM thing_alias
          WHERE thing_alias.thing_id = thing.id
          ORDER BY thing_alias.thing_id, thing_alias.priority DESC) foo

work for you? At least that's always using an index scan here, as
opposed to view1, which (for me) defaults to a SeqScan on thing_alias at
a low cutoff.

*****
Note btw. that both view1 and view2 don't return any md5 values for me,
while view3 does!
*****

Results (ms, median of 3 runs):
cutoff<  0.1   0.9
view1:   348   1022
view2:   844   6484
view3:   842   5976

With

 LATERAL ( SELECT string_agg(thing_alias.md5, ','::text) AS md5
           FROM thing_alias
          WHERE thing_alias.thing_id = thing.id
          GROUP BY thing_alias.thing_id) foo

(which seems to make more sense ;-)

I yield 483 (0.1) and 3410 (0.9) ms (and return md5-Aggregates).

Cheers,
--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: [PERFORM] Can postgresql plan a query using multiple CPU cores?
Следующее
От: Dinesh Chandra 12108
Дата:
Сообщение: [PERFORM] Log update query along with username who has executed the same.