[PERFORM] select subquery versus join subquery

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема [PERFORM] select subquery versus join subquery
Дата
Msg-id CAMkU=1zxu+xZ7X44FpZ=Syot_STrUiGR9miB8K-HCqvb=KMtwA@mail.gmail.com
обсуждение исходный текст
Ответы Re: [PERFORM] select subquery versus join subquery
Список pgsql-performance
I need to make a view which decorates rows from a parent table with aggregated values from a child table.  I can think of two ways to write it, one which aggregates the child table to make a new query table and joins the parent to that, as shown in "view1" below.  Or does subselect in the select list to aggregate just the currently matching rows, and returns that value, as in "view2" below.

While these two are semantically equivalent, the planner doesn't understand that, and always executes them pretty much the way you would naively do it based on the text of the query.

But view1 is pretty slow if the WHERE clause is highly selective (like "WHERE cutoff<0.00001") because it has to summarize the entire child table just to pull out a few rows.  But view2 is pretty slow if the entire view or most of it (like "WHERE cutoff<0.9") is being returned.

Is there some 3rd way to write the query which allows the planner to switch between strategies (summarize whole table vs summarize values on demand) depending on the known selectivity of the where clause?

In this case, the planner is getting the relative cost estimates roughly correct.  It is not a problem of mis-estimation.

I can always create two views, view_small and view_large, and swap between them based on my own knowledge of how restrictive a query is likely to be, but that is rather annoying.  Especially in the real-world situation, which is quite a bit more complex than this.

create table thing as select x as id, random() as cutoff from generate_series(1,2000000) f(x);

create table thing_alias as select floor(power(random()*power(2000000,5),0.2))::int thing_id, md5(x::text), random() as priority from generate_series(1,150000) f(x);

create index on thing_alias (thing_id );

create index on thing (cutoff );

vacuum; analyze;

create view view1 as select id, md5,cutoff from thing left join 
  (
     select distinct on (thing_id) thing_id, md5 from thing_alias 
     order by thing_id, priority desc
  ) as foo 
  on (thing_id=id);

create view view2 as select id, 
  (
     select md5 from thing_alias where thing_id=id 
        order by priority desc limit 1
  ) as md5, 
  cutoff from thing;

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: [PERFORM] Bulk persistence strategy
Следующее
От: Clemens Eisserer
Дата:
Сообщение: [PERFORM] Can postgresql plan a query using multiple CPU cores?