[PERFORM] select subquery versus join subquery

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	[PERFORM] select subquery versus join subquery
Дата	22 мая 2017 г. 22:57:39
Msg-id	CAMkU=1zxu+xZ7X44FpZ=Syot_STrUiGR9miB8K-HCqvb=KMtwA@mail.gmail.com обсуждение исходный текст
Ответы	Re: [PERFORM] select subquery versus join subquery
Список	pgsql-performance

Дерево обсуждения

I need to make a view which decorates rows from a parent table with aggregated values from a child table. I can think of two ways to write it, one which aggregates the child table to make a new query table and joins the parent to that, as shown in "view1" below. Or does subselect in the select list to aggregate just the currently matching rows, and returns that value, as in "view2" below.

While these two are semantically equivalent, the planner doesn't understand that, and always executes them pretty much the way you would naively do it based on the text of the query.

But view1 is pretty slow if the WHERE clause is highly selective (like "WHERE cutoff<0.00001") because it has to summarize the entire child table just to pull out a few rows. But view2 is pretty slow if the entire view or most of it (like "WHERE cutoff<0.9") is being returned.

Is there some 3rd way to write the query which allows the planner to switch between strategies (summarize whole table vs summarize values on demand) depending on the known selectivity of the where clause?

In this case, the planner is getting the relative cost estimates roughly correct. It is not a problem of mis-estimation.

I can always create two views, view_small and view_large, and swap between them based on my own knowledge of how restrictive a query is likely to be, but that is rather annoying. Especially in the real-world situation, which is quite a bit more complex than this.

create table thing as select x as id, random() as cutoff from generate_series(1,2000000) f(x);

create table thing_alias as select floor(power(random()*power(2000000,5),0.2))::int thing_id, md5(x::text), random() as priority from generate_series(1,150000) f(x);

create index on thing_alias (thing_id );

create index on thing (cutoff );

vacuum; analyze;

create view view1 as select id, md5,cutoff from thing left join

(

select distinct on (thing_id) thing_id, md5 from thing_alias

order by thing_id, priority desc

) as foo

on (thing_id=id);

create view view2 as select id,

(

select md5 from thing_alias where thing_id=id

order by priority desc limit 1

) as md5,

cutoff from thing;

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PERFORM] select subquery versus join subquery