Re: Experimental evaluation of PostgreSQL's query optimizer

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Experimental evaluation of PostgreSQL's query optimizer
Дата
Msg-id CAMsr+YGpbK_OvSp+gE_Q4PuV2dWHc1ByJSv4MbM2vTHi4pNieg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Experimental evaluation of PostgreSQL's query optimizer  (Viktor Leis <leis@in.tum.de>)
Ответы Re: Experimental evaluation of PostgreSQL's query optimizer  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-hackers


On 21 December 2015 at 20:53, Viktor Leis <leis@in.tum.de> wrote:
 
I think your suggestion amounts to caching the cardinalities of all
two-way joins. One major issue is that for a query like

select * from r1, r2 where r1.x = r2.y and r1.a = ? and r2.b;

it depends on the specific values of r1.a and r2.b whether there is
any (anti-)correlation. And obviously one cannot store correction
factors for each value of a and b.


I see a parallel with indexing and partial indexes here.

We obviously cannot afford to keep cross-table correlations for every possible pairing of join conditions across every possible set of joined tables. Much like we can't afford to keep indexes for every possible set of columns, but even worse.

Much as we let users CREATE INDEX to tell us what cols to index, maybe we should let them CREATE a cross-table join statistics collector for a particular set of tables, optionally qualified with a filter condition just like we do on partial indexes, and optionally transformed via an immutable expression like we do for expression indexes, e.g.:

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.col1 = t2.col2);

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (lower(t1.col1) = lower(t2.col2)) WHERE t1.othercol IS NOT NULL;

CREATE JOIN STATISTICS ON t1 JOIN t2 ON (t1.colx = t2.colx AND t1.coly = t2.coly);

plus a simplified form like

CREATE JOIN STATISTICS ON t1 JOIN t2 USING (somecol);


That way we let an admin who's tuning queries direct effort at problem areas. It's not automagical, but it's an area where tools could analyze pg_stat_statements to direct effort, much like is currently done for index creation. Like index creation I don't think it's practical to do this entirely automatically and behind the scenes since collecting the stats for all possibilities rapidly gets prohibitive.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Experimental evaluation of PostgreSQL's query optimizer
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Declarative partitioning