Query optimizer plans with very small selectivity estimates

Поиск

Список

Период

Сортировка

От	Matthew Bellew
Тема	Query optimizer plans with very small selectivity estimates
Дата	29 октября 2015 г. 19:52:39
Msg-id	CAJnjrPOziOSDWCUgfb1U9QUGdfLFspF6AZ9Xj=x5cOBhyqDFyA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Query optimizer plans with very small selectivity estimates
Список	pgsql-performance

Дерево обсуждения

This related to a post in the general bugs forum, but I found this forum, and

this seems more appropriate. This is my second attempt to post, I believe

the first attempt last week did not work, apologies if I'm duplicating.

http://comments.gmane.org/gmane.comp.db.postgresql.bugs/39011

I made have several users encounter performance problems, which all

seem to come down to this problem: multiplying selectivity estimates can

cause tuple estimates to grow very small very quickly, once the estimator

gets to 1 row, the planner may choose plans that are very good ONLY WHEN

there is exactly 1 row (maybe even O(N^large)). Unfortunately, these may

be the worst plans if the estimate is even slightly off (even just returning

2 or 3 rows versus 1).

Using the patch below, I discovered that clamping relation tuple estimates

to a number as small as 2 seemed to avoid all the catastrophic query

plans.

In the scenarios I'm seeing, I have several examples of queries

that take >1m to run that should run in <1s. The estimate of 1 row

(versus thousands actual) leads the planner to tee up several nest loop joins

which causes thousands of table scans.

I have been working on a more complete which tracks uniqueness along

with selectivity so that optimizer can benefit from knowing when a

relation must have 1 (or fewer) tuples, while clamping all other relations

to 2 rather than 1.

typedef struct

{

double selectivity;

boolean unique;

} Selectivity;

I am interested in hearing discussion about this problem, and if the community

is open to a patch if I continue pursuing the development.

Matt

FIRST ARTIFACT

plan with expensive (80s join) and join estimate of 1

note the first Nested Loop join and 81s join

(I gave up trying to post the full explain, because of the 80 char limit)

"Sort (cost=7000.04..7000.04 rows=1 width=49)

(actual time=81739.426..81740.023 rows=5091 loops=1)"

" Sort Key: c.ten DESC"

" Sort Method: quicksort Memory: 948kB"

" CTE cte"

" -> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=4)

(actual time=0.001..0.001 rows=5 loops=1)"

" -> Nested Loop (cost=1.36..6999.97 rows=1 width=49)

(actual time=0.059..81725.475 rows=5091 loops=1)"

"Planning time: 1.912 ms"

"Execution time: 81740.328 ms"

SECOND ARTIFACT

force join row estimate to be minimun of 2

query completes very quickly

"Sort (cost=7000.06..7000.06 rows=2 width=49)

(actual time=84.610..85.192 rows=5142 loops=1)"

" Sort Key: c.ten DESC"

" Sort Method: quicksort Memory: 956kB"

" CTE cte"

" -> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=4)

(actual time=0.002..0.003 rows=5 loops=1)"

" -> Hash Join (cost=2518.99..6999.98 rows=2 width=49)

(actual time=17.629..82.886 rows=5142 loops=1)"

"Planning time: 2.982 ms"

"Execution time: 85.514 ms"

THIRD ARTIFACT

patch I used to make experimenting easier w/o recompiling

index 1b61fd9..444703c 100644

--- a/src/backend/optimizer/path/costsize.c

+++ b/src/backend/optimizer/path/costsize.c

@@ -68,6 +68,12 @@

*-------------------------------------------------------------------------

+/* These parameters are set by GUC */

+int join_row_estimate_clamp=1;

#include "postgres.h"

#ifdef _MSC_VER

@@ -175,6 +181,17 @@ clamp_row_est(double nrows)

}

+double

+clamp_join_row_est(double nrows)

+ nrows = clamp_row_est(nrows);

+ if (nrows >= (double)join_row_estimate_clamp)

+ return nrows;

+ return (double)join_row_estimate_clamp;

* cost_seqscan

* Determines and returns the cost of scanning a relation sequentially.

@@ -3886,7 +3903,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,

break;

}

- return clamp_row_est(nrows);

+ return clamp_join_row_est(nrows);

}

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c

index 71090f2..fabb8ac 100644

--- a/src/backend/utils/misc/guc.c

+++ b/src/backend/utils/misc/guc.c

@@ -2664,6 +2664,16 @@ static struct config_int ConfigureNamesInt[] =

NULL, NULL, NULL

+ {

+ {"join_row_estimate_clamp", PGC_USERSET, QUERY_TUNING_OTHER,

+ gettext_noop("Set the minimum estimated size of a join result."),

+ NULL

+ },

+ &join_row_estimate_clamp,

+ 1, 1, 10000,

+ NULL, NULL, NULL

+ },

/* End-of-list marker */

{

{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL

diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h

index 25a7303..0161c4b 100644

--- a/src/include/optimizer/cost.h

+++ b/src/include/optimizer/cost.h

@@ -67,8 +67,10 @@ extern bool enable_material;

extern bool enable_mergejoin;

extern bool enable_hashjoin;

extern int constraint_exclusion;

+extern int join_row_estimate_clamp;

extern double clamp_row_est(double nrows);

+extern double clamp_join_row_est(double nrows);

extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,

double index_pages, PlannerInfo *root);

extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,

В списке pgsql-performance по дате отправления:

Предыдущее

От: Alex Ignatov
Дата: 29 октября 2015 г., 16:46:55
Сообщение: Re: Scalability to more than 64 cores With PG 9.4 and RHEL 7.1 Kernel 3.10

Следующее

От: Tom Lane
Дата: 29 октября 2015 г., 21:24:12
Сообщение: Re: Query optimizer plans with very small selectivity estimates

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Query optimizer plans with very small selectivity estimates

Предыдущее

Следующее