Re: Fix gin index cost estimation

Поиск
Список
Период
Сортировка
От Ronan Dunklau
Тема Re: Fix gin index cost estimation
Дата
Msg-id 4212593.ejJDZkT8p0@aivenronan
обсуждение исходный текст
Ответ на Re: Fix gin index cost estimation  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Ответы Re: Fix gin index cost estimation
Список pgsql-hackers
Le lundi 12 septembre 2022, 16:41:16 CEST Ronan Dunklau a écrit :
> But I realised that another approach might be better suited: since we want
to
> charge a cpu cost for every page visited, actually basing that on the
already
> estimated entryPagesFetched and dataPagesFetched would be better, instead of
> copying what is done for other indexes type and estimating the tree height.
It
> would be simpler, as we don't need to estimate the tree height anymore.
>
> I will submit a patch doing that.

The attached does that and is much simpler. I only took into account
entryPagesFetched, not sure if we should also charge something for data pages.

Instead of trying to estimate the height of the tree, we rely on the
(imperfect) estimation of the number of entry pages fetched, and charge 50
times cpu_operator_cost to that, in addition to the cpu_operator_cost charged
per entry visited.

I also adapted to take into accounts multiple scans induced by scalar array
operations.

As it is, I don't understand the following calculation:

/*
* Estimate number of entry pages read.  We need to do
* counts.searchEntries searches.  Use a power function as it should be,
 * but tuples on leaf pages usually is much greater. Here we include all
 * searches in entry tree, including search of first entry in partial
 * match algorithm
 */
entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages,
0.15)));

Is the power(0.15) used an approximation for a log ? If so why ? Also
shouldn't we round that up ?
It seems to me it's unlikely to affect the total too much in normal cases
(adding at worst random_page_cost) but if we start to charge cpu operator
costs as proposed here it makes a big difference and it is probably
safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the btree one
as I think it should be.

--
Ronan Dunklau
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: SUBTRANS: Minimizing calls to SubTransSetParent()
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Assertion failure in WaitForWALToBecomeAvailable state machine