Re: WIP: Fast GiST index build

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WIP: Fast GiST index build
Дата
Msg-id 4E4A5CD5.2040601@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
Looking at the calculation of levelStep:

> +     /*
> +      * Calculate levelStep by available amount of memory. We should be able to
> +      * load into main memory one page of each underlying node buffer (which
> +      * are in levelStep below). That give constraint over
> +      * maintenance_work_mem. Also we should be able to have subtree of
> +      * levelStep level in cache. That give constraint over
> +      * effective_cache_size.
> +      *
> +      * i'th underlying level of sub-tree can consists of
> +      * i^maxIndexTuplesPerPage pages at maximum. So, subtree of levelStep
> +      * levels can't be greater then 2 * maxIndexTuplesPerPage ^ levelStep
> +      * pages. We use some more reserve due to we probably can't take whole
> +      * effective cache and use formula 4 * maxIndexTuplesPerPage ^ levelStep =
> +      * effectiveCache. We use similar logic with maintenance_work_mem. We
> +      * should be able to store at least last pages of all buffers where we are
> +      * emptying current buffer to.
> +      */
> +     effectiveMemory = Min(maintenance_work_mem * 1024 / BLCKSZ,
> +                           effective_cache_size);
> +     levelStep = (int) log((double) effectiveMemory / 4.0) /
> +         log((double) maxIndexTuplesPerPage);
> +

I can see that that's equal to the formula given in the paper, 
log_B(M/4B), but I couldn't see any explanation for that formula in the 
paper. Your explanation makes sense, but where did it come from?

It seems a bit pessimistic: while it's true that the a subtree can't be 
larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter 
upper bound on it. The max size of a subtree of depth n can be 
calculated as the geometric series:

r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)

where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but 
for a large r and small n (which is typical), the 2 * 
maxIndexTuplesPerPage^levelStep formula gives a value that's almost 
twice as large as the real max size of a subtree.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Re: Should we have an optional limit on the recursion depth of recursive CTEs?
Следующее
От: Magnus Hagander
Дата:
Сообщение: non-ipv6 vs hostnames