Re: [HACKERS] Effect of caching hash bucket size while costing

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Effect of caching hash bucket size while costing
Дата
Msg-id CA+TgmoZkOZqzyDQ8dKmEAKMf5RDHoN2Rnu73wbauM4p_93Oh=Q@mail.gmail.com
обсуждение исходный текст
Ответ на Effect of caching hash bucket size while costing  (Srinivas Karthik V <skarthikv.iitb@gmail.com>)
Ответы Re: [HACKERS] Effect of caching hash bucket size while costing  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Dec 8, 2016 at 3:53 AM, Srinivas Karthik V
<skarthikv.iitb@gmail.com> wrote:
> Dear PostgreSQL Hackers,
>
> I am working in PostgreSQL 9.4.* optimizer module. In costsize.c file and
> final_cost_hashjoin() function, the innerbucketsize is either:
>
> a) calculated using a cached copy
>                                                       OR
> b)  calculated afresh using statistics captured by the following code
> snippet:
> thisbucketsize =   estimate_hash_bucketsize(root,
> get_leftop(restrictinfo->clause),virtualbuckets);
>
> For the query I used, if I disable the caching for calculating the
> innerbucketsize, I get a different plan with cost change of around 1000
> units.
>
> 1) Can you please let me know if innerbucketsize*innerpathrows captures the
> maximum bucket size?
> 2) why is it not calculated afresh all the time?

Well, #2 is answered there right in the comments:
                        * Since we tend to visit the same clauses
over and over when                        * planning a large query, we cache the
bucketsize estimate in the                        * RestrictInfo node to avoid repeated lookups
of statistics.

I assume the person who wrote the comment thought that the answer
wouldn't change from one call to the next, and therefore it was safe
to cache.  I don't know why that isn't the case for you.

As to question #1, there's a comment for that, too, a little further down:
                * The number of tuple comparisons needed is the number of outer                * tuples times the
typicalnumber of tuples in a hash
 
bucket, which                * is the inner relation size times its bucketsize
fraction.  At each                * one, we need to evaluate the hashjoin quals.  But actually,

So innerbucketsize*innerpathrows represents the expected number of
comparisons that we expect to need to perform per hash probe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: [HACKERS] postgres_fdw bug in 9.6
Следующее
От: Christophe Pettus
Дата:
Сообщение: Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.