Re: Hash index initial size is too large given NULLs or partialindexes

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Hash index initial size is too large given NULLs or partialindexes
Дата
Msg-id b0c1be5e-94ce-0496-fe96-b61be7a3067c@2ndquadrant.com
обсуждение исходный текст
Ответ на Hash index initial size is too large given NULLs or partial indexes  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: Hash index initial size is too large given NULLs or partial indexes
Список pgsql-hackers

On 3/8/19 7:14 PM, Jeff Janes wrote:
> Referring to this thread:
> 
> https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices
> 
> When a hash index is created on a populated table, it estimates the
> number of buckets to start out with based on the number of tuples
> returned by estimate_rel_size.  But this number ignores both the fact
> that NULLs are not stored in hash indexes, and that partial indexes
> exist.  This can lead to much too large hash indexes.  Doing a re-index
> just repeats the logic, so doesn't fix anything.  Fill factor also can't
> fix it, as you are not allowed to increase that beyond 100.
> 

Hmmm :-(

> This goes back to when the pre-sizing was implemented in 2008
> (c9a1cc694abef737548a2a).  It seems to be an oversight, rather than
> something that was considered.
> 
> Is this a bug that should be fixed?  Or if getting a more accurate
> estimate is not possible or not worthwhile, add a code comment about that?
> 

I'd agree this smells like a bug (or perhaps two). The sizing probably
should consider both null_frac and selectivity of the index predicate.
When those two are redundant (i.e. when there's IS NOT NULL condition on
indexed column), this will result in under-estimate. That means the
index build will do a an extra split, but that's probably better than
having permanently bloated index.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Hash index initial size is too large given NULLs or partial indexes
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: PostgreSQL vs SQL/XML Standards