Re: [HACKERS] pgsql 10: hash indexes testing

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] pgsql 10: hash indexes testing
Дата
Msg-id CAA4eK1J1qp-AQ3nrq6JSMkPtFSQvfq7pgxXho1duWxotBQW_YA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] pgsql 10: hash indexes testing  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] pgsql 10: hash indexes testing  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
On Tue, Jul 11, 2017 at 6:51 AM, AP <ap@zip.com.au> wrote:
> On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
>> On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
>> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
>> >> I think if you are under development, it is always advisable to create
>> >> indexes after initial bulk load.  That way it will be faster and will
>> >> take lesser space atleast in case of hash index.
>> >
>> > This is a bit of a pickle, actually:
>> > * if I do have a hash index I'll wind up with a bloated one at some stage
>> >   that refused to allow more inserts until the index is re-created
>> > * if I don't have an index then I'll wind up with a table where I cannot
>> >   create a hash index because it has too many rows for it to handle
>> >
>> > I'm at a bit of a loss as to how to deal with this.
>>
>> I can understand your concerns.  To address first concern we need to
>> work on one or more of following work items: (a) work on vacuums that
>> can be triggered on insert only workload (it should perform index
>> vacuum as well) (b) separate utility statement/function to squeeze
>> hash index (c) db internally does squeezing like after each split, so
>> that chances of such a problem can be reduced, but that will be at the
>> cost of performance reduction in other workloads, so not sure if it is
>> advisable.  Among these (b) is simplest to do but may not be
>> convenient for the user.
>
> (a) seems like a good compromise on (c) if it can be done without disruption
>     and in time.
> (b) seems analogous to the path autovcauum took. Unless I misremember, before
>     autovacuum we had a cronjob to do similar work. It's probably a sane path
>     to take as a first step on the way to (a)
> (c) may not be worth the effort if it compromises general use, though perhaps
>     it could be used to indicate to (a) that now is a good time to handle
>     this bit?
>

Nice summarization!  I think before doing anything of that sort we
need opinions from others as well.  If some other community members
also see value in doing one or multiple of above things, then I can
write a patch.

>> To address your second concern, we need to speed up the creation of
>> hash index which is a relatively big project.  Having said that, I
>> think in your case, this is one-time operation so spending once more
>> time might be okay.
>
> Yup. Primarily I just wanted the idea out there that this isn't that easy
> to cope with manually and to get it onto a todo list (unless it was an
> easy thing to do given a bit of thought but it appears not).
>
> Out of curiosity, and apologies if you explained it already and I missed
> the signficance of the words, how does this bloat happen?
>

You might want to read src/backend/access/hash/README.  During split
operation, we copy tuples from the old bucket (bucket being split) to
new bucket (bucket being populated) and once all the tuples are copied
and there is no prior scan left which has started during split on the
buckets involved in the split, we remove the tuples from the old
bucket.  Now, as we might need to wait for the scans to finish, we
have preferred to perform it during vacuum or during next split from
that bucket.  Till the tuples are removed from the old bucket, there
will be some bloat in the system.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] New partitioning - some feedback
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] pgsql 10: hash indexes testing