Re: [HACKERS] pgsql 10: hash indexes testing

From: AP
Subject: Re: [HACKERS] pgsql 10: hash indexes testing
Date:
Msg-id: 20170711012156.aotofk65eu2itjkj@zip.com.au
In reply to: Re: [HACKERS] pgsql 10: hash indexes testing  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
> On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> >> I think if you are under development, it is always advisable to create
> >> indexes after initial bulk load.  That way it will be faster and will
> >> take less space, at least in the case of a hash index.
> >
> > This is a bit of a pickle, actually:
> > * if I do have a hash index I'll wind up with a bloated one at some stage
> >   that refused to allow more inserts until the index is re-created
> > * if I don't have an index then I'll wind up with a table where I cannot
> >   create a hash index because it has too many rows for it to handle
> >
> > I'm at a bit of a loss as to how to deal with this.
> 
> I can understand your concerns.  To address first concern we need to
> work on one or more of following work items: (a) work on vacuums that
> can be triggered on insert only workload (it should perform index
> vacuum as well) (b) separate utility statement/function to squeeze
> hash index (c) db internally does squeezing like after each split, so
> that chances of such a problem can be reduced, but that will be at the
> cost of performance reduction in other workloads, so not sure if it is
> advisable.  Among these (b) is simplest to do but may not be
> convenient for the user.

(a) seems like a good compromise on (c) if it can be done without disruption
    and in time.
(b) seems analogous to the path autovacuum took. Unless I misremember, before
    autovacuum we had a cronjob to do similar work (much like the REINDEX
    sketch below). It's probably a sane path to take as a first step on the
    way to (a).
(c) may not be worth the effort if it compromises general use, though perhaps
    it could be used to indicate to (a) that now is a good time to handle
    this bit?
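
For the moment the closest thing to (b) that I can see is rebuilding the
index from cron whenever the overflow pages pile up (index name made up
for illustration):

    -- Hypothetical index name. REINDEX rebuilds the hash index from
    -- scratch, dropping the accumulated overflow pages; note it takes
    -- locks that block writes to the table while it runs.
    REINDEX INDEX idx_datums_key;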
 

> To address your second concern, we need to speed up the creation of
> hash index which is a relatively big project.  Having said that, I
> think in your case, this is one-time operation so spending once more
> time might be okay.

Yup. Primarily I just wanted the idea out there that this isn't that easy
to cope with manually and to get it onto a todo list (unless it was an
easy thing to do given a bit of thought but it appears not).
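
So for the reload I gather the intended flow is roughly this (table,
column and path names made up for illustration):

    -- Bulk load first...
    COPY datums (key, val) FROM '/tmp/datums.csv' WITH (FORMAT csv);
    -- ...then build the hash index in one pass over the loaded data.
    CREATE INDEX idx_datums_key ON datums USING hash (key);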

Out of curiosity, and apologies if you explained it already and I missed
the significance of the words, how does this bloat happen? These tables
only cop COPY. There is no UPDATE or DELETE; all transactions get COMMITted
so there's no ROLLBACK undoing the COPY and yet the bloat occurs.
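
(In case it helps with the diagnosis: my understanding is that the
pgstattuple contrib module in 10 gained a pgstathashindex() function, so
something like the below should show where the pages are going; the index
name is made up.)

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- overflow_pages growing much faster than bucket_pages would be
    -- the tell-tale sign of the bloat.
    SELECT bucket_pages, overflow_pages, bitmap_pages,
           unused_pages, live_items, dead_items, free_percent
      FROM pgstathashindex('idx_datums_key');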

AP


