Re: Parallel CREATE INDEX for GIN indexes

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: Parallel CREATE INDEX for GIN indexes
Дата	28 мая 12:29:48
Msg-id	87jzjes2ia.fsf@163.com обсуждение исходный текст
Ответ на	Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: Parallel CREATE INDEX for GIN indexes
Список	pgsql-hackers

Дерево обсуждения

Hi Tomas,

I have completed my first round of review, generally it looks good to
me, more testing need to be done in the next days. Here are some tiny
comments from my side, just FYI. 

1. Comments about GinBuildState.bs_leader looks not good for me. 
   
    /*
     * bs_leader is only present when a parallel index build is performed, and
     * only in the leader process. (Actually, only the leader process has a
     * GinBuildState.)
     */
    GinLeader  *bs_leader;

In the worker function _gin_parallel_build_main:
initGinState(&buildstate.ginstate, indexRel); is called, and the
following members in workers at least: buildstate.funcCtx,
buildstate.accum and so on.  So is the comment "only the leader process
has a GinBuildState" correct?

2. progress argument is not used?
_gin_parallel_scan_and_build(GinBuildState *state,
                             GinShared *ginshared, Sharedsort *sharedsort,
                             Relation heap, Relation index,
                             int sortmem, bool progress)


3. In function tuplesort_begin_index_gin, comments about nKeys takes me
some time to think about why 1 is correct(rather than
IndexRelationGetNumberOfKeyAttributes) and what does the "only the index
key" means. 

base->nKeys = 1;            /* Only the index key */

finally I think it is because gin index stores each attribute value into
an individual index entry for a multi-column index, so each index entry
has only 1 key.  So we can comment it as the following? 

"Gin Index stores the value of each attribute into different index entry
for mutli-column index, so each index entry has only 1 key all the
time." This probably makes it easier to understand.


4. GinBuffer: The comment "Similar purpose to BuildAccumulator, but much
simpler." makes me think why do we need a simpler but
similar structure, After some thoughts, they are similar at accumulating
TIDs only. GinBuffer is designed for "same key value" (hence
GinBufferCanAddKey). so IMO, the first comment is good enough and the 2 
comments introduce confuses for green hand and is potential to remove
it. 

/*
 * State used to combine accumulate TIDs from multiple GinTuples for the same
 * key value.
 *
 * XXX Similar purpose to BuildAccumulator, but much simpler.
 */
typedef struct GinBuffer


5. GinBuffer: ginMergeItemPointers always allocate new memory for the
new items and hence we have to pfree old memory each time. However it is
not necessary in some places, for example the new items can be appended
to Buffer->items (and this should be a common case). So could we
pre-allocate some spaces for items and reduce the number of pfree/palloc
and save some TID items copy in the desired case?

6. GinTuple.ItemPointerData first;        /* first TID in the array */

is ItemPointerData.ip_blkid good enough for its purpose?  If so, we can
save the memory for OffsetNumber for each GinTuple.

Item 5) and 6) needs some coding and testing. If it is OK to do, I'd
like to take it as an exercise in this area. (also including the item
1~4.)


-- 
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Nisha Moond
Дата: 28 мая, 12:17:16
Сообщение: Re: Conflict Detection and Resolution

Следующее

От: Andy Fan
Дата: 28 мая, 12:47:17
Сообщение: Re: why memoize is not used for correlated subquery

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel CREATE INDEX for GIN indexes

Предыдущее

Следующее