Re: [HACKERS] Parallel Index Scans

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] Parallel Index Scans
Дата
Msg-id CAA4eK1LRx4sxaO1pXPgh=PMDoG0+zOHOdimdvTiWO4RcApZ8tA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Parallel Index Scans  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Parallel Index Scans  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Parallel Index Scans  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Re: [HACKERS] Parallel Index Scans  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Jan 13, 2017 at 11:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 13, 2017 at 9:28 AM, Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>> I didn't find any case of noticeable performance degradation,
>> so set status to "Ready for committer".
>
> The very first hunk of doc changes looks like it makes the whitespace
> totally wrong - surely it can't be right to have 0-space indents in C
> code.
>

Fixed.

> +    The <literal>index_size</> parameter indicate the size of generic parallel
>
> indicate -> indicates
> size of generic -> size of the generic
>

Fixed.

> +   index-type-specific parallel information which will be stored immediatedly
>
> Typo.
>

Fixed.

> +   Initialize the parallel index scan state.  It will be used to initialize
> +   index-type-specific parallel information which will be stored immediatedly
> +   after generic parallel information required for parallel index scans.  The
> +   required state information will be set in <literal>target</>.
> +  </para>
> +
> +   <para>
> +     The <function>aminitparallelscan</> and
> <function>amestimateparallelscan</>
> +     functions need only be provided if the access method supports
> <quote>parallel</>
> +     index scans.  If it doesn't, the <structfield>aminitparallelscan</> and
> +     <structfield>amestimateparallelscan</> fields in its
> <structname>IndexAmRoutine</>
> +     struct must be set to NULL.
> +   </para>
>
> Inconsistent indentation.

Fixed.

>  <quote> seems like a strange choice of tag.
>

I have seen that <quote> is used in indexam.sgml at multiple places to
refer to "bitmap" and "plain" index scans.  So thought of using same
for "parallel" index scans.

> +    /* amestimateparallelscan is optional; assume no-op if not
> provided by AM */
>
> The fact that amestimateparallelscan is optional even when parallel
> index scans are supported is undocumented.
>

Okay, I have added that information in docs.

>  Similarly for the other
> functions, which also seem to be optional but not documented as such.
> The code and documentation need to match.
>

All the functions introduced by this patch are documented in
indexam.sgml as optional.  Not sure, which other place you are
expecting an update.

> +    void       *amtarget = (char *) ((void *) target) + offset;
>
> Passing an unaligned pointer to the AM sounds like a recipe for
> crashes on obscure platforms that can't tolerate alignment violations,
> and possibly bad performance on others.  I'd arrange MAXALIGN the size
> of the generic structure in index_parallelscan_estimate and
> index_parallelscan_initialize.

Right, changed as per suggestion.

>  Also, why pass the size of the generic
> structure to the AM-specific estimate routine, anyway?  It can't
> legally return a smaller value, and we can add_size() just as well as
> the AM-specific code.  Wouldn't it make more sense for the AM-specific
> code to return the amount of space that is needed for AM-specific
> stuff, and let the generic code deal with the generic stuff?
>

makes sense, so changed accordingly.

> + *    status - True indicates that the block number returned is valid and scan
> + *             is continued or block number is invalid and scan has just begun
> + *             or block number is P_NONE and scan is finished.  False indicates
> + *             that we have reached the end of scan for current
> scankeys and for
> + *             that we return block number as P_NONE.
>
> It is hard to parse a sentence with that many "and" and "or" clauses,
> especially since English, unlike C, does not have strict operator
> precedence rules. Perhaps you could reword to make it more clear.
>

Okay, I have changed the comment.

> Also, does that survive pgindent with that indentation?
>

Yes.

> +    BTParallelScanDesc btscan = (BTParallelScanDesc) OffsetToPointer(
> +                                                (void *) scan->parallel_scan,
> +                                             scan->parallel_scan->ps_offset);
>
> You could avoid these uncomfortable line breaks by declaring the
> variable on one line and the initializing it on a separate line.
>

Okay, changed.

> +        SpinLockAcquire(&btscan->ps_mutex);
> +        pageStatus = btscan->ps_pageStatus;
> +        if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
> +            *status = false;
> +        else if (pageStatus == BTPARALLEL_DONE)
> +            *status = false;
> +        else if (pageStatus != BTPARALLEL_ADVANCING)
> +        {
> +            btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
> +            nextPage = btscan->ps_nextPage;
> +            exit_loop = true;
> +        }
> +        SpinLockRelease(&btscan->ps_mutex);
>
> IMHO, this needs comments.
>

Sure, added a comment.

> + * It updates the value of next page that allows parallel scan to move forward
> + * or backward depending on scan direction.  It wakes up one of the sleeping
> + * workers.
>
> This construction is commonly used in India but sounds awkward to
> other English-speakers, or at least to me.  You can either drop the
> word "it" and just start with the verb "Updates the value of ..." or
> you can replace the first instance of "It" with "This function".
> Although actually, I think this whole comment needs rewriting.  Maybe
> something like "Save information about scan position and wake up next
> worker to continue scan."
>

Changed as per suggestion.

> + * This must be called when there are no pages left to scan. Notify end of
> + * parallel scan to all the other workers associated with this scan.
>
> Suggest: When there are no pages left to scan, this function should be
> called to notify other workers.  Otherwise, they might wait forever
> for the scan to advance to the next page.
>
> +        if (status == false)
>
> if (!status) is usually preferred for bools.  (Multiple instances.)
>
> +#define BTPARALLEL_NOT_INITIALIZED 0x01
> +#define BTPARALLEL_ADVANCING 0x02
> +#define BTPARALLEL_DONE 0x03
> +#define BTPARALLEL_IDLE 0x04
>
> Let's change this to an enum.  We can keep the names of the members
> as-is, just use typedef enum { ... } instead of #defines.
>

Changed as per suggestion.

> +#define OffsetToPointer(base, offset)\
> +((void *)((char *)base + offset))
>
> Blech.  Aside from the bad formatting, this is an awfully generic
> thing to stick into relscan.h.

Agreed and moved to c.h where some similar defines are present.

>  I'm not sure we should have it at all,
> but certainly not in this file.
>

Yeah, but I think there is no harm in keeping it and maybe start using
in code at other places as well.

> +/*
> + * BTParallelScanDescData contains btree specific shared information required
> + * for parallel scan.
> + */
> +typedef struct BTParallelScanDescData
> +{
> +    BlockNumber ps_nextPage;    /* latest or next page to be scanned */
> +    uint8        ps_pageStatus;    /* indicates whether next page is available
> +                                 * for scan. see nbtree.h for possible states
> +                                 * of parallel scan. */
> +    int            ps_arrayKeyCount;        /* count indicating number of array
> +                                         * scan keys processed by parallel
> +                                         * scan */
> +    slock_t        ps_mutex;        /* protects above variables */
> +    ConditionVariable cv;        /* used to synchronize parallel scan */
> +}    BTParallelScanDescData;
>
> Why are the states declared a separate header file from the variable
> that uses them?   Let's put them all in the same place.
>

Agreed and changed accordingly.

> Why do all of these fields except for the last one have a ps_ prefix,
> but the last one doesn't?
>

No specific reason, so Changed as per suggestion.

> I assume "ps" stands for "parallel scan" but maybe "btps" would be
> better since this is btree-specific.
>

Changed as per suggestion.

> ps_nextPage sometimes contains something other than the next page, so
> maybe we should choose a different name, like ps_currentPage or
> ps_scanPage.
>

Changed as per suggestion.


I have also rebased the optimizer/executor support patch
(parallel_index_opt_exec_support_v4.patch) and added a test case in
it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Tuple sort is broken. It crashes on simple test.
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: [HACKERS] Support for pg_receivexlog --format=plain|tar