Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)
Дата
Msg-id dce5799d-3a15-8ede-0b43-12838d61fc85@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)  (Ranier Vilela <ranier.vf@gmail.com>)
Список pgsql-hackers

On 12/29/23 14:53, Ranier Vilela wrote:
> 
> 
> Em sex., 29 de dez. de 2023 às 10:33, Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> escreveu:
> 
> 
> 
>     On 12/29/23 12:53, Ranier Vilela wrote:
>     > Em qui., 28 de dez. de 2023 às 22:16, Tomas Vondra
>     > <tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>
>     <mailto:tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>>>
>     > escreveu:
>     >
>     >
>     >
>     >     On 12/27/23 12:37, Ranier Vilela wrote:
>     >     > Em ter., 26 de dez. de 2023 às 19:07, Tomas Vondra
>     >     > <tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>
>     >     <mailto:tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>>
>     >     <mailto:tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>
>     >     <mailto:tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>>>>
>     >     > escreveu:
>     >     >
>     >     >
>     >     >
>     >     >     On 12/26/23 19:10, Ranier Vilela wrote:
>     >     >     > Hi,
>     >     >     >
>     >     >     > The commit b437571
>     >     >     <http://b437571714707bc6466abde1a0af5e69aaade09c
>     <http://b437571714707bc6466abde1a0af5e69aaade09c>
>     >     <http://b437571714707bc6466abde1a0af5e69aaade09c
>     <http://b437571714707bc6466abde1a0af5e69aaade09c>>
>     >     >     <http://b437571714707bc6466abde1a0af5e69aaade09c
>     <http://b437571714707bc6466abde1a0af5e69aaade09c>
>     >     <http://b437571714707bc6466abde1a0af5e69aaade09c
>     <http://b437571714707bc6466abde1a0af5e69aaade09c>>>> I
>     >     >     > think has an oversight.
>     >     >     > When allocate memory and initialize private spool in
>     function:
>     >     >     > _brin_leader_participate_as_worker
>     >     >     >
>     >     >     > The behavior is the bs_spool (heap and index fields)
>     >     >     > are left empty.
>     >     >     >
>     >     >     > The code affected is:
>     >     >     >   buildstate->bs_spool = (BrinSpool *)
>     >     palloc0(sizeof(BrinSpool));
>     >     >     > - buildstate->bs_spool->heap = buildstate->bs_spool->heap;
>     >     >     > - buildstate->bs_spool->index =
>     buildstate->bs_spool->index;
>     >     >     > + buildstate->bs_spool->heap = heap;
>     >     >     > + buildstate->bs_spool->index = index;
>     >     >     >
>     >     >     > Is the fix correct?
>     >     >     >
>     >     >
>     >     >     Thanks for noticing this.
>     >     >
>     >     > You're welcome.
>     >     >  
>     >     >
>     >     >     Yes, I believe this is a bug - the assignments
>     >     >     are certainly wrong, it leaves the fields set to NULL.
>     >     >
>     >     >     I wonder how come this didn't fail during testing.
>     Surely, if
>     >     the leader
>     >     >     participates as a worker, the tuplesort_begin_index_brin
>     shall
>     >     be called
>     >     >     with heap/index being NULL, leading to some failure
>     during the
>     >     sort. But
>     >     >     maybe this means we don't actually need the heap/index
>     fields,
>     >     it's just
>     >     >     a copy of TuplesortIndexArg, but BRIN does not need that
>     >     because we sort
>     >     >     the tuples by blkno, and we don't need the descriptors
>     for that.
>     >     >
>     >     > Unfortunately I can't test on Windows, since I can't build with
>     >     meson on
>     >     > Windows.
>     >     >
>     >     >
>     >     >     In any case, the _brin_parallel_scan_and_build does not
>     >     actually need
>     >     >     the separate heap/index arguments, those are already in
>     the spool.
>     >     >
>     >     > Yeah, for sure.
>     >     >
>     >     >
>     >     >     I'll try to figure out if we want to simplify the
>     tuplesort or
>     >     remove
>     >     >     the arguments from _brin_parallel_scan_and_build.
>     >     >
>     >
>     >     Here is a patch simplifying the BRIN parallel create code a
>     little bit.
>     >     As I suspected, we don't need the heap/index in the spool at
>     all, and we
>     >     don't need to pass it to tuplesort_begin_index_brin either -
>     we only
>     >     need blkno, and we have that in the datum1 field. This also
>     means we
>     >     don't need TuplesortIndexBrinArg.
>     >
>     > With Windows 10, msvc 2022, compile end pass ninja test.
>     >
>     > But, if you allow me, I would like to try another approach to
>     > simplification.
>     > Instead of increasing the arguments in the call, wouldn't it be better
>     > to decrease them 
>     > and this way all arguments will be passed in the registers instead
>     of on
>     > a stack?
>     >
> 
>     If this was beneficial, we'd be passing everything through structs and
>     not as explicit arguments. But we don't. If you're arguing it's
>     beneficial in this case, it'd be good to see it demonstrated.
> 
> Please see the https://www.agner.org/optimize/optimizing_cpp.pdf
> <https://www.agner.org/optimize/optimizing_cpp.pdf>
> Excerpt:
> "Use 64-bit mode
> Parameter transfer is more efficient in 64-bit mode than in 32-bit mode,
> and more efficient in 64-bit Linux than in 64-bit Windows. In 64-bit
> Linux, the first six integer parameters and the first eight floating
> point parameters are transferred in registers, totaling up to fourteen
> register parameters. In 64-bit Windows, the first four parameters are
> transferred in registers, regardless of whether they are integers or
> floating point numbers."
> 
> With function:
> _brin_parallel_scan_and_build(buildstate, buildstate->bs_spool, 
> brinshared, sharedsort,  heapRel, indexRel, sortmem, false);
> We have:
> Linux -> six first parameters in registers and two parameters in stack
> Windows -> four parameters in registers and four parameters in stack
> 

I suggested you demonstrate this actually makes a difference in
practice. Quoting a document is not that.

Also, that document is about C++, and while C and C++ are very close, I
wouldn't be surprised if there were differences. Furthermore, that
section talks about integer/floating point arguments, while we're
dealing with pointers, and it's not clear if that changes something (the
document has a separate section about pointers/references, which
suggests pointers and integers are not 100% the same thing).

And finally, I haven't tried disassembling the code, but I'd be quite
surprised if these things were not heavily dependent on the compiler
and/or optimization level.

> 
>     > bs_spool may well contain this data and will probably be useful in the
>     > future.
>     >
>     > I made a v1 version, based on your patch, for your consideration.
>     >
> 
>     I did actually consider doing it this way yesterday, but I don't like
>     this approach. I don't believe it's faster (and even if it was, the
>     difference is going to be negligible), and parameters hidden in some
>     struct increase the cognitive load. I like explicit arguments.
> 
> Personally I prefer data in structs, of course,
> always thinking about size and alignment, to optimize loading.
> 

As I said, I think this is quite irrelevant because we'll call the
function maybe 10-times during the whole index build. With millions of
other function calls.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ranier Vilela
Дата:
Сообщение: Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)
Следующее
От: Zhang Mingli
Дата:
Сообщение: Remove useless GROUP BY columns considering unique index