Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От Zhihong Yu
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id CALNJ-vRbftTop7SMjjWc1OJMoz1UwOSc10wskhCN3W4Y49bJFQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
> +   /* Make a guess at a good size when we're not given a valid size. */
> +   if (size == 0)
> +       size = 1024;
>
> Should the default size be logged ?

> I'm not too sure if I know what you mean here. Should it be a power of
> 2? It is.  Or do you mean I shouldn't use a magic number?

Using 1024 seems to be fine. I meant logging the default value of 1024 so that user / dev can have better idea the actual value used (without looking at the code).

>> Or do you think 98% is not a good number?

Since you have played with Result Cache, I would trust your judgment in choosing the percentage.
It is fine not to expose this constant until the need arises.

Cheers

On Sun, Dec 6, 2020 at 5:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, 5 Dec 2020 at 16:51, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> There are two blocks with almost identical code (second occurrence in cache_store_tuple):
>
> +   if (rcstate->mem_used > rcstate->mem_upperlimit)
> +   {
> It would be nice if the code can be extracted to a method and shared.

It's true, they're *almost* identical.  I quite like the fact that one
of the cases can have an unlikely() macro in there. It's pretty
unlikely that we'd go into cache overflow just when adding a new cache
entry. work_mem would likely have to be set to a couple of dozen bytes
for that to happen. 64k is the lowest it can be set.  However, I
didn't really check to see if having that unlikely() macro increases
performance.  I've changed things locally here to add a new function
named cache_check_mem(). I'll keep that for now, but I'm not sure if
there's enough code there to warrant a function. The majority of the
additional lines are from the comment being duplicated.

>                     node->rc_status = RC_END_OF_SCAN;
>                     return NULL;
>                 }
>                 else
>
> There are several places where the else keyword for else block can be omitted because the if block ends with return.
> This would allow the code in else block to move leftward (for easier reading).

OK, I've removed the "else" in places where it can be removed.

>        if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
>
> I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compiler the warning would be treated as error) ?
> Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep the current meaning (find both hash fn's).

It's fine not to use output parameters.  It's not the only one in the
code base ignoring one from that very function. See
execTuplesHashPrepare().

>     rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
>
> Maybe (in subsequent patch) GUC variable can be introduced for tuning the constant 0.98.

I don't think exposing such knobs for users to adjust is a good idea.
Can you think of a case where you'd want to change it? Or do you think
98% is not a good number?

>
> For +paraminfo_get_equal_hashops :
>
> +       else
> +           Assert(false);

I'm keen to leave it like it is.  I don't see any need to bloat the
compiled code with an elog(ERROR).

There's a comment in RelOptInfo.lateral_vars that mentions:

/* LATERAL Vars and PHVs referenced by rel */

So, if anyone, in the future, wants to add some other node type to
that list then they'll have a bit more work to do. Plus, I'm only
doing the same as what's done in create_lateral_join_info().

I'll run the updated patch which includes the cache_check_mem()
function for a bit and post an updated patch to the main thread a bit
later.

Thanks for having a look at this patch.

David

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Proposed patch for key managment