Re: Speeding up parts of the planner using a binary search treestructure for nodes

Поиск

Список

Период

Сортировка

От	Ashutosh Bapat
Тема	Re: Speeding up parts of the planner using a binary search treestructure for nodes
Дата	29 мая 2020 г. 16:52:01
Msg-id	CAExHW5tgc_kVLCSTM=hnLQb5qj9SAnvCzivgg9BOty0mecrytQ@mail.gmail.com обсуждение исходный текст
Ответ на	Speeding up parts of the planner using a binary search tree structurefor nodes (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Speeding up parts of the planner using a binary search treestructure for nodes (David Rowley <dgrowleyml@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Fri, May 29, 2020 at 10:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Hi,
>
> In [1] I mentioned that I'd like to look into coding a data structure
> to allow Node types to be looked up more efficiently than what List
> allows.  There are many places in the planner where we must determine
> if a given Node is in some List of Nodes. This is an O(n) operation.
> We could reduce these to as low as O(log n) with a binary search tree.
>
> A few places in the code that does this which can become a problem
> when the lists being searched are large:
>
> get_eclass_for_sort_expr()

eclass members have relids as their key. Can we use hash table instead
like how we manage join relations? I know that there are other cases
where we search for subset of relids instead of exact match. So the
hash table has to account for that. If we could somehow create a hash
value out of an expression node (we almost hash anything so that
should be possible) we should be able to use hash table for searching
expression as well.

> tlist_member()

hash table by expression as key here as well?

>
> For the former, it's quite easy to see this come up in the profiles by
> querying a partitioned table with many partitions. e.g
>
> create table listp (a int) partition by list(a);
> select 'create table listp'||x::Text || ' partition of listp for
> values in('||x::text||');' from generate_Series(1,10000)x;
> \gexec
> create index on listp(a);
> explain select * from listp order by a;
>
> There are a few places that this new data structure could be used. My
> primary interest at the moment is EquivalenceClass.ec_members. There
> are perhaps other interesting places, such as planner targetlists, but
> obviously, any changes would need to be proved worthwhile before
> they're made.
>
> Per what I mentioned in [1], I'd like this structure to be a binary
> search tree, (either a red/black or AVL binary search tree compacted
> into an array.) So instead of having ->left ->right pointers, we have
> left and right offsets back into the array.  This allows very fast and
> cache-friendly unordered traversals of the tree, as it's an array
> whenever we want it to be and a tree when we need to perform a lookup
> of a specific value. Because I'm not using a true tree, i.e pointers
> to elements, then it does not appear like I can use rbtree.c
>
> The first step to make this work is to modify equalfuncs.c to add an
> equalCompare() function which will return an int to indicate the sort
> order of the node against another node. I've drafted a patch to
> implement this and attached it here. I've done this in 2 phases, 0001
> creates and implements datatype specific macros for the comparisons.
> Doing this allows us to optimise the bool/char field comparison macros
> in 0002 so we can do a simple subtransaction rather than 2
> comparisons. The 0002 patch modifies each comparison macro and changes
> all the equal functions to return int rather than bool.  The external
> comparison function is equalCompare(). equal() just calls
> equalCompare() and checks the value returned is 0.
>
> With both patches, there is an increase in size of about 17% for the
> object file for equalfuncs:
>
> equalfuncs.o
> Master: 56768 bytes
> Patched: 66912 bytes
>
> If I don't use the macros specifically optimised for bool and char,
> then the size is 68240 bytes. So I think doing 0001 is worth it.
>
> This does break the API for ExtensibleNodeMethods as it's no longer
> enough to just have nodeEqual(). In the 0002 patch I've changed this
> to nodeCompare(). Extension authors who are using this will need to
> alter their code to implement a proper comparison function.
>
> For now, I'm mostly just interested in getting feedback about making
> this change to equalfuncs.c.  Does anyone object to it?
>
> David
>
> [1] https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com



-- 
Best Wishes,
Ashutosh Bapat

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Ashutosh Bapat
Дата: 29 мая 2020 г., 16:37:03
Сообщение: Re: Make the qual cost on index Filter slightly higher than qual coston index Cond.

Следующее

От: "David G. Johnston"
Дата: 29 мая 2020 г., 16:56:36
Сообщение: Re: pg_dump fail to not dump public schema orders

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Speeding up parts of the planner using a binary search treestructure for nodes

Предыдущее

Следующее