Re: Support loser tree for k-way merge
| От | Sami Imseih |
|---|---|
| Тема | Re: Support loser tree for k-way merge |
| Дата | |
| Msg-id | CAA5RZ0u=WDa-v+8Y_yE4WeRS8VtmcmQGuWVjZFCqZcXx-hEP3w@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Support loser tree for k-way merge ("cca5507" <cca5507@qq.com>) |
| Список | pgsql-hackers |
Hi,
Thanks for raising this.
> Now we use 'heap' during the k-way merge, it's O(n log k). The 'loser tree' is also O(n log k), but
> it's usually has fewer comparisons than the 'heap'. When the tuple comparator is complex, the
> 'loser tree' can significantly speed up the k-way merge.
I was playing around with this and I do also see the performance
benefits that you
mention with loser tree.
```
DROP TABLE IF EXISTS t;
CREATE TABLE t (
col1 INT,
col2 INT
);
INSERT INTO t (col1, col2)
SELECT
(i % 10000000),
(i % 100)
FROM generate_series(1, 10000000) AS s(i);
```
Using something like the above, testing an "external merge" on the
high cardinality
column there is a bit of an improvement with loser tree, but there is
a bit of a loss on
low-cardinality.
In general though, I see a bit of an issue with allowing a GUC to
change strategies. What happens if there are multiple "external merge"
nodes and they can each benefit from different strategies? Maybe that's
not an issue since all the other enable_ GUCs have that problem, but
it is something to consider.
Can we drive the decision for what to do based on optimizer
stats, i.e. n_distinct and row counts? Not sure what the calculation would
be specifically, but something else to consider.
We can still provide the GUC to override the optimizer decisions,
but at least the optimizer, given up-to-date stats, may get it right most
of the time.
--
Sami Imseih
Amazon Web Services (AWS)
В списке pgsql-hackers по дате отправления: