Re: [HACKERS] [PATCH] Incremental sort

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: [HACKERS] [PATCH] Incremental sort
Дата
Msg-id CAPpHfdte2tr+_W9TmRZ5rrfqhOCteYrBkgKV07QCbNoycV=U5w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] [PATCH] Incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] [PATCH] Incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hi!

On Tue, Apr 3, 2018 at 2:10 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I think solving this may be fairly straight-forward. Essentially, until
now we only had one way to do the sort, so it was OK to make the sort
implicit by checking if the path is sorted

    if (input not sorted)
    {
        ... add a Sort node ...
    }

But now we have multiple possible ways to do the sort, with different
startup/total costs. So the places that create the sorts need to
actually generate the Sort paths for each sort alternative, and store
the information in the Sort node (instead of relying on pathkeys).

Ultimately, this should simplify the createplan.c places making all the
make_sort calls unnecessary (i.e. the input should be already sorted
when needed). Otherwise it'd mean the decision needs to be done locally,
but I don't think that should be needed.

But it's surely a fairly invasive change to the patch ...

Right, there are situation when incremental sort has lower startup cost,
but higher total cost.  In order to find lower cost, we ideally should generate
paths for both full sort and incremental sort.  However, that would increase
number of total pathes, and could slowdown planning time.  Another issue
that we don't always generate pathes for sort.  And yes, it would be rather
invasive.  So, that doesn't look feasible to get into 11.

Intead, I decided to cut usage of incremental sort.  Now, incremental sort
is generated only in create_sort_path().  Cheaper path selected between
incremental sort and full sort with taking limit_tuples into account.
That limits usage of incremental sort, however risk of regression by this
patch is also minimal.  In fact, incremental sort will be used only when
sort is explicitly specified and simultaneously LIMIT is specified or
dataset to be sorted is large and incremental sort saves disk IO.

Attached patch also incorporates following commits made by Alexander Kuzmenkov:
* Rename fields of IncrementalSortState to snake_case for the sake of consistency.
* Rename group test function to isCurrentGroup.
* Comments from Tomas Vondra about nodeIncrementalSort.c
* Add a test for incremental sort.
* Add a separate function to calculate costs of incremental sort.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Следующее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] path toward faster partition pruning