Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Поиск
Список
Период
Сортировка
От	Justin Pryzby
Тема	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Дата	9 мая 2020 г. 20:18:36
Msg-id	20200509201836.GC18456@telsasoft.com обсуждение исходный текст
Ответ на	Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>)
Ответы	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Список	pgsql-hackers
Дерево обсуждения
Checking if it's possible to address this Opened Item before 13b1.

https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
consistency of explain output: two spaces, equals vs colons, semicolons (incremental sort) 

On Sun, Apr 19, 2020 at 09:46:55AM -0400, James Coleman wrote:
> On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > > And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
> > > ...
> > > > I read through that subthread, and the ending seemed to be Peter
> > > > wanting things to be unified. Was there a conclusion beyond that?
> > >
> > > This discussion is ongoing.  I think let's wait until that's settled before
> > > addressing this more complex and even newer case.  We can add "explain, two
> > > spaces and equals vs colon" to the "Open items" list if need be - I hope the
> > > discussion will not delay the release.
> >
> > The change proposed on the WAL thread is minimal, and makes new explain(WAL)
> > output consistent with the that of explain(BUFFERS).
> >
> > That uses a different format from "Sort", which is what incremental sort should
> > follow.  (Hashjoin also uses the Sort's format of two-spaces and colons rather
> > than equals).
> 
> I think it's not great that buffers/sort are different, but I agree
> that we should follow sort.
> 
> > So the attached 0001 makes explain output for incremental sort more consistent
> > with sort:
> >
> >      - Two spaces;
> >      - colons rather than equals;
> >      - Don't use semicolon, which isn't in use anywhere else;
> >
> > I tested with this:
> > template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT a-(a%100), a%1000 FROM
generate_series(1,99999)a;CREATE INDEX ON t(i); VACUUM VERBOSE ANALYZE t;
 
> > template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
> > ...
> >    Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB  Pre-sorted Groups: 1000
Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
 
> >
> > On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > > Should "Pre-sorted Groups:" be on a separate line ?
> > > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Pre-sorted Groups: 1 Sort Method:
quicksortMemory: avg=30kB peak=30kB
 
> > > >
> > > > I'd originally had that, but Tomas wanted it to be more compact. It's
> > > > easy to adjust though if the consensus changes on that.
> > >
> > > I'm OK with changing the format if there's a consensus. The current
> > > format seemed better to me, but I'm not particularly attached to it.
> >
> > I still think Pre-sorted groups should be on a separate line, as in 0002.
> > In addition to looking better (to me), and being easier to read, another reason
> > is that there are essentially key=>values here, but the keys are repeated (Sort
> > Method, etc).
> 
> I collapsed this into 0001 because I think that if we're going to do
> away with the key=value style then we effectively to have to do this
> to avoid the repeated values being confusing (with key=value it kinda
> worked, because that made it seem like the avg/peak were clearly a
> subset of the Sort Groups info).
> 
> I also cleaned up the newline patch a bit in the process (we already
> have a way to indent with a flag so don't need to do it directly).
> 
> > I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that here.
> 
> I went ahead and did that too; we already use "Full-sort", so the
> proposed change makes both parallel.
> 
> > Michael already patched most of the comment typos, the remainder I'm including
> > here as a "nearby patch"..
> 
> Modified slightly.
> 
> James

> From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Mon, 6 Apr 2020 17:37:31 -0500
> Subject: [PATCH v2 2/2] comment typos: Incremental Sort
> 
> commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> 
> Previously reported here:
> https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com
> ---
>  src/backend/commands/explain.c             |  4 ++--
>  src/backend/executor/nodeIncrementalSort.c | 10 ++++++----
>  src/backend/utils/sort/tuplesort.c         |  8 ++++----
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 5f91c569a0..86c10458f4 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2865,7 +2865,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>  }
>  
>  /*
> - * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
> + * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node
>   */
>  static void
>  show_incremental_sort_info(IncrementalSortState *incrsortstate,
> @@ -2913,7 +2913,7 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
>              &incrsortstate->shared_info->sinfo[n];
>  
>              /*
> -             * If a worker hasn't process any sort groups at all, then exclude
> +             * If a worker hasn't processed any sort groups at all, then exclude
>               * it from output since it either didn't launch or didn't
>               * contribute anything meaningful.
>               */
> diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
> index 39ba11cdf7..05c60ec3e0 100644
> --- a/src/backend/executor/nodeIncrementalSort.c
> +++ b/src/backend/executor/nodeIncrementalSort.c
> @@ -987,8 +987,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
>  
>      /*
>       * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> -     * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> -     * batches in the current sort state.
> +     * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because the current sort state
> +     * contains only one sort batch rather than the full result set.
>       */
>      Assert((eflags & (EXEC_FLAG_BACKWARD |
>                        EXEC_FLAG_MARK)) == 0);
> @@ -1153,8 +1153,10 @@ ExecReScanIncrementalSort(IncrementalSortState *node)
>      /*
>       * If we've set up either of the sort states yet, we need to reset them.
>       * We could end them and null out the pointers, but there's no reason to
> -     * repay the setup cost, and because guard setting up pivot comparator
> -     * state similarly, doing so might actually cause a leak.
> +     * repay the setup cost, and because ExecIncrementalSort guards
> +     * presorted column functions by checking to see if the full sort state
> +     * has been initialized yet, setting the sort states to null here might
> +     * actually cause a leak.
>       */
>      if (node->fullsort_state != NULL)
>      {
> diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
> index de38c6c7e0..d59e3d5a8d 100644
> --- a/src/backend/utils/sort/tuplesort.c
> +++ b/src/backend/utils/sort/tuplesort.c
> @@ -1428,11 +1428,11 @@ tuplesort_updatemax(Tuplesortstate *state)
>      }
>  
>      /*
> -     * Sort evicts data to the disk when it didn't manage to fit those data to
> -     * the main memory.  This is why we assume space used on the disk to be
> +     * Sort evicts data to the disk when it wasn't able to fit that data into
> +     * main memory.  This is why we assume space used on the disk to be
>       * more important for tracking resource usage than space used in memory.
> -     * Note that amount of space occupied by some tuple set on the disk might
> -     * be less than amount of space occupied by the same tuple set in the
> +     * Note that the amount of space occupied by some tupleset on the disk might
> +     * be less than amount of space occupied by the same tupleset in
>       * memory due to more compact representation.
>       */
>      if ((isSpaceDisk && !state->isMaxSpaceDisk) ||
> -- 
> 2.17.1
> 

> From 8e80be2b345c3940f76ffbd5e3c201a7ae855784 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Wed, 15 Apr 2020 08:45:21 -0500
> Subject: [PATCH v2 1/2] Fix explain output for incr sort:
> 
>  - Two spaces
>  - colons rather than equals
>  - Don't use semicolon
>  - Put Pre-sorted groups on a separate line
>  - Rename Presorted to Pre-sorted (to match Full-sort)
> ---
>  src/backend/commands/explain.c                | 22 ++++++++-----------
>  .../regress/expected/incremental_sort.out     | 21 +++++++++---------
>  src/test/regress/sql/incremental_sort.sql     |  4 ++--
>  3 files changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 7ae6131676..5f91c569a0 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2778,7 +2778,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>      {
>          if (indent)
>              appendStringInfoSpaces(es->str, es->indent * 2);
> -        appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel,
> +        appendStringInfo(es->str, "%s Groups: " INT64_FORMAT "  Sort Method", groupLabel,
>                           groupInfo->groupCount);
>          /* plural/singular based on methodNames size */
>          if (list_length(methodNames) > 1)
> @@ -2798,24 +2798,20 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>              const char *spaceTypeName;
>  
>              spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
> -            appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
> +            appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
>                               spaceTypeName, avgSpace,
> -                             groupInfo->maxMemorySpaceUsed);
> +                             spaceTypeName, groupInfo->maxMemorySpaceUsed);
>          }
>  
>          if (groupInfo->maxDiskSpaceUsed > 0)
>          {
>              long        avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
> -
>              const char *spaceTypeName;
>  
>              spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
> -            /* Add a semicolon separator only if memory stats were printed. */
> -            if (groupInfo->maxMemorySpaceUsed > 0)
> -                appendStringInfo(es->str, ";");
> -            appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
> +            appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
>                               spaceTypeName, avgSpace,
> -                             groupInfo->maxDiskSpaceUsed);
> +                             spaceTypeName, groupInfo->maxDiskSpaceUsed);
>          }
>      }
>      else
> @@ -2899,8 +2895,8 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
>          if (prefixsortGroupInfo->groupCount > 0)
>          {
>              if (es->format == EXPLAIN_FORMAT_TEXT)
> -                appendStringInfo(es->str, " ");
> -            show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
> +                appendStringInfo(es->str, "\n");
> +            show_incremental_sort_group_info(prefixsortGroupInfo, "Pre-sorted", true, es);
>          }
>          if (es->format == EXPLAIN_FORMAT_TEXT)
>              appendStringInfo(es->str, "\n");
> @@ -2943,8 +2939,8 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
>              if (prefixsortGroupInfo->groupCount > 0)
>              {
>                  if (es->format == EXPLAIN_FORMAT_TEXT)
> -                    appendStringInfo(es->str, " ");
> -                show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
> +                    appendStringInfo(es->str, "\n");
> +                show_incremental_sort_group_info(prefixsortGroupInfo, "Pre-sorted", true, es);
>              }
>              if (es->format == EXPLAIN_FORMAT_TEXT)
>                  appendStringInfo(es->str, "\n");
> diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
> index 238d89a206..2b40a26d82 100644
> --- a/src/test/regress/expected/incremental_sort.out
> +++ b/src/test/regress/expected/incremental_sort.out
> @@ -106,7 +106,7 @@ declare
>    space_key text;
>  begin
>    for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
> -    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
> +    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
>        for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
>          node := jsonb_set(node, array[group_key, space_key, 'Average Sort Space Used'], '"NN"', false);
>          node := jsonb_set(node, array[group_key, space_key, 'Peak Sort Space Used'], '"NN"', false);
> @@ -128,7 +128,7 @@ declare
>    space_key text;
>  begin
>    for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
> -    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
> +    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
>        group_stats := node->group_key;
>        for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
>          if (group_stats->space_key->'Peak Sort Space Used')::bigint < (group_stats->space_key->'Peak Sort Space
Used')::bigintthen
 
> @@ -533,13 +533,13 @@ select * from (select * from t order by a) s order by a, b limit 55;
>  
>  -- Test EXPLAIN ANALYZE with only a fullsort group.
>  select explain_analyze_without_memory('select * from (select * from t order by a) s order by a, b limit 55');
> -                                 explain_analyze_without_memory                                 
> -------------------------------------------------------------------------------------------------
> +                                        explain_analyze_without_memory                                         
> +---------------------------------------------------------------------------------------------------------------
>   Limit (actual rows=55 loops=1)
>     ->  Incremental Sort (actual rows=55 loops=1)
>           Sort Key: t.a, t.b
>           Presorted Key: t.a
> -         Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: avg=NNkB peak=NNkB
> +         Full-sort Groups: 2  Sort Methods: top-N heapsort, quicksort  Average Memory: NNkB  Peak Memory: NNkB
>           ->  Sort (actual rows=101 loops=1)
>                 Sort Key: t.a
>                 Sort Method: quicksort  Memory: NNkB
> @@ -708,18 +708,19 @@ select * from t left join (select * from (select * from t order by a) v order by
>  rollback;
>  -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
>  select explain_analyze_without_memory('select * from (select * from t order by a) s order by a, b limit 70');
> -                                                                    explain_analyze_without_memory
                                                
 
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> +                                         explain_analyze_without_memory                                         
> +----------------------------------------------------------------------------------------------------------------
>   Limit (actual rows=70 loops=1)
>     ->  Incremental Sort (actual rows=70 loops=1)
>           Sort Key: t.a, t.b
>           Presorted Key: t.a
> -         Full-sort Groups: 1 Sort Method: quicksort Memory: avg=NNkB peak=NNkB Presorted Groups: 5 Sort Methods:
top-Nheapsort, quicksort Memory: avg=NNkB peak=NNkB
 
> +         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: NNkB  Peak Memory: NNkB
> +         Pre-sorted Groups: 5  Sort Methods: top-N heapsort, quicksort  Average Memory: NNkB  Peak Memory: NNkB
>           ->  Sort (actual rows=1000 loops=1)
>                 Sort Key: t.a
>                 Sort Method: quicksort  Memory: NNkB
>                 ->  Seq Scan on t (actual rows=1000 loops=1)
> -(9 rows)
> +(10 rows)
>  
>  select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * from (select * from t order by a) s
orderby a, b limit 70'));
 
>                    jsonb_pretty                   
> @@ -747,7 +748,7 @@ select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * from
>                   "Average Sort Space Used": "NN"+
>               }                                  +
>           },                                     +
> -         "Presorted Groups": {                  +
> +         "Pre-sorted Groups": {                 +
>               "Group Count": 5,                  +
>               "Sort Methods Used": [             +
>                   "top-N heapsort",              +
> diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
> index 2241cc9c02..6f70b36a81 100644
> --- a/src/test/regress/sql/incremental_sort.sql
> +++ b/src/test/regress/sql/incremental_sort.sql
> @@ -82,7 +82,7 @@ declare
>    space_key text;
>  begin
>    for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
> -    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
> +    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
>        for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
>          node := jsonb_set(node, array[group_key, space_key, 'Average Sort Space Used'], '"NN"', false);
>          node := jsonb_set(node, array[group_key, space_key, 'Peak Sort Space Used'], '"NN"', false);
> @@ -105,7 +105,7 @@ declare
>    space_key text;
>  begin
>    for node in select * from jsonb_array_elements(explain_analyze_inc_sort_nodes(query)) t loop
> -    for group_key in select unnest(array['Full-sort Groups', 'Presorted Groups']::text[]) t loop
> +    for group_key in select unnest(array['Full-sort Groups', 'Pre-sorted Groups']::text[]) t loop
>        group_stats := node->group_key;
>        for space_key in select unnest(array['Sort Space Memory', 'Sort Space Disk']::text[]) t loop
>          if (group_stats->space_key->'Peak Sort Space Used')::bigint < (group_stats->space_key->'Peak Sort Space
Used')::bigintthen
 
> -- 
> 2.17.1
> 


-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] Incremental sort (was: PoC: Partial sort)