Optimize WindowAgg's use of tuplestores

Поиск
Список
Период
Сортировка
От David Rowley
Тема Optimize WindowAgg's use of tuplestores
Дата
Msg-id CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Optimize WindowAgg's use of tuplestores
Список pgsql-hackers
As part of making tuplestores faster [1], I noticed that in WindowAgg, when we end one partition we call tuplestore_end() and then we do tuplestore_begin_heap() again for the next partition in begin_partition() and then go on to set up the tuplestore read pointers according to what's required for the given frameOptions of the WindowAgg.  This might make sense if the frameOptions could change between partitions, but they can't, so I don't see any reason why we can't just do tuplestore_clear() at the end of a partition.  That resets the read pointer positions back to the start again ready for the next partition.

I wrote the attached patch and checked how it affects performance. It helps quite a bit when there are lots of partitions.

CREATE TABLE a (a INT NOT NULL);
INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
VACUUM FREEZE ANALYZE a;

bench.sql:
SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;

master:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 293.488 ms
latency average = 295.509 ms
latency average = 297.772 ms

patched:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 203.234 ms
latency average = 204.538 ms
latency average = 203.877 ms

About 45% faster.

Here's the top of perf top of each:

master:
     8.61%  libc.so.6          [.] _int_malloc
     6.71%  libc.so.6          [.] _int_free
     6.42%  postgres           [.] heap_form_minimal_tuple
     6.40%  postgres           [.] tuplestore_alloc_read_pointer
     5.87%  libc.so.6          [.] unlink_chunk.constprop.0
     3.86%  libc.so.6          [.] __memmove_avx512_unaligned_erms
     3.80%  postgres           [.] AllocSetFree
     3.51%  postgres           [.] spool_tuples
     3.45%  postgres           [.] ExecWindowAgg
     2.09%  postgres           [.] tuplesort_puttuple_common
     1.92%  postgres           [.] comparetup_datum
     1.88%  postgres           [.] AllocSetAlloc
     1.85%  postgres           [.] tuplesort_heap_replace_top
     1.84%  postgres           [.] ExecStoreBufferHeapTuple
     1.69%  postgres           [.] window_gettupleslot

patched:
     8.95%  postgres           [.] ExecWindowAgg
     8.10%  postgres           [.] heap_form_minimal_tuple
     5.16%  postgres           [.] spool_tuples
     4.03%  libc.so.6          [.] __memmove_avx512_unaligned_erms
     4.02%  postgres           [.] begin_partition
     3.11%  postgres           [.] tuplesort_puttuple_common
     2.97%  postgres           [.] AllocSetAlloc
     2.96%  postgres           [.] comparetup_datum
     2.83%  postgres           [.] tuplesort_heap_replace_top
     2.79%  postgres           [.] window_gettupleslot
     2.22%  postgres           [.] AllocSetFree
     2.11%  postgres           [.] MemoryContextReset
     1.99%  postgres           [.] LogicalTapeWrite
     1.98%  postgres           [.] advance_windowaggregate

Lots less malloc/free work going on.

I'm also tempted to do a cleanup of the state machine inside nodeWindowAgg.c as I had to add another bool flag to WindowAggState to replace the previous test that checked if the tuplestore was empty (i.e just finished a partition) with if (winstate->buffer == NULL). I couldn't do that anymore due to no longer nullifying that field. I think such a cleanup would be a separate patch. Annoyingly the extra bool is the 9th bool flag and widens that struct by 8 bytes and leaves a 7 byte hole.

David

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: tests fail on windows with default git settings
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: 010_pg_basebackup.pl vs multiple filesystems