Re: Use generation context to speed up tuplesorts
From: Tomas Vondra
Subject: Re: Use generation context to speed up tuplesorts
Msg-id: 13808af0-2bb5-b506-62d0-1fb67e3385d0@enterprisedb.com
In reply to: Re: Use generation context to speed up tuplesorts (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Use generation context to speed up tuplesorts (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
On 8/6/21 3:07 PM, David Rowley wrote:
> On Wed, 4 Aug 2021 at 02:10, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> A review would be nice, although it can wait - it'd be interesting to
>> know if those patches help with the workload(s) you've been looking at.
>
> I tried out the v2 set of patches using the attached scripts. The
> attached spreadsheet includes the original tests and compares master
> with the patch which uses the generation context vs that patch plus
> your v2 patch.
>
> I've also included 4 additional tests, each of which starts with a
> 1-column table and then adds another 32 columns, testing the
> performance after adding each additional column. I did this because I
> wanted to see if the performance was more similar to master when the
> allocations had less power-of-2 wastage from allocset. If, for
> example, you look at row 123 of the spreadsheet, you can see that both
> patched and unpatched the allocations were 272 bytes each, yet there
> was still a 50% performance improvement with just the generation
> context patch when compared to master.
>
> Looking at the spreadsheet, you'll also notice that in the 2-column
> test of each of the 4 new tests, the number of bytes used for each
> allocation is larger with the generation context: 56 vs 48. This is
> due to the GenerationChunk struct being larger than the Allocset's
> version by 8 bytes, because it also holds a pointer to the
> GenerationBlock. So with the patch there are some cases where we'll
> use slightly more memory.
>
> Additional tests:
>
> 1. Sort 10000 tuples on a column with values 0-99 in memory.
> 2. As #1 but with 1 million tuples.
> 3. As #1 but with a large OFFSET to remove the overhead of sending to the client.
> 4. As #2 but with a large OFFSET.
>
> Test #3 above is the most similar one to the original tests and shows
> similar gains. When the sort becomes larger (the 1-million-tuple
> test), the gains reduce. This indicates the gains are coming from
> improved CPU cache efficiency due to the removal of the power-of-2
> wastage in memory allocations.
>
> All of the tests show that the patches to improve the allocation
> efficiency of generation.c don't help to improve the results of the
> test cases. I wondered if it's maybe worth trying to see what happens
> if, instead of doubling the allocations each time, we quadruple them.
> I didn't try this.

Thanks for the scripts and the spreadsheet with results.

I doubt quadrupling the allocations will help very much, but I suspect
the problem might be in the 0004 patch - at least that's what shows a
regression in my results. Could you try with just 0001-0003 applied?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company