Обсуждение: pgsql: Specialize tuplesort routines for different kinds of abbreviated
Specialize tuplesort routines for different kinds of abbreviated keys Previously, the specialized tuplesort routine inlined handling for reverse-sort and NULLs-ordering but called the datum comparator via a pointer in the SortSupport struct parameter. Testing has showed that we can get a useful performance gain by specializing datum comparison for the different representations of abbreviated keys -- signed and unsigned 64-bit integers and signed 32-bit integers. Almost all abbreviatable data types will benefit -- the only exception for now is numeric, since the datum comparison is more complex. The performance gain depends on data type and input distribution, but often falls in the range of 10-20% faster. Thomas Munro Reviewed by Peter Geoghegan, review and performance testing by me Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/6974924347c908335607a4a2f252213d58e21b7c Modified Files -------------- src/backend/access/gist/gistproc.c | 18 +--- src/backend/access/nbtree/nbtcompare.c | 22 ++--- src/backend/utils/adt/date.c | 15 +--- src/backend/utils/adt/mac.c | 23 +---- src/backend/utils/adt/network.c | 17 +--- src/backend/utils/adt/timestamp.c | 11 +++ src/backend/utils/adt/uuid.c | 25 ++---- src/backend/utils/adt/varlena.c | 34 ++----- src/backend/utils/sort/tuplesort.c | 160 ++++++++++++++++++++++++++++++++- src/include/utils/sortsupport.h | 115 ++++++++++++++++++++++++ 10 files changed, 308 insertions(+), 132 deletions(-)
So far, kestrel and tamandua don't like this, and they both use "-fsanitize=undefined,alignment". I'll try to reproduce locally in a bit. -- John Naylor EDB: http://www.enterprisedb.com
Hi, On 2022-04-02 16:06:40 +0700, John Naylor wrote: > So far, kestrel and tamandua don't like this, and they both use > "-fsanitize=undefined,alignment". I'll try to reproduce locally in a > bit. Just hit this in my development environment. FWIW, I found it hard to work with ubsan without applying 0002 from https://postgr.es/m/20220323173537.ll7klrglnp4gn2um%40alap3.anarazel.de I plan to polish that to be able to commit it, but won't have cycles before the CF ends. with that applied CFLAGS=-fsanitize=alignment,undefined,address -fno-sanitize-recover=all and export ASAN_OPTIONS="detect_leaks=0:disable_coredump=0:print_stacktrace=1:abort_on_error=1" UBSAN_OPTIONS="disable_coredump=0:print_stacktrace=1:abort_on_error=1" I get a backtrace to investigate. Unfortunately abort_on_error=1 also removes the nicer error message :(. So I ran both. It triggers reliably on CLUSTER clstr_expression USING clstr_expression_minus_a; for me. /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:723:23: runtime error: load of value 150, which is not a validvalue for type '_Bool' #0 __ubsan::__ubsan_handle_load_invalid_value_abort (Data=0x7ff65e8fada0, Val=70) at ../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:548 #1 0x00007ff65a649093 in qsort_tuple_int32_compare (a=0x62b000fc0240, b=0x62b000fc0258, state=0x625000066a20) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:723 #2 0x00007ff65a64bc7b in qsort_tuple_int32 (data=0x62b000fc0240, n=133, arg=0x625000066a20) at /home/andres/src/postgresql/src/include/lib/sort_template.h:313 #3 0x00007ff65a672d59 in tuplesort_sort_memtuples (state=0x625000066a20) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:3613 #4 0x00007ff65a6617ec in tuplesort_performsort (state=0x625000066a20) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2154 #5 0x00007ff6586965dd in heapam_relation_copy_for_cluster (OldHeap=0x7ff64d893308, NewHeap=0x7ff64d896fc0, OldIndex=0x7ff64d89c860,use_sort=true, OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0, tups_vacuumed=0x7ffd4650a4c0, tups_recently_dead=0x7ffd4650a4e0) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:955 #6 0x00007ff658d6c4b7 in table_relation_copy_for_cluster (OldTable=0x7ff64d893308, NewTable=0x7ff64d896fc0, OldIndex=0x7ff64d89c860,use_sort=true, OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0, tups_vacuumed=0x7ffd4650a4c0, tups_recently_dead=0x7ffd4650a4e0) at /home/andres/src/postgresql/src/include/access/tableam.h:1658 #7 0x00007ff658d728aa in copy_table_data (OIDNewHeap=73728, OIDOldHeap=47052, OIDOldIndex=47078, verbose=false, pSwapToastByContent=0x7ffd4650a670, pFreezeXid=0x7ffd4650a680, pCutoffMulti=0x7ffd4650a690) at /home/andres/src/postgresql/src/backend/commands/cluster.c:913 #8 0x00007ff658d6ff9e in rebuild_relation (OldHeap=0x7ff64d893308, indexOid=47078, verbose=false) at /home/andres/src/postgresql/src/backend/commands/cluster.c:606 #9 0x00007ff658d6e622 in cluster_rel (tableOid=47052, indexOid=47078, params=0x7ffd4650a7b0) at /home/andres/src/postgresql/src/backend/commands/cluster.c:427 #10 0x00007ff658d6d774 in cluster (pstate=0x619000001aa0, stmt=0x625000005d00, isTopLevel=true) at /home/andres/src/postgresql/src/backend/commands/cluster.c:195 #11 0x00007ff659d425b1 in standard_ProcessUtility (pstmt=0x625000006010, queryString=0x625000005220 "CLUSTER clstr_expression USING clstr_expression_minus_a;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,params=0x0, queryEnv=0x0, dest=0x6250000060e0, qc=0x7ffd4650ae40) at /home/andres/src/postgresql/src/backend/tcop/utility.c:862 #12 0x00007ff659d40901 in ProcessUtility (pstmt=0x625000006010, queryString=0x625000005220 "CLUSTER clstr_expression USINGclstr_expression_minus_a;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x6250000060e0, qc=0x7ffd4650ae40) (rr) p/x *b $4 = {tuple = 0x62500006ba78, datum1 = 0x7ff659cc7e9e, isnull1 = 0x46, srctape = 0x7ff6} There's definitely something borked - looks like this is ending up with bogus pointers? Using rr to set a watchpoint on isnull1, and continuing backward I see the memory written to with the following stack: (rr) watch -l b->isnull1 Hardware watchpoint 3: -location b->isnull1 (rr) reverse-continue Continuing. Hardware watchpoint 3: -location b->isnull1 Old value = 70 New value = 190 0x00007ff65a65fbe4 in puttuple_common (state=0x625000066a20, tuple=0x7ffd4650a150) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2000 2000 state->memtuples[state->memtupcount++] = *tuple; (rr) bt #0 0x00007ff65a65fbe4 in puttuple_common (state=0x625000066a20, tuple=0x7ffd4650a150) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:2000 #1 0x00007ff65a65a478 in tuplesort_putheaptuple (state=0x625000066a20, tup=0x625000071d28) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:1810 #2 0x00007ff658696300 in heapam_relation_copy_for_cluster (OldHeap=0x7ff64d893308, NewHeap=0x7ff64d896fc0, OldIndex=0x7ff64d89c860,use_sort=true, OldestXmin=23706, xid_cutoff=0x7ffd4650a480, multi_cutoff=0x7ffd4650a490, num_tuples=0x7ffd4650a4a0, tups_vacuumed=0x7ffd4650a4c0, (note new/old are kind of switched around due to continuing in reverse) and then Hardware watchpoint 3: -location b->isnull1 Old value = 190 New value = false __memset_evex_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:143 143 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. (rr) bt #0 __memset_evex_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:143 #1 0x00007ff654d2c3c4 in __asan::Allocator::Allocate (this=this@entry=0x7ff654e3ce00 <__asan::instance>, size=<optimizedout>, size@entry=24640, alignment=<optimized out>, alignment@entry=8, stack=stack@entry=0x7ffd465092c0, alloc_type=alloc_type@entry=__asan::FROM_MALLOC, can_fill=can_fill@entry=true) at ../../../../src/libsanitizer/asan/asan_allocator.cpp:598 #2 0x00007ff654d28a9b in __asan::asan_malloc (size=size@entry=24640, stack=stack@entry=0x7ffd465092c0) at ../../../../src/libsanitizer/asan/asan_allocator.cpp:964 #3 0x00007ff654db998f in __interceptor_malloc (size=24640) at ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:70 #4 0x00007ff65a5c4f37 in AllocSetAlloc (context=0x625000066900, size=24576) at /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:740 #5 0x00007ff65a60e385 in palloc (size=24576) at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:1082 #6 0x00007ff65a6513dc in tuplesort_begin_batch (state=0x625000066a20) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:963 #7 0x00007ff65a6501d9 in tuplesort_begin_common (workMem=65536, coordinate=0x0, randomAccess=false) at /home/andres/src/postgresql/src/backend/utils/sort/tuplesort.c:878 Greetings, Andres Freund
Hi, On 2022-04-02 13:15:57 -0700, Andres Freund wrote: > I get a backtrace to investigate. Unfortunately abort_on_error=1 also removes > the nicer error message :(. So I ran both. This bit was just me being stupid and missing the error message... > with that applied > CFLAGS=-fsanitize=alignment,undefined,address -fno-sanitize-recover=all > and > export ASAN_OPTIONS="detect_leaks=0:disable_coredump=0:print_stacktrace=1:abort_on_error=1" UBSAN_OPTIONS="disable_coredump=0:print_stacktrace=1:abort_on_error=1" Forgot to add the warning that disable_coredump=0 on non-linux (or old linux) systems can end up generating humongous core files... To be able to run rr with ubsan/asan one needs verify_asan_link_order=0 in ASAN_OPTIONS btw. Greetings, Andres Freund
Hi, On 2022-04-02 13:15:57 -0700, Andres Freund wrote: > There's definitely something borked - looks like this is ending up with bogus > pointers? Using rr to set a watchpoint on isnull1, and continuing backward I > see the memory written to with the following stack: Looks like it's just plain old uninitialized data. tuplesort_putheaptuple() doesn't compute SortTuple.{isnull1,datum1}. And comparetup_heap() doesn't rely on them being set. But tuplesort_sort_memtuples() will now bypass comparetup_heap(), so comparetup_heap computing those doesn't help anymore. Greetings, Andres Freund
On Sat, 2 Apr 2022 at 21:30, John Naylor <john.naylor@postgresql.org> wrote: > src/backend/utils/sort/tuplesort.c | 160 ++++++++++++++++++++++++++++++++- I was just looking over this change and wondered a few things: 1. Shouldn't ssup_datum_signed_cmp and ssup_datum_int32_cmp be using DatumGetInt32 and DatumGetInt64? 2. I also feel that it's relevant to comment about 32-bit safety for usages of these functions. I'm trying to work out what the point of ssup_datum_signed_cmp is when SIZEOF_DATUM == 4. As far as I see, we just never use it. However, looking at btint8sortsupport(), we seem to set the comparator based on USE_FLOAT8_BYVAL rather than SIZEOF_DATUM. It's true in master that USE_FLOAT8_BYVAL will be 1 if SIZEOF_VOIDP >= 8, per: #if SIZEOF_VOID_P >= 8 #define USE_FLOAT8_BYVAL 1 #endif #define SIZEOF_DATUM SIZEOF_VOID_P This is hypothetical, but if for some reason SIZEOF_VOIDP was larger than 8, say 16, then the above would define USE_FLOAT8_BYVAL resulting timestamp and bigint using the new comparators. However, the code you've added to ssup_datum_signed_cmp checks for SIZEOF_DATUM == 8. It would assume 32-bit accidentally. That would cause issues. From what I can see, ssup_datum_signed_cmp just shouldn't exist in 32-bit builds and we should be consistent about how we determine when comparators to use. I've attached a patch which is along the lines of how I imagined this should look. What do you think? David
Вложения
On Tue, May 10, 2022 at 3:46 PM David Rowley <dgrowleyml@gmail.com> wrote: > I was just looking over this change and wondered a few things: > > 1. Shouldn't ssup_datum_signed_cmp and ssup_datum_int32_cmp be using > DatumGetInt32 and DatumGetInt64? Right. > This is hypothetical, but if for some reason SIZEOF_VOIDP was larger > than 8, say 16, then the above would define USE_FLOAT8_BYVAL resulting > timestamp and bigint using the new comparators. However, the code > you've added to ssup_datum_signed_cmp checks for SIZEOF_DATUM == 8. It > would assume 32-bit accidentally. That would cause issues. Testing for Datum size seems more in line with the intent. (Even aside from that hypothetical, making all Datums 8 bytes has been suggested before, and this might make that change easier.) > From what I can see, ssup_datum_signed_cmp just shouldn't exist in > 32-bit builds and we should be consistent about how we determine when > comparators to use. > > I've attached a patch which is along the lines of how I imagined this > should look. > > What do you think? +1 -- John Naylor EDB: http://www.enterprisedb.com
On Tue, 10 May 2022 at 21:45, John Naylor <john.naylor@enterprisedb.com> wrote: > > On Tue, May 10, 2022 at 3:46 PM David Rowley <dgrowleyml@gmail.com> wrote: > > What do you think? > > +1 Thanks for having a look. I pushed this after I made a small adjustment to #ifdef out the qsort_tuple_signed sort specialization in 32-bit builds. On testing a 32-bit build with the patch I proposed, I was getting a warning about that function being unused. I found the 32-bit postgres binary is now an entire 4376 bytes smaller! David
On Wed, May 11, 2022 at 6:43 AM David Rowley <dgrowleyml@gmail.com> wrote: > > I pushed this after I made a small adjustment to #ifdef out the > qsort_tuple_signed sort specialization in 32-bit builds. On testing a > 32-bit build with the patch I proposed, I was getting a warning about > that function being unused. Earlier I looked at your patch, but didn't think to check the rest of the code affected by this commit. Do we also need something like the attached, for the ApplyXYZSortComparator functions? (I don't have a 32-bit platform to test on) -- John Naylor EDB: http://www.enterprisedb.com
Вложения
On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote: > Earlier I looked at your patch, but didn't think to check the rest of > the code affected by this commit. Do we also need something like the > attached, for the ApplyXYZSortComparator functions? (I don't have a > 32-bit platform to test on) I didn't notice those. I'm pretty sure those macros need to be used any time we want to convert a Datum into a C type. David
Re: pgsql: Specialize tuplesort routines for different kinds of abbreviated
От
Michael Paquier
Дата:
On Wed, May 11, 2022 at 03:42:05PM +0700, John Naylor wrote: > Earlier I looked at your patch, but didn't think to check the rest of > the code affected by this commit. Do we also need something like the > attached, for the ApplyXYZSortComparator functions? (I don't have a > 32-bit platform to test on) You could use for example gcc's -m32 to do that even on a 64b platform, no? -- Michael
Вложения
On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote: > (I don't have a > 32-bit platform to test on) I had to do as Michael mentioned yesterday to test this. FWIW, I compiled and ran the tests on my machine with -m32 and no compile or test failures. Also, looking at the patch, it seems fine to me. Do you want to push it? David
On Thu, May 12, 2022 at 4:50 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 11 May 2022 at 20:42, John Naylor <john.naylor@enterprisedb.com> wrote: > > (I don't have a > > 32-bit platform to test on) > > I had to do as Michael mentioned yesterday to test this. > > FWIW, I compiled and ran the tests on my machine with -m32 and no > compile or test failures. Also, looking at the patch, it seems fine > to me. Do you want to push it? Thanks for looking -- I'll push soon after experimenting a bit with -m32 for education's sake. -- John Naylor EDB: http://www.enterprisedb.com