Обсуждение: Re: The documentation for storage type 'plain' actually allows single byte header
Re: The documentation for storage type 'plain' actually allows single byte header
От
Bruce Momjian
Дата:
On Fri, Sep 29, 2023 at 06:45:52PM -0400, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > On Fri, 2023-09-29 at 18:19 -0400, Bruce Momjian wrote: > >> Where did we end with this? Is a doc patch the solution? > > > I don't think this went anywhere, and a doc patch is not the solution. > > Tom has argued convincingly that single-byte headers are an effect of the TOAST > > system, and that STORAGE PLAIN should disable all effects of TOAST. > > Well, that was the original idea: you could use STORAGE PLAIN if you > had C code that wasn't yet toast-aware. However, given the lack of > complaints, it seems there's no non-toast-aware code left anywhere. > And that's not too surprising, because the evolutionary pressure to > fix such code would be mighty strong, and a lot of time has passed. > > I'm now inclined to think that changing the docs is better than > changing the code; we'd be more likely to create new problems than > fix anything useful. > > I wonder though if there's really just one place claiming that > that's how it works. A trawl through the code comments might > be advisable. [ Discussion moved to hackers, same subject. ] Here is the original thread from pgsql-docs: https://www.postgresql.org/message-id/flat/167336599095.2667301.15497893107226841625%40wrigleys.postgresql.org The report is about single-byte headers being used for varlena values with PLAIN storage. Here is the reproducible report: CREATE EXTENSION pageinspect; CREATE TABLE test(a VARCHAR(10000) STORAGE PLAIN); INSERT INTO test VALUES (repeat('A',10)); Now peek into the page with pageinspect functions SELECT left(encode(t_data, 'hex'), 40) FROM heap_page_items(get_raw_page('test', 0)); This returned value of "1741414141414141414141". Here the first byte 0x17 = 0001 0111 in binary. Length + 1 is stored in the length bits (1-7). So Len = 0001011-1 = (11-1) [base-10] = 10 [base-10] which exactly matches the expected length. Further the data "41" repeated 10 times also indicates character A (65 or 0x41 in ASCII) repeated 10 times. I researched this and thought it would be a case where we were lacking a check before creating a single-byte header, but I couldn't find anything missing. I think the problem is that the _source_ tupleDesc attstorage attribute is being used to decide if we should use a short header, while it is really the storage type of the destination that we should be checking. Unfortunately, I don't think the destination is accessible at the location were we are deciding about a short header. I am confused how to proceed. I feel we need to fully understand why this happening before we adjust anything. Here is a backtrace --- the short header is being created in fill_val() and the attstorage value there is 'x'/EXTENDED. --------------------------------------------------------------------------- #0 fill_val (att=0x56306f61dae8, bit=0x0, bitmask=0x7ffcfcfc1fb4, dataP=0x7ffcfcfc1f90, infomask=0x56306f61e25c, datum=94766026487048,isnull=false) at heaptuple.c:278 #1 0x000056306e7800eb in heap_fill_tuple (tupleDesc=0x56306f61dad0, values=0x56306f61dc20, isnull=0x56306f61dc28, data=0x56306f61e260"", data_size=11, infomask=0x56306f61e25c, bit=0x0) at heaptuple.c:427 #2 0x000056306e781708 in heap_form_tuple (tupleDescriptor=0x56306f61dad0, values=0x56306f61dc20, isnull=0x56306f61dc28)at heaptuple.c:1181 #3 0x000056306ea13dcb in tts_virtual_copy_heap_tuple (slot=0x56306f61dbd8) at execTuples.c:280 #4 0x000056306ea1346e in ExecCopySlotHeapTuple (slot=0x56306f61dbd8) at ../../../src/include/executor/tuptable.h:463 #5 0x000056306ea14928 in tts_buffer_heap_copyslot (dstslot=0x56306f61e1a8, srcslot=0x56306f61dbd8) at execTuples.c:798 #6 0x000056306ea4342e in ExecCopySlot (dstslot=0x56306f61e1a8, srcslot=0x56306f61dbd8) at ../../../src/include/executor/tuptable.h:487 #7 0x000056306ea44785 in ExecGetInsertNewTuple (relinfo=0x56306f61d678, planSlot=0x56306f61dbd8) at nodeModifyTable.c:685 #8 0x000056306ea49123 in ExecModifyTable (pstate=0x56306f61d470) at nodeModifyTable.c:3789 #9 0x000056306ea0ef3c in ExecProcNodeFirst (node=0x56306f61d470) at execProcnode.c:464 #10 0x000056306ea03702 in ExecProcNode (node=0x56306f61d470) at ../../../src/include/executor/executor.h:273 #11 0x000056306ea05fe2 in ExecutePlan (estate=0x56306f61d228, planstate=0x56306f61d470, use_parallel_mode=false, operation=CMD_INSERT,sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x56306f588170, execute_once=true)at execMain.c:1670 #12 0x000056306ea03c63 in standard_ExecutorRun (queryDesc=0x56306f527888, direction=ForwardScanDirection, count=0, execute_once=true)at execMain.c:365 #13 0x000056306ea03aee in ExecutorRun (queryDesc=0x56306f527888, direction=ForwardScanDirection, count=0, execute_once=true)at execMain.c:309 #14 0x000056306ec70cf5 in ProcessQuery (plan=0x56306f588020, sourceText=0x56306f552a98 "INSERT INTO test VALUES (repeat('A',10));",params=0x0, queryEnv=0x0, dest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:160 #15 0x000056306ec72514 in PortalRunMulti (portal=0x56306f5cccf8, isTopLevel=true, setHoldSnapshot=false, dest=0x56306f588170,altdest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:1277 #16 0x000056306ec71b3a in PortalRun (portal=0x56306f5cccf8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x56306f588170,altdest=0x56306f588170, qc=0x7ffcfcfc25c0) at pquery.c:791 #17 0x000056306ec6b465 in exec_simple_query (query_string=0x56306f552a98 "INSERT INTO test VALUES (repeat('A',10));") atpostgres.c:1273 #18 0x000056306ec6fdd3 in PostgresMain (dbname=0x56306f58ab88 "test", username=0x56306f50ee68 "postgres") at postgres.c:4657 #19 0x000056306ebb304c in BackendRun (port=0x56306f57dc20) at postmaster.c:4423 #20 0x000056306ebb26fc in BackendStartup (port=0x56306f57dc20) at postmaster.c:4108 #21 0x000056306ebaf134 in ServerLoop () at postmaster.c:1767 #22 0x000056306ebaead5 in PostmasterMain (argc=1, argv=0x56306f50ce30) at postmaster.c:1466 #23 0x000056306ea8108c in main (argc=1, argv=0x56306f50ce30) at main.c:198 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: The documentation for storage type 'plain' actually allows single byte header
От
Bruce Momjian
Дата:
On Fri, Oct 20, 2023 at 09:48:05PM -0400, Bruce Momjian wrote: > Here is the original thread from pgsql-docs: > > https://www.postgresql.org/message-id/flat/167336599095.2667301.15497893107226841625%40wrigleys.postgresql.org > > The report is about single-byte headers being used for varlena values > with PLAIN storage. > > Here is the reproducible report: > > CREATE EXTENSION pageinspect; > CREATE TABLE test(a VARCHAR(10000) STORAGE PLAIN); > INSERT INTO test VALUES (repeat('A',10)); > > Now peek into the page with pageinspect functions > > SELECT left(encode(t_data, 'hex'), 40) FROM > heap_page_items(get_raw_page('test', 0)); > > This returned value of "1741414141414141414141". > Here the first byte 0x17 = 0001 0111 in binary. > Length + 1 is stored in the length bits (1-7). So Len = 0001011-1 = (11-1) > [base-10] = 10 [base-10] > which exactly matches the expected length. Further the data "41" repeated 10 > times also indicates character A (65 or 0x41 in ASCII) repeated 10 times. > > I researched this and thought it would be a case where we were lacking a > check before creating a single-byte header, but I couldn't find anything > missing. I think the problem is that the _source_ tupleDesc attstorage > attribute is being used to decide if we should use a short header, while > it is really the storage type of the destination that we should be > checking. Unfortunately, I don't think the destination is accessible at > the location were we are deciding about a short header. > > I am confused how to proceed. I feel we need to fully understand why > this happening before we adjust anything. Here is a backtrace --- the > short header is being created in fill_val() and the attstorage value > there is 'x'/EXTENDED. I did some more research. It turns out that the source slot/planSlot is populating its pg_attribute information via makeTargetEntry() and it has no concept of a storage type. Digging further, I found that we cannot get rid of the the use of att->attstorage != TYPSTORAGE_PLAIN in macros ATT_IS_PACKABLE and VARLENA_ATT_IS_PACKABLE macros in src/backend/access/common/heaptuple.c because there are internal uses of fill_val() that can't handle packed varlena headers. I ended up with a doc patch that adds a C comment about this odd behavior and removes doc text about PLAIN storage not using packed headers. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: The documentation for storage type 'plain' actually allows single byte header
От
Bruce Momjian
Дата:
On Sat, Oct 21, 2023 at 09:56:13PM -0400, Bruce Momjian wrote: > I did some more research. It turns out that the source slot/planSlot is > populating its pg_attribute information via makeTargetEntry() and it > has no concept of a storage type. > > Digging further, I found that we cannot get rid of the the use of > att->attstorage != TYPSTORAGE_PLAIN in macros ATT_IS_PACKABLE and > VARLENA_ATT_IS_PACKABLE macros in src/backend/access/common/heaptuple.c > because there are internal uses of fill_val() that can't handle packed > varlena headers. > > I ended up with a doc patch that adds a C comment about this odd > behavior and removes doc text about PLAIN storage not using packed > headers. Oops, patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Вложения
Re: The documentation for storage type 'plain' actually allows single byte header
От
Bruce Momjian
Дата:
On Sat, Oct 21, 2023 at 09:59:04PM -0400, Bruce Momjian wrote: > On Sat, Oct 21, 2023 at 09:56:13PM -0400, Bruce Momjian wrote: > > I did some more research. It turns out that the source slot/planSlot is > > populating its pg_attribute information via makeTargetEntry() and it > > has no concept of a storage type. > > > > Digging further, I found that we cannot get rid of the the use of > > att->attstorage != TYPSTORAGE_PLAIN in macros ATT_IS_PACKABLE and > > VARLENA_ATT_IS_PACKABLE macros in src/backend/access/common/heaptuple.c > > because there are internal uses of fill_val() that can't handle packed > > varlena headers. > > > > I ended up with a doc patch that adds a C comment about this odd > > behavior and removes doc text about PLAIN storage not using packed > > headers. > > Oops, patch attached. Patch applied to all supported versions. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.