Обсуждение: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE
Hi.
With the commit mentioned in the $subject, I am seeing the
change in behaviour with the varlena header size. Please
consider the below test:
postgres@83795=#CREATE TABLE test_storage_char(d char(20));
CREATE TABLE
postgres@83795=#INSERT INTO test_storage_char SELECT REPEAT('e', 20);
INSERT 0 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
eeeeeeeeeeeeeeeeeeee | 21
(1 row)
postgres@83795=#ALTER TABLE test_storage_char ALTER COLUMN d SET STORAGE PLAIN;
ALTER TABLE
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
eeeeeeeeeeeeeeeeeeee | 21
(1 row)
postgres@83795=#UPDATE test_storage_char SET d='ab' WHERE d LIKE '%e%';
UPDATE 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
ab | 24
(1 row)
After changing the STORAGE for the column and UPDATE, pg_column_size
CREATE TABLE
postgres@83795=#INSERT INTO test_storage_char SELECT REPEAT('e', 20);
INSERT 0 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
eeeeeeeeeeeeeeeeeeee | 21
(1 row)
postgres@83795=#ALTER TABLE test_storage_char ALTER COLUMN d SET STORAGE PLAIN;
ALTER TABLE
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
eeeeeeeeeeeeeeeeeeee | 21
(1 row)
postgres@83795=#UPDATE test_storage_char SET d='ab' WHERE d LIKE '%e%';
UPDATE 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
ab | 24
(1 row)
After changing the STORAGE for the column and UPDATE, pg_column_size
now returns the size as 24.
BEFORE Commit 86dc90056:
postgres@129158=#SELECT d, pg_column_size(d) FROM test_storage_char;
d | pg_column_size
----------------------+----------------
ab | 21
(1 row)
d | pg_column_size
----------------------+----------------
ab | 21
(1 row)
I am not sure whether this change is expected? Or missing something
in the toasting the attribute?
Thanks,
Rushabh Lathia
Rushabh Lathia <rushabh.lathia@gmail.com> writes: > With the commit mentioned in the $subject, I am seeing the > change in behaviour with the varlena header size. Interesting. AFAICS, the new behavior is correct and the old is wrong. SET STORAGE PLAIN is supposed to disable use of TOAST features, including short varlena headers. So now that's being honored by the UPDATE, but before it was not. I have no idea exactly why that changed though --- I'd expect that to be implemented in low-level tuple-construction logic that the planner rewrite wouldn't have changed. regards, tom lane
On Mon, Apr 19, 2021 at 10:00 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > > Hi. > > With the commit mentioned in the $subject, I am seeing the > change in behaviour with the varlena header size. Please > consider the below test: > > postgres@83795=#CREATE TABLE test_storage_char(d char(20)); > CREATE TABLE > postgres@83795=#INSERT INTO test_storage_char SELECT REPEAT('e', 20); > INSERT 0 1 > postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char; > d | pg_column_size > ----------------------+---------------- > eeeeeeeeeeeeeeeeeeee | 21 > (1 row) > > postgres@83795=#ALTER TABLE test_storage_char ALTER COLUMN d SET STORAGE PLAIN; > ALTER TABLE > postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char; > d | pg_column_size > ----------------------+---------------- > eeeeeeeeeeeeeeeeeeee | 21 > (1 row) > > postgres@83795=#UPDATE test_storage_char SET d='ab' WHERE d LIKE '%e%'; > UPDATE 1 > postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char; > d | pg_column_size > ----------------------+---------------- > ab | 24 > (1 row) > > After changing the STORAGE for the column and UPDATE, pg_column_size > now returns the size as 24. > > BEFORE Commit 86dc90056: > > postgres@129158=#SELECT d, pg_column_size(d) FROM test_storage_char; > d | pg_column_size > ----------------------+---------------- > ab | 21 > (1 row) > > I am not sure whether this change is expected? Or missing something > in the toasting the attribute? I haven't studied this closely enough yet to say if the new behavior is correct or not, but can say why this has changed. Before 86dc90056, the new tuple to pass to ExecUpdate would be computed with a TupleDesc that uses pg_type.typstorage for the column instead of the column's actual pg_attribute.attstorage. That's because the new tuple would be computed from the subplan's targetlist and the TupleDesc for that targetlist is computed with no regard to where the computed tuple will go; IOW ignoring the target table's actual TupleDesc. After 86dc90056, the new tuple is computed with the target table's actual TupleDesc, so the new value respects the column's attstorage, which makes me think the new behavior is not wrong. Will look more closely tomorrow. -- Amit Langote EDB: http://www.enterprisedb.com
I wrote: > Rushabh Lathia <rushabh.lathia@gmail.com> writes: >> With the commit mentioned in the $subject, I am seeing the >> change in behaviour with the varlena header size. > Interesting. AFAICS, the new behavior is correct and the old is wrong. > SET STORAGE PLAIN is supposed to disable use of TOAST features, including > short varlena headers. So now that's being honored by the UPDATE, but > before it was not. I have no idea exactly why that changed though --- > I'd expect that to be implemented in low-level tuple-construction logic > that the planner rewrite wouldn't have changed. Oh, after a bit of tracing I see it. In v13, the new value gets short-header-ified when a tuple is constructed here: /* * Ensure input tuple is the right format for the target relation. */ if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops) { ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot); planSlot = node->mt_scans[node->mt_whichplan]; } where the target slot has been made like this: mtstate->mt_scans[i] = ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]), table_slot_callbacks(resultRelInfo->ri_RelationDesc)); So that's using a tupdesc that's been constructed according to the default properties of the column datatypes, in particular attstorage will be 'x' for the 'd' column. Later we transpose the data into a slot that actually has the target table's rowtype, but the damage is already done; the value isn't un-short-headerized at that point. (I wonder if that should be considered a bug?) 86dc90056 got rid of the intermediate mt_scans slots, so the 'ab' value only gets put into a slot that has the table's real descriptor, and it never loses its original 4-byte header. I observe that the INSERT code path still does the wrong thing: regression=# insert into test_storage_char values('foo'); INSERT 0 1 regression=# SELECT d, pg_column_size(d) FROM test_storage_char; d | pg_column_size ----------------------+---------------- ab | 24 foo | 21 (2 rows) Maybe we oughta try to fix that sometime. It doesn't seem terribly high-priority though. regards, tom lane
On Mon, Apr 19, 2021 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote: > After 86dc90056, the new tuple is computed with the target table's > actual TupleDesc, so the new value respects the column's attstorage, > which makes me think the new behavior is not wrong. I would not have expected SET STORAGE PLAIN to disable the use of short varlena headers. *Maybe* at some point in time there was enough code that couldn't operate directly on short varlenas to justify a theory that in some circumstances eschewing short headers would save on CPU cycles. But surely in 2021 this is not true and this behavior is not plausibly desired by anyone. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Apr 19, 2021 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote: >> After 86dc90056, the new tuple is computed with the target table's >> actual TupleDesc, so the new value respects the column's attstorage, >> which makes me think the new behavior is not wrong. > I would not have expected SET STORAGE PLAIN to disable the use of > short varlena headers. Au contraire. The reason that mode exists at all (for varlena types) is to support data types that haven't been updated for TOAST. Perhaps that's now the empty set, but it's not really our job to take away the capability. If you really want MAIN you should say that, not quibble that PLAIN doesn't mean what it's always been understood to mean. I don't think that this behavior quite breaks such data types, because if you actually have a type like that then you've set typstorage = PLAIN and we will not allow there to be any tupdescs in the system that differ from that. The issue is just that if you set a particular column of an otherwise-toastable type to be PLAIN then we're not terribly rigorous about enforcing that, because values that have been toasted can get into the column without being fully detoasted. (I've not checked, but I suspect that you could also get a compressed or toasted-out-of-line value into such a column if you tried hard enough.) Related to this is that when you update some other column(s) of the table, we don't try to force detoasting of existing values in a column recently set to PLAIN. Personally I think that's fine, so it means that the lack of rigor is inherent. regards, tom lane
On Mon, Apr 19, 2021 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Au contraire. The reason that mode exists at all (for varlena types) > is to support data types that haven't been updated for TOAST. Perhaps > that's now the empty set, but it's not really our job to take away the > capability. If you really want MAIN you should say that, not quibble > that PLAIN doesn't mean what it's always been understood to mean. This kind of begs the question of whether you have the right idea about what PLAIN has always been understood to mean, and whether everyone understands it the same way. I formed my understanding of what PLAIN is understood to mean by reading the ALTER TABLE .. SET STORAGE documentation, and there's no real hint in there that this is some kind of backward-compatibility only feature. Rather, I read that paragraph to suggest that you can choose between the four options as a way of getting best performance. Both external storage and compression are trade-offs: they make the tuples smaller, which can be good for performance, but they also make the toasted columns more expensive to access, which can be bad for performance. It seems completely reasonable to suppose that some workloads may benefit from a non-default TOAST strategy; e.g. if you often access only a few columns but scan a lot of rows, toasting wins; if you typically access every column but only a few rows via an index scan, not toasting wins. And if that is your idea about what the feature does - an idea that seems perfectly defensible given what the documentation says about this - then I think it's going to be surprising to find that it also inhibits 1-byte headers from being used. But, IDK, maybe nobody will care (or maybe I'm the only one who will be surprised). Perhaps this whole area needs a broader rethink at some point. I'm inclined to think that compatibility with varlena data types that haven't been updated since PostgreSQL 8.3 came out is almost a non-goal and maybe we ought to just kick such data types and the associated code paths to the curb. It's unlikely that they get much testing. On the other hand, perhaps we'd like to have more control over the decision to compress or store externally than we have currently. I think if I were designing this from scratch, I'd want one switch for whether it's OK to compress, with values meaning "yes," "no," and "only if stored externally," a second switch for the *length* at which external storage should be used (so that I can push out rarely-used columns at lower size thresholds and commonly-used ones at higher thresholds), and a third for what should happen if we do the stuff allowed by the first two switches and the tuple still doesn't fit, with value meaning "fail" and "externalize anyway". -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Apr 19, 2021 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Au contraire. The reason that mode exists at all (for varlena types) >> is to support data types that haven't been updated for TOAST. > This kind of begs the question of whether you have the right idea > about what PLAIN has always been understood to mean, and whether > everyone understands it the same way. I formed my understanding of > what PLAIN is understood to mean by reading the ALTER TABLE .. SET > STORAGE documentation, and there's no real hint in there that this is > some kind of backward-compatibility only feature. That doco is explaining the users-eye view of it. Places addressed to datatype developers, such as the CREATE TYPE reference page, see it a bit differently. CREATE TYPE for instance points out that All storage values other than plain imply that the functions of the data type can handle values that have been toasted, as described in ... > I think if I were designing this from scratch, I'd want one > switch for whether it's OK to compress, with values meaning "yes," > "no," and "only if stored externally," a second switch for the > *length* at which external storage should be used (so that I can push > out rarely-used columns at lower size thresholds and commonly-used > ones at higher thresholds), and a third for what should happen if we > do the stuff allowed by the first two switches and the tuple still > doesn't fit, with value meaning "fail" and "externalize anyway". Yeah, I don't think the existing options for attstorage have much to recommend them except backwards compatibility. But if we do redesign them, I'd still say there should be a way for a data type to say that it doesn't support these weird header hacks that we've invented. The notion that short header doesn't cost anything seems extremely Intel-centric to me. regards, tom lane
On Mon, Apr 19, 2021 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > That doco is explaining the users-eye view of it. Places addressed > to datatype developers, such as the CREATE TYPE reference page, see > it a bit differently. CREATE TYPE for instance points out that > > All storage values other than plain imply that the functions of the > data type can handle values that have been toasted, as described in ... Interesting. It feels to me like SET STORAGE PLAIN feels like it is really trying to be two different things. Either you want to inhibit compression and external storage for performance reasons, or your data type can't support either one. Maybe we should separate those concepts, since there's no mode right now that says "don't ever compress, and externalize only if there's absolutely no other way," and there's no way to disable compression and externalization without also killing off short headers. :-( > The notion that short header doesn't cost anything seems extremely Intel-centric to me. I don't think so. It's true that Intel is very forgiving about unaligned accesses compared to some other architectures, but I think if you have a terabyte of data, you want it to fit into as few disk pages as possible pretty much no matter what architecture you're using. The dominant costs are going to be the I/O costs, not the CPU costs of dealing with unaligned bytes. In fact, even if you have a gigabyte of data, I bet it's *still* better to use a more compact on-disk representation. Now, the dominant cost is going to be pumping the data through the L3 CPU cache, which is still - I think - going to be quite a lot more important than the CPU costs of dealing with unaligned bytes. The CPU bus is an I/O bottleneck not unlike the disk itself, just at a higher rate of speed which is still way slower than the CPU speed. Now if you have a megabyte of data, or better yet a kilobyte of data, then I think optimizing for CPU efficiency may well be the right thing to do. I don't know how much 4-byte varlena headers really save there, but if I were designing a storage representation for very small data sets, I'd definitely be thinking about how I could waste space to shave cycles. -- Robert Haas EDB: http://www.enterprisedb.com