Обсуждение: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

Поиск
Список
Период
Сортировка

Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Rushabh Lathia
Дата:
Hi.

With the commit mentioned in the $subject, I am seeing the
change in behaviour with the varlena header size.  Please
consider the below test:

postgres@83795=#CREATE TABLE test_storage_char(d char(20));
CREATE TABLE
postgres@83795=#INSERT INTO test_storage_char SELECT REPEAT('e', 20);
INSERT 0 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
          d           | pg_column_size
----------------------+----------------
 eeeeeeeeeeeeeeeeeeee |             21
(1 row)

postgres@83795=#ALTER TABLE test_storage_char ALTER COLUMN  d SET STORAGE PLAIN;
ALTER TABLE
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
          d           | pg_column_size
----------------------+----------------
 eeeeeeeeeeeeeeeeeeee |             21
(1 row)

postgres@83795=#UPDATE test_storage_char SET d='ab' WHERE d LIKE '%e%';
UPDATE 1
postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
          d           | pg_column_size
----------------------+----------------
 ab                   |             24
(1 row)

After changing the STORAGE for the column and UPDATE, pg_column_size 
now returns the size as 24. 

BEFORE Commit 86dc90056:

postgres@129158=#SELECT d, pg_column_size(d) FROM test_storage_char;
          d           | pg_column_size
----------------------+----------------
 ab                   |             21
(1 row)

I am not sure whether this change is expected? Or missing something
in the toasting the attribute?


Thanks,
Rushabh Lathia

Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Tom Lane
Дата:
Rushabh Lathia <rushabh.lathia@gmail.com> writes:
> With the commit mentioned in the $subject, I am seeing the
> change in behaviour with the varlena header size.

Interesting.  AFAICS, the new behavior is correct and the old is wrong.
SET STORAGE PLAIN is supposed to disable use of TOAST features, including
short varlena headers.  So now that's being honored by the UPDATE, but
before it was not.  I have no idea exactly why that changed though ---
I'd expect that to be implemented in low-level tuple-construction logic
that the planner rewrite wouldn't have changed.

            regards, tom lane



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Amit Langote
Дата:
On Mon, Apr 19, 2021 at 10:00 PM Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>
> Hi.
>
> With the commit mentioned in the $subject, I am seeing the
> change in behaviour with the varlena header size.  Please
> consider the below test:
>
> postgres@83795=#CREATE TABLE test_storage_char(d char(20));
> CREATE TABLE
> postgres@83795=#INSERT INTO test_storage_char SELECT REPEAT('e', 20);
> INSERT 0 1
> postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
>           d           | pg_column_size
> ----------------------+----------------
>  eeeeeeeeeeeeeeeeeeee |             21
> (1 row)
>
> postgres@83795=#ALTER TABLE test_storage_char ALTER COLUMN  d SET STORAGE PLAIN;
> ALTER TABLE
> postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
>           d           | pg_column_size
> ----------------------+----------------
>  eeeeeeeeeeeeeeeeeeee |             21
> (1 row)
>
> postgres@83795=#UPDATE test_storage_char SET d='ab' WHERE d LIKE '%e%';
> UPDATE 1
> postgres@83795=#SELECT d, pg_column_size(d) FROM test_storage_char;
>           d           | pg_column_size
> ----------------------+----------------
>  ab                   |             24
> (1 row)
>
> After changing the STORAGE for the column and UPDATE, pg_column_size
> now returns the size as 24.
>
> BEFORE Commit 86dc90056:
>
> postgres@129158=#SELECT d, pg_column_size(d) FROM test_storage_char;
>           d           | pg_column_size
> ----------------------+----------------
>  ab                   |             21
> (1 row)
>
> I am not sure whether this change is expected? Or missing something
> in the toasting the attribute?

I haven't studied this closely enough yet to say if the new behavior
is correct or not, but can say why this has changed.

Before 86dc90056, the new tuple to pass to ExecUpdate would be
computed with a TupleDesc that uses pg_type.typstorage for the column
instead of the column's actual pg_attribute.attstorage.  That's
because the new tuple would be computed from the subplan's targetlist
and the TupleDesc for that targetlist is computed with no regard to
where the computed tuple will go; IOW ignoring the target table's
actual TupleDesc.

After 86dc90056, the new tuple is computed with the target table's
actual TupleDesc, so the new value respects the column's attstorage,
which makes me think the new behavior is not wrong.

Will look more closely tomorrow.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Tom Lane
Дата:
I wrote:
> Rushabh Lathia <rushabh.lathia@gmail.com> writes:
>> With the commit mentioned in the $subject, I am seeing the
>> change in behaviour with the varlena header size.

> Interesting.  AFAICS, the new behavior is correct and the old is wrong.
> SET STORAGE PLAIN is supposed to disable use of TOAST features, including
> short varlena headers.  So now that's being honored by the UPDATE, but
> before it was not.  I have no idea exactly why that changed though ---
> I'd expect that to be implemented in low-level tuple-construction logic
> that the planner rewrite wouldn't have changed.

Oh, after a bit of tracing I see it.  In v13, the new value gets
short-header-ified when a tuple is constructed here:

        /*
         * Ensure input tuple is the right format for the target relation.
         */
        if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops)
        {
            ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot);
            planSlot = node->mt_scans[node->mt_whichplan];
        }

where the target slot has been made like this:

        mtstate->mt_scans[i] =
            ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
                                   table_slot_callbacks(resultRelInfo->ri_RelationDesc));

So that's using a tupdesc that's been constructed according to the
default properties of the column datatypes, in particular attstorage
will be 'x' for the 'd' column.  Later we transpose the data into
a slot that actually has the target table's rowtype, but the damage
is already done; the value isn't un-short-headerized at that point.
(I wonder if that should be considered a bug?)

86dc90056 got rid of the intermediate mt_scans slots, so the 'ab'
value only gets put into a slot that has the table's real descriptor,
and it never loses its original 4-byte header.

I observe that the INSERT code path still does the wrong thing:

regression=# insert into test_storage_char values('foo');
INSERT 0 1
regression=# SELECT d, pg_column_size(d) FROM test_storage_char;
          d           | pg_column_size
----------------------+----------------
 ab                   |             24
 foo                  |             21
(2 rows)

Maybe we oughta try to fix that sometime.  It doesn't seem terribly
high-priority though.

            regards, tom lane



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Robert Haas
Дата:
On Mon, Apr 19, 2021 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
> After 86dc90056, the new tuple is computed with the target table's
> actual TupleDesc, so the new value respects the column's attstorage,
> which makes me think the new behavior is not wrong.

I would not have expected SET STORAGE PLAIN to disable the use of
short varlena headers. *Maybe* at some point in time there was enough
code that couldn't operate directly on short varlenas to justify a
theory that in some circumstances eschewing short headers would save
on CPU cycles. But surely in 2021 this is not true and this behavior
is not plausibly desired by anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 19, 2021 at 10:34 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> After 86dc90056, the new tuple is computed with the target table's
>> actual TupleDesc, so the new value respects the column's attstorage,
>> which makes me think the new behavior is not wrong.

> I would not have expected SET STORAGE PLAIN to disable the use of
> short varlena headers.

Au contraire.  The reason that mode exists at all (for varlena types)
is to support data types that haven't been updated for TOAST.  Perhaps
that's now the empty set, but it's not really our job to take away the
capability.  If you really want MAIN you should say that, not quibble
that PLAIN doesn't mean what it's always been understood to mean.

I don't think that this behavior quite breaks such data types, because
if you actually have a type like that then you've set typstorage = PLAIN
and we will not allow there to be any tupdescs in the system that differ
from that.  The issue is just that if you set a particular column of
an otherwise-toastable type to be PLAIN then we're not terribly rigorous
about enforcing that, because values that have been toasted can get into
the column without being fully detoasted.  (I've not checked, but I
suspect that you could also get a compressed or toasted-out-of-line value
into such a column if you tried hard enough.)

Related to this is that when you update some other column(s) of the table,
we don't try to force detoasting of existing values in a column recently
set to PLAIN.  Personally I think that's fine, so it means that the
lack of rigor is inherent.

            regards, tom lane



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Robert Haas
Дата:
On Mon, Apr 19, 2021 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Au contraire.  The reason that mode exists at all (for varlena types)
> is to support data types that haven't been updated for TOAST.  Perhaps
> that's now the empty set, but it's not really our job to take away the
> capability.  If you really want MAIN you should say that, not quibble
> that PLAIN doesn't mean what it's always been understood to mean.

This kind of begs the question of whether you have the right idea
about what PLAIN has always been understood to mean, and whether
everyone understands it the same way. I formed my understanding of
what PLAIN is understood to mean by reading the ALTER TABLE .. SET
STORAGE documentation, and there's no real hint in there that this is
some kind of backward-compatibility only feature. Rather, I read that
paragraph to suggest that you can choose between the four options as a
way of getting best performance. Both external storage and compression
are trade-offs: they make the tuples smaller, which can be good for
performance, but they also make the toasted columns more expensive to
access, which can be bad for performance. It seems completely
reasonable to suppose that some workloads may benefit from a
non-default TOAST strategy; e.g. if you often access only a few
columns but scan a lot of rows, toasting wins; if you typically access
every column but only a few rows via an index scan, not toasting wins.
And if that is your idea about what the feature does - an idea that
seems perfectly defensible given what the documentation says about
this - then I think it's going to be surprising to find that it also
inhibits 1-byte headers from being used. But, IDK, maybe nobody will
care (or maybe I'm the only one who will be surprised).

Perhaps this whole area needs a broader rethink at some point. I'm
inclined to think that compatibility with varlena data types that
haven't been updated since PostgreSQL 8.3 came out is almost a
non-goal and maybe we ought to just kick such data types and the
associated code paths to the curb. It's unlikely that they get much
testing. On the other hand, perhaps we'd like to have more control
over the decision to compress or store externally than we have
currently. I think if I were designing this from scratch, I'd want one
switch for whether it's OK to compress, with values meaning "yes,"
"no," and "only if stored externally," a second switch for the
*length* at which external storage should be used (so that I can push
out rarely-used columns at lower size thresholds and commonly-used
ones at higher thresholds), and a third for what should happen if we
do the stuff allowed by the first two switches and the tuple still
doesn't fit, with value meaning "fail" and "externalize anyway".

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 19, 2021 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Au contraire.  The reason that mode exists at all (for varlena types)
>> is to support data types that haven't been updated for TOAST.

> This kind of begs the question of whether you have the right idea
> about what PLAIN has always been understood to mean, and whether
> everyone understands it the same way. I formed my understanding of
> what PLAIN is understood to mean by reading the ALTER TABLE .. SET
> STORAGE documentation, and there's no real hint in there that this is
> some kind of backward-compatibility only feature.

That doco is explaining the users-eye view of it.  Places addressed
to datatype developers, such as the CREATE TYPE reference page, see
it a bit differently.  CREATE TYPE for instance points out that

    All storage values other than plain imply that the functions of the
    data type can handle values that have been toasted, as described in ...

> I think if I were designing this from scratch, I'd want one
> switch for whether it's OK to compress, with values meaning "yes,"
> "no," and "only if stored externally," a second switch for the
> *length* at which external storage should be used (so that I can push
> out rarely-used columns at lower size thresholds and commonly-used
> ones at higher thresholds), and a third for what should happen if we
> do the stuff allowed by the first two switches and the tuple still
> doesn't fit, with value meaning "fail" and "externalize anyway".

Yeah, I don't think the existing options for attstorage have much
to recommend them except backwards compatibility.  But if we do
redesign them, I'd still say there should be a way for a data
type to say that it doesn't support these weird header hacks that
we've invented.  The notion that short header doesn't cost anything
seems extremely Intel-centric to me.

            regards, tom lane



Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE

От
Robert Haas
Дата:
On Mon, Apr 19, 2021 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That doco is explaining the users-eye view of it.  Places addressed
> to datatype developers, such as the CREATE TYPE reference page, see
> it a bit differently.  CREATE TYPE for instance points out that
>
>     All storage values other than plain imply that the functions of the
>     data type can handle values that have been toasted, as described in ...

Interesting. It feels to me like SET STORAGE PLAIN feels like it is
really trying to be two different things. Either you want to inhibit
compression and external storage for performance reasons, or your data
type can't support either one. Maybe we should separate those
concepts, since there's no mode right now that says "don't ever
compress, and externalize only if there's absolutely no other way,"
and there's no way to disable compression and externalization without
also killing off short headers. :-(

> The notion that short header doesn't cost anything seems extremely Intel-centric to me.

I don't think so. It's true that Intel is very forgiving about
unaligned accesses compared to some other architectures, but I think
if you have a terabyte of data, you want it to fit into as few disk
pages as possible pretty much no matter what architecture you're
using. The dominant costs are going to be the I/O costs, not the CPU
costs of dealing with unaligned bytes. In fact, even if you have a
gigabyte of data, I bet it's *still* better to use a more compact
on-disk representation. Now, the dominant cost is going to be pumping
the data through the L3 CPU cache, which is still - I think - going to
be quite a lot more important than the CPU costs of dealing with
unaligned bytes. The CPU bus is an I/O bottleneck not unlike the disk
itself, just at a higher rate of speed which is still way slower than
the CPU speed. Now if you have a megabyte of data, or better yet a
kilobyte of data, then I think optimizing for CPU efficiency may well
be the right thing to do. I don't know how much 4-byte varlena headers
really save there, but if I were designing a storage representation
for very small data sets, I'd definitely be thinking about how I could
waste space to shave cycles.

-- 
Robert Haas
EDB: http://www.enterprisedb.com