Обсуждение: ALTER TABLE ADD COLUMN fast default

Поиск

Список

Период

Сортировка

ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

06 декабря 2017 г., 17:36:23

Attached is a patch for $subject. It's based on Serge Reilau's patch of
about a year ago, taking into account comments made at the time. with
bitrot removed and other enhancements such as documentation.

Essentially it stores a value in pg_attribute that is used when the
stored tuple is missing the attribute. This works unless the default
expression is volatile, in which case a table rewrite is forced as
happens now.

Comments welcome.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v2.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

06 декабря 2017 г., 18:16:34

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Attached is a patch for $subject. It's based on Serge Reilau's patch of
> about a year ago, taking into account comments made at the time. with
> bitrot removed and other enhancements such as documentation.

> Essentially it stores a value in pg_attribute that is used when the
> stored tuple is missing the attribute. This works unless the default
> expression is volatile, in which case a table rewrite is forced as
> happens now.

I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths.  I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

06 декабря 2017 г., 18:47:12


On 12/06/2017 10:16 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> Attached is a patch for $subject. It's based on Serge Reilau's patch of
>> about a year ago, taking into account comments made at the time. with
>> bitrot removed and other enhancements such as documentation.
>> Essentially it stores a value in pg_attribute that is used when the
>> stored tuple is missing the attribute. This works unless the default
>> expression is volatile, in which case a table rewrite is forced as
>> happens now.
> I'm quite disturbed both by the sheer invasiveness of this patch
> (eg, changing the API of heap_attisnull()) and by the amount of logic
> it adds to fundamental tuple-deconstruction code paths.  I think
> we'll need to ask some hard questions about whether the feature gained
> is worth the distributed performance loss that will ensue.
>
>             


Thanks for prompt comments.

I'm sure we can reduce the footprint some.

w.r.t. heap_attisnull, only a handful of call sites actually need the
new functionality, 8 or possibly 10 (I need to check more on those in
relcache.c). So we could instead of changing the function provide a new
one to be used at those sites. I'll work on that. and submit a new patch
fairly shortly.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

06 декабря 2017 г., 19:05:03

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 12/06/2017 10:16 AM, Tom Lane wrote:
>> I'm quite disturbed both by the sheer invasiveness of this patch
>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>> it adds to fundamental tuple-deconstruction code paths.  I think
>> we'll need to ask some hard questions about whether the feature gained
>> is worth the distributed performance loss that will ensue.

> I'm sure we can reduce the footprint some.

> w.r.t. heap_attisnull, only a handful of call sites actually need the
> new functionality, 8 or possibly 10 (I need to check more on those in
> relcache.c). So we could instead of changing the function provide a new
> one to be used at those sites. I'll work on that. and submit a new patch
> fairly shortly.

Meh, I'm not at all sure that's an improvement.  A version of
heap_attisnull that ignores the possibility of a "missing" attribute value
will give the *wrong answer* if the other version should have been used
instead.  Furthermore it'd be an attractive nuisance because people would
fail to notice if an existing call were now unsafe.  If we're to do this
I think we have no choice but to impose that compatibility break on
third-party code.

In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated.  Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea.  There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.

I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values.  It may
have originated from the same place but it's a distinct thing.  Better
to store it in a separate sub-structure.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

06 декабря 2017 г., 19:26:03


On 12/06/2017 11:05 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 12/06/2017 10:16 AM, Tom Lane wrote:
>>> I'm quite disturbed both by the sheer invasiveness of this patch
>>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>>> it adds to fundamental tuple-deconstruction code paths.  I think
>>> we'll need to ask some hard questions about whether the feature gained
>>> is worth the distributed performance loss that will ensue.
>> I'm sure we can reduce the footprint some.
>> w.r.t. heap_attisnull, only a handful of call sites actually need the
>> new functionality, 8 or possibly 10 (I need to check more on those in
>> relcache.c). So we could instead of changing the function provide a new
>> one to be used at those sites. I'll work on that. and submit a new patch
>> fairly shortly.
> Meh, I'm not at all sure that's an improvement.  A version of
> heap_attisnull that ignores the possibility of a "missing" attribute value
> will give the *wrong answer* if the other version should have been used
> instead.  Furthermore it'd be an attractive nuisance because people would
> fail to notice if an existing call were now unsafe.  If we're to do this
> I think we have no choice but to impose that compatibility break on
> third-party code.


Fair point.

>
> In general, you need to be thinking about this in terms of making sure
> that it's impossible to accidentally not update code that needs to be
> updated.  Another example is that I noticed that some places the patch
> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
> the former will fail to copy the missing-attribute support data.
> I think that's an astonishingly bad idea.  There is basically no situation
> in which CreateTupleDescCopy defined in that way will ever be safe to use.
> The missing-attribute info is now a fundamental part of a tupdesc and it
> has to be copied always, just as much as e.g. atttypid.
>
> I would opine that it's a mistake to design TupleDesc as though the
> missing-attribute data had anything to do with default values.  It may
> have originated from the same place but it's a distinct thing.  Better
> to store it in a separate sub-structure.


OK, will work on all that. Thanks again.

cheers

andrew

            

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

12 декабря 2017 г., 22:13:20

On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
>> In general, you need to be thinking about this in terms of making sure
>> that it's impossible to accidentally not update code that needs to be
>> updated.  Another example is that I noticed that some places the patch
>> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
>> the former will fail to copy the missing-attribute support data.
>> I think that's an astonishingly bad idea.  There is basically no situation
>> in which CreateTupleDescCopy defined in that way will ever be safe to use.
>> The missing-attribute info is now a fundamental part of a tupdesc and it
>> has to be copied always, just as much as e.g. atttypid.
>>
>> I would opine that it's a mistake to design TupleDesc as though the
>> missing-attribute data had anything to do with default values.  It may
>> have originated from the same place but it's a distinct thing.  Better
>> to store it in a separate sub-structure.
>
> OK, will work on all that. Thanks again.
>

Looking closer at this. First, there is exactly one place where the
patch does  s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.

making this like atttypid suggests that it belongs right outside the
TupleConstr structure. But then it seems to me that the change could
well end up being a darn site more invasive. For example, should we be
passing the number of missing values to CreateTemplateTupleDesc and
CreateTupleDesc? I'm happy to do whatever work is required, but I want
to have a firm understanding of the design before I spend lots of time
cutting code. Once I understand how tupdesc.h should look I should be good.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

01 января 2018 г., 02:48:29


On 12/12/2017 02:13 PM, Andrew Dunstan wrote:
>
> On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
>>> In general, you need to be thinking about this in terms of making sure
>>> that it's impossible to accidentally not update code that needs to be
>>> updated.  Another example is that I noticed that some places the patch
>>> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
>>> the former will fail to copy the missing-attribute support data.
>>> I think that's an astonishingly bad idea.  There is basically no situation
>>> in which CreateTupleDescCopy defined in that way will ever be safe to use.
>>> The missing-attribute info is now a fundamental part of a tupdesc and it
>>> has to be copied always, just as much as e.g. atttypid.
>>>
>>> I would opine that it's a mistake to design TupleDesc as though the
>>> missing-attribute data had anything to do with default values.  It may
>>> have originated from the same place but it's a distinct thing.  Better
>>> to store it in a separate sub-structure.
>> OK, will work on all that. Thanks again.
>>
>
>
> Looking closer at this. First, there is exactly one place where the
> patch does  s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.
>
> making this like atttypid suggests that it belongs right outside the
> TupleConstr structure. But then it seems to me that the change could
> well end up being a darn site more invasive. For example, should we be
> passing the number of missing values to CreateTemplateTupleDesc and
> CreateTupleDesc? I'm happy to do whatever work is required, but I want
> to have a firm understanding of the design before I spend lots of time
> cutting code. Once I understand how tupdesc.h should look I should be good.
>


Here is a new version of the patch. It separates the missing values
constructs and logic from the default values constructs and logic. The
missing values now sit alongside the default values in tupleConstr. In
some places the separation makes the logic a good bit cleaner.

Some of the logic is also reworked to remove an assumption that the
missing values structure is populated in attnum order, Also code to
support the missing stuctures is added to equalTupleDescs.

We still have that one extra place where we need to call
CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen
an easy way around that.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v4.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

01 января 2018 г., 19:01:32


On 12/31/2017 06:48 PM, Andrew Dunstan wrote:
>
> Here is a new version of the patch. It separates the missing values
> constructs and logic from the default values constructs and logic. The
> missing values now sit alongside the default values in tupleConstr. In
> some places the separation makes the logic a good bit cleaner.
>
> Some of the logic is also reworked to remove an assumption that the
> missing values structure is populated in attnum order, Also code to
> support the missing stuctures is added to equalTupleDescs.
>
> We still have that one extra place where we need to call
> CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen
> an easy way around that.
>


New version of the patch that fills in the remaining piece in
equalTupleDescs.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v5.patch

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

16 января 2018 г., 08:13:48

On 2 January 2018 at 05:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> New version of the patch that fills in the remaining piece in
> equalTupleDescs.

This no longer applies to current master. Can send an updated patch?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

16 января 2018 г., 16:21:35


On 01/16/2018 12:13 AM, David Rowley wrote:
> On 2 January 2018 at 05:01, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> New version of the patch that fills in the remaining piece in
>> equalTupleDescs.
> This no longer applies to current master. Can send an updated patch?
>


Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
updated version. Thanks for looking.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v6.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Thomas Munro

Дата:

25 января 2018 г., 10:40:17

On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
> updated version. Thanks for looking.

A boring semi-automated update:  this no long compiles, because
8561e4840c8 added a new call to heap_attisnull().  Pretty sure it just
needs NULL in the third argument.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

26 января 2018 г., 03:23:12

On Thu, Jan 25, 2018 at 6:10 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
>> updated version. Thanks for looking.
>
> A boring semi-automated update:  this no long compiles, because
> 8561e4840c8 added a new call to heap_attisnull().  Pretty sure it just
> needs NULL in the third argument.
>

Yeah, thanks. revised patch attached

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v7.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

01 февраля 2018 г., 16:54:08

Hi,

On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote:
> Yeah, thanks. revised patch attached

Given that this patch touches code that's a huge bottleneck in a lot of
cases, I think this needs benchmarks that heavily exercises tuple
deforming.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

03 февраля 2018 г., 03:44:30

On 02/01/2018 02:54 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote:
>> Yeah, thanks. revised patch attached
> 
> Given that this patch touches code that's a huge bottleneck in a lot of
> cases, I think this needs benchmarks that heavily exercises tuple
> deforming.
> 

That's a reasonable request, I guess, and I tried to collect such data
today. I measured two cases:

1) Table with 1000 columns and 64 rows, when there were no ALTER TABLE
adding columns with 'fast' defaults. This is meant to measure the best
case, with minimal impact from the patch. See the create.sql script
attached to this message, and the query.sql which was used for tests
using pgbench like this:

    pgbench -n -f q.sql -T 15 test

after 100 runs, the results (tps) look like this:

                 min      max      median
   --------------------------------------
    master      1827     1873        1860
   patched      2023     2066        2056

That is, the patch apparently improves the performance by about 10%
(according to perf profiles this is due to slot_deform_tuple getting
cheaper).

So this case seems fine.

2) Table with 64 rows and 1000 columns, all added by ALTER TABLE with
fast default without rewrite. See create-alter.sql.

Using the same query.sql as before, this shold significant drop to only
about 40 tps (from ~2000 tps for master). The profiles something like this:

    +   98.87%    98.87%  postgres            [.] slot_getmissingattrs
    +   98.77%     0.00%  postgres            [.] PortalRun
    +   98.77%     0.00%  postgres            [.] ExecAgg
    +   98.74%     0.01%  postgres            [.] ExecInterpExpr

which is kinda understandable, although the 2000 to 40 tps seems like a
pretty significant drop. But then again, this case is constructed like a
fairly extreme corner case.

However, there seems to be some sort of bug, because when I did VACUUM
FULL - ideally this would replace the "missing" default values with
actual values stored in the heap rows, eliminating the performance
impact. But the default values got lost and replaced by NULL values,
which seems like a clear data loss scenario.

I'm not quite sure what's wrong, but try this:

    \i create-alter.sql

    -- this returns 64, which is correct
    SELECT COUNT(*) FROM t;

    -- this actually retuns 64 rows with values "1"
    SELECT c1000 FROM t;

    -- this returns 63, which is incorrect (should be 64)
    SELECT count(c1000) FROM t;

    VACUUM FULL t;

    -- suddenly we only get NULL values for all 64 rows
    SELECT c1000 FROM t;

    -- and now we got 0 (instead of 64)
    SELECT count(c1000) FROM t;

regard

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: ALTER TABLE ADD COLUMN fast default

От

Thomas Munro

Дата:

04 февраля 2018 г., 23:49:43

On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Yeah, thanks. revised patch attached

FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)

*** /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/identity.out
2018-02-04 00:56:12.146303616 +0000
--- /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/identity.out
2018-02-04 01:05:55.217932032 +0000
***************
*** 257,268 ****
  INSERT INTO itest13 VALUES (1), (2), (3);
  -- add column to populated table
  ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
  SELECT * FROM itest13;
!  a | b | c
! ---+---+---
!  1 | 1 | 1
!  2 | 2 | 2
!  3 | 3 | 3
  (3 rows)

  -- various ALTER COLUMN tests
--- 257,269 ----
  INSERT INTO itest13 VALUES (1), (2), (3);
  -- add column to populated table
  ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
+ ERROR:  column "c" contains null values
  SELECT * FROM itest13;
!  a | b
! ---+---
!  1 | 1
!  2 | 2
!  3 | 3
  (3 rows)

  -- various ALTER COLUMN tests

-- 
Thomas Munro
http://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

05 февраля 2018 г., 00:19:32

On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> Yeah, thanks. revised patch attached
>
> FYI the identity regression test started failing recently with this
> patch applied (maybe due to commit
> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>

Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

09 февраля 2018 г., 08:24:44

On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> Yeah, thanks. revised patch attached
>>
>> FYI the identity regression test started failing recently with this
>> patch applied (maybe due to commit
>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>
>
> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>


Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v8.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Petr Jelinek

Дата:

11 февраля 2018 г., 07:20:41

On 09/02/18 06:24, Andrew Dunstan wrote:
> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>>> <andrew.dunstan@2ndquadrant.com> wrote:
>>>> Yeah, thanks. revised patch attached
>>>
>>> FYI the identity regression test started failing recently with this
>>> patch applied (maybe due to commit
>>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>>
>>
>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>>
> 
> 
> Here's a version that fixes the above issue and also the issue with
> VACUUM that Tomas Vondra reported. I'm still working on the issue with
> aggregates that Tomas also reported.
> 

I see the patch does not update the ALTER TABLE docs section which
discusses table rewrites and it seems like it should.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

12 февраля 2018 г., 00:36:44

On Sun, Feb 11, 2018 at 2:50 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:

>>
>>
>> Here's a version that fixes the above issue and also the issue with
>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>> aggregates that Tomas also reported.
>>
>
> I see the patch does not update the ALTER TABLE docs section which
> discusses table rewrites and it seems like it should.
>

Umm it changes the second and third paras of the Notes section, which
refer to rewrites. Not sure what else it should change.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

13 февраля 2018 г., 10:58:13

On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>>> <andrew.dunstan@2ndquadrant.com> wrote:
>>>> Yeah, thanks. revised patch attached
>>>
>>> FYI the identity regression test started failing recently with this
>>> patch applied (maybe due to commit
>>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>>
>>
>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>>
>
>
> Here's a version that fixes the above issue and also the issue with
> VACUUM that Tomas Vondra reported. I'm still working on the issue with
> aggregates that Tomas also reported.
>

This version fixes that issue. Thanks to Tomas for his help in finding
where the problem was. There are now no outstanding issues that I'm
aware of.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v9.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

17 февраля 2018 г., 00:46:42

On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>>>> <andrew.dunstan@2ndquadrant.com> wrote:
>>>>> Yeah, thanks. revised patch attached
>>>>
>>>> FYI the identity regression test started failing recently with this
>>>> patch applied (maybe due to commit
>>>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>>>
>>>
>>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>>>
>>
>>
>> Here's a version that fixes the above issue and also the issue with
>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>> aggregates that Tomas also reported.
>>
>
> This version fixes that issue. Thanks to Tomas for his help in finding
> where the problem was. There are now no outstanding issues that I'm
> aware of.
>


The attached version largely fixes with the performance degradation
that Tomas Vondra reported, replacing an O(N^2) piece of code in
slot_getmissingattrs() by one that is O(N).

cheers

andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v10.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

17 февраля 2018 г., 02:23:40


On 02/16/2018 10:46 PM, Andrew Dunstan wrote:
> On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
>>> <andrew.dunstan@2ndquadrant.com> wrote:
>>>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>>>> <thomas.munro@enterprisedb.com> wrote:
>>>>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>>>>> <andrew.dunstan@2ndquadrant.com> wrote:
>>>>>> Yeah, thanks. revised patch attached
>>>>>
>>>>> FYI the identity regression test started failing recently with this
>>>>> patch applied (maybe due to commit
>>>>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>>>>
>>>>
>>>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>>>>
>>>
>>>
>>> Here's a version that fixes the above issue and also the issue with
>>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>>> aggregates that Tomas also reported.
>>>
>>
>> This version fixes that issue. Thanks to Tomas for his help in finding
>> where the problem was. There are now no outstanding issues that I'm
>> aware of.
>>
> 
> 
> The attached version largely fixes with the performance degradation
> that Tomas Vondra reported, replacing an O(N^2) piece of code in
> slot_getmissingattrs() by one that is O(N).
> 

Seems fine to me - for comparison, numbers for master and v9/v10 patch
versions look like this:

             create.sql    create-alter.sql
   -----------------------------------------
    master         1120                1320
        v9         1100                  50
       v10         1130                1370

This particular machine has a laptop-class CPU, so the timings are
somewhat noisy - which explains the small tps differences.

What I find interesting is that if I do VACUUM FULL after running the
SQL script, I get this:

             create.sql    create-alter.sql
   -----------------------------------------
    master         1120                1450
        v9         1100                1320
       v10         1100                1450

That is, the throughput for create-alter case jumps up by about 100 tps,
for some reason. Not sure why, though ...

Anyway, I consider the performance to be OK. But perhaps Andres could
comment on this too, as he requested the benchmarks.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

18 февраля 2018 г., 07:22:53

On 17 February 2018 at 10:46, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> The attached version largely fixes with the performance degradation
> that Tomas Vondra reported, replacing an O(N^2) piece of code in
> slot_getmissingattrs() by one that is O(N).

I've looked over this patch and had a play around with it.

Just a couple of things that I noticed while reading:

1. I believe "its" should be "it's" here:

+ /*
+ * Yes, and its of the by-value kind Copy in the Datum
+ */

copy does not need an upper case 'C'. There should also be a comma after "kind"

There's a similar problem with the following:

+ /*
+ * Yes, and its fixed length Copy it out and have teh Datum
+ * point to it.
+ */

there should be a comma after "length" too.

Also, "teh" ...

2. "dfferent" -> "different"

+-- Test a large sample of dfferent datatypes

I'll try to spend some time reviewing the code in detail soon.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

19 февраля 2018 г., 03:48:40

On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 17 February 2018 at 10:46, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> The attached version largely fixes with the performance degradation
>> that Tomas Vondra reported, replacing an O(N^2) piece of code in
>> slot_getmissingattrs() by one that is O(N).
>
> I've looked over this patch and had a play around with it.
>
> Just a couple of things that I noticed while reading:
>

[ various typos ]

>
> I'll try to spend some time reviewing the code in detail soon.
>


Thanks. I'll fix the typos in the next version of the patch, hopefully
after your review.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

19 февраля 2018 г., 05:48:27

On 19 February 2018 at 13:48, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> I'll try to spend some time reviewing the code in detail soon.
>>
>
> Thanks. I'll fix the typos in the next version of the patch, hopefully
> after your review.

Here's a more complete review... I reserve the right to be wrong about
a few things and be nitpicky on a few more...

3. All doc changes are indented 1 space too much. Many lines have also
been broken needlessly before 80 chars

4. Surplus "it" at end of line.

+ * Return the missing value of an attribute, or NULL if it
+ * there is no missing value.

5. "sttributes" -> "attributes"

+ /* initialize all the missing sttributes to null */

6. Surplus space before >=

+ if (missattn  >= startAttNum && !attrmiss[i].ammissingNull)

7. Why bother with the else here?

+ if (slot->tts_tupleDescriptor->constr)
+ {
+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
+ attrmiss = slot->tts_tupleDescriptor->constr->missing;
+ }
+ else
+ {
+ missingnum = -1;
+ attrmiss = NULL;
+ }

You may as well just do:

+ if (!slot->tts_tupleDescriptor->constr)
+ return;

+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
+ attrmiss = slot->tts_tupleDescriptor->constr->missing;

8. -1; should be - 1;

+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;

9. Surplus braces can be removed:

+ if (attno < attnum)
  {
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
  }

10. You've made a behavioral change in the following code:

- for (; attno < attnum; attno++)
+ if (attno < attnum)
  {
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
  }

Previously this just used to initialise up to attnum, but
slot_getmissingattrs does not know about attnum and always seems to
fill the entire slot with all atts in the TupleDesc
This may cause some performance regression when in cases where only
part of the tuple needs to be deformed. I think Tomas' benchmarks used
the final column in the table, so likely wouldn't have notice this,
although it may have if he'd aggregated one of the middle columns.

Once that's fixed, you could ditch the Min() call in the following code:

attno = Min(attno, attnum);

slot_deform_tuple(slot, attno);

/*
* If tuple doesn't have all the atts indicated by tupleDesc, read the
* rest as NULLs or missing values
*/
if (attno < attnum)
{
    slot_getmissingattrs(slot, attno);
}

And change it to something more like:

/*
* If tuple doesn't have all the atts indicated by tupleDesc, deform as
* much as possible, then fill the remaining required atts with nulls or
* missing values.
*/
if (attno < attnum)
{
    slot_deform_tuple(slot, attno);
    slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
}
else
    slot_deform_tuple(slot, attnum);

Which should result in a small performance increase due to not having
to compare attno < attnum twice. Although, perhaps the average compile
may optimise this anyway. I've not checked.

11. Is there a reason why the following code in getmissingattr() can't
be made into a bsearch?

+ for (missingnum = tupleDesc->constr->num_missing - 1;
+ missingnum >= 0; missingnum--)
+ {
+ if (attrmiss[missingnum].amnum == attnum)
+ {
+ if (attrmiss[missingnum].ammissingNull)
+ {
+ *isnull = true;
+ return (Datum) 0;
+ }
+ else
+ {
+ *isnull = false;
+ return attrmiss[missingnum].ammissing;
+ }
+ }
+ }
+ Assert(false); /* should not be able to get here */
+ }

As far as I can see, the attrmiss is sorted by amnum. But maybe I just
failed to find a case where it's not.

I'd imagine doing this would further improve Tomas' benchmark score
for the patched version, since he was performing a sum() on the final
column.

Although, If done, I'm actually holding some regard to the fact that
users may one day complain that their query became slower after a
table rewrite :-)

Update: I've stumbled upon the following code:

+ /*
+ * We have a dependency on the attrdef array being filled in in
+ * ascending attnum order. This should be guaranteed by the index
+ * driving the scan. But we want to be double sure
+ */
+ if (!(attp->attnum > attnum))
+ elog(ERROR, "attribute numbers not ascending");

and the code below this seems to also assemble the missing values in
attnum order.

While I'm here, I'd rather see this if test written as: if
(attp->attnum <= attnum)

Since the attnum order in the missing values appears to be well
defined in ascending order, you can also simplify the comparison logic
in equalTupleDescs(). The inner-most nested loop shouldn't be
required.

12. Additional braces not required in:

+ if (tupleDesc &&
+ attnum <= tupleDesc->natts &&
+ TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)
+ {
+ return false;
+ }
+ else
+ {
+ return true;
+ }

13. Also, just on study of the following condition:

+ if (tupleDesc &&
+ attnum <= tupleDesc->natts &&
+ TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)

I think that instead of testing "attnum <= tupleDesc->natts", you
should remove that condition an just add;

Assert(!tupleDesc || attnum <= tupleDesc->natts);

to the start of the function. No code should ever pass an attnum
greater than the tupleDesc's natts, so there's probably not much point
in checking that now if we didn't before.

14. In expand_tuple, the following can  be declared in inner scope:

+ HeapTupleHeader targetTHeader;

(It would be nice if expand_tuple was a bit easier on the eye, but I
have no suggestions to improve it right now)

15. Why not: slot->tts_values[missattnum] = (Datum) 0;

+ slot->tts_values[missattnum] = PointerGetDatum(NULL);

There are a few other places you've done this. I'm unsure if there's
any sort of standard to follow.

16. Additional braces not required in:

  {
- *isnull = true;
- return (Datum) 0;
+ return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
  }

Likewise in:

- for (; attnum < tdesc_natts; attnum++)
+ if (attnum < tdesc_natts)
  {
- slot->tts_values[attnum] = (Datum) 0;
- slot->tts_isnull[attnum] = true;
+ slot_getmissingattrs(slot, attnum);
  }
+

and:

- for (; attno < attnum; attno++)
+ if (attno < attnum)
  {
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
  }

maybe just fix this in all places your patch is touching.

17. Unrelated change:

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index ed90a02303..3b0be39101 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -416,7 +416,6 @@ sub morph_row_for_pgattr
  }
  elsif ($priornotnull)
  {
-
  # attnotnull will automatically be set if the type is
  # fixed-width and prior columns are all NOT NULL ---
  # compare DefineAttr in bootstrap.c. oidvector and

18. Can you explain what you mean by "doomed" here?

+ * If we add a column that is not null and there is no missing value
+ * (i.e. the missing value is NULL) then this ADD COLUMN is doomed.
+ * Unless the table is empty...

19. A few lines below are belong 80 chars:

+ if (HeapTupleHeaderGetNatts(tuple.t_data) <
RelationGetDescr(erm->relation)->natts)
+ {
+ copyTuple = heap_expand_tuple(&tuple,
+   RelationGetDescr(erm->relation));

Likewise in:

+ if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <
slot->tts_tupleDescriptor->natts)

and:

+ if (att_tup->atthasdef && rd_att->constr && rd_att->constr->num_defval > 0)

20. "missin" -> "missing", "Fill" -> "fill"

+ * If the column has a default or missin value Fill it into the
+ * attrdef array

21. Spaces instead of tabs. Maybe just pgindent the whole patch?

+ AttrMissing *missing;       /* missing attributes values, NULL if none */
  uint16 num_defval;
  uint16 num_check;
+ uint16      num_missing;

22. Missing <tab><space> before " 24"

+#define Anum_pg_attribute_attmissingval 24

Also, missing <tab> before "  15"

+#define Anum_pg_attribute_atthasmissing  15

23. Looks like you'll need to find a new home for the test, since the
comment claims we want no more than 19. You've added the 20th.

 # NB: temp.sql does a reconnect which transiently uses 2 connections,
 # so keep this parallel group to at most 19 tests
 # ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare
without_oid conversion truncate alter_table sequence polymorphism
rowtypes returning largeobject with xml
+test: plancache limit plpgsql copy2 temp domain rangefuncs prepare
without_oid conversion truncate alter_table sequence polymorphism
rowtypes returning largeobject with xml fast_default

I'll await the next version before looking again.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

20 февраля 2018 г., 09:33:07

http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

On Mon, Feb 19, 2018 at 1:18 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 19 February 2018 at 13:48, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> I'll try to spend some time reviewing the code in detail soon.
>>>
>>
>> Thanks. I'll fix the typos in the next version of the patch, hopefully
>> after your review.
>
> Here's a more complete review... I reserve the right to be wrong about
> a few things and be nitpicky on a few more...

I've dealt with most of these issues. The only change of substance is
the suggested change in slot_getmissingattrs().


> Once that's fixed, you could ditch the Min() call in the following code:
>
> attno = Min(attno, attnum);
>
> slot_deform_tuple(slot, attno);
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, read the
> * rest as NULLs or missing values
> */
> if (attno < attnum)
> {
>     slot_getmissingattrs(slot, attno);
> }
>
> And change it to something more like:
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, deform as
> * much as possible, then fill the remaining required atts with nulls or
> * missing values.
> */
> if (attno < attnum)
> {
>     slot_deform_tuple(slot, attno);
>     slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
> }
> else
>     slot_deform_tuple(slot, attnum);
>
> Which should result in a small performance increase due to not having
> to compare attno < attnum twice. Although, perhaps the average compile
> may optimise this anyway. I've not checked.
>


Maybe it would make a difference, but it seems unlikely to be significant.


> 11. Is there a reason why the following code in getmissingattr() can't
> be made into a bsearch?
>
[...]
> As far as I can see, the attrmiss is sorted by amnum. But maybe I just
> failed to find a case where it's not.
>
> I'd imagine doing this would further improve Tomas' benchmark score
> for the patched version, since he was performing a sum() on the final
> column.
>
> Although, If done, I'm actually holding some regard to the fact that
> users may one day complain that their query became slower after a
> table rewrite :-)
>
> Update: I've stumbled upon the following code:
>
> + /*
> + * We have a dependency on the attrdef array being filled in in
> + * ascending attnum order. This should be guaranteed by the index
> + * driving the scan. But we want to be double sure
> + */
> + if (!(attp->attnum > attnum))
> + elog(ERROR, "attribute numbers not ascending");
>
> and the code below this seems to also assemble the missing values in
> attnum order.
>
> While I'm here, I'd rather see this if test written as: if
> (attp->attnum <= attnum)
>
> Since the attnum order in the missing values appears to be well
> defined in ascending order, you can also simplify the comparison logic
> in equalTupleDescs(). The inner-most nested loop shouldn't be
> required.
>


Well, this comment in the existing code in equalTupleDescs() suggests
that in fact we can't rely on the attrdef items being in attnum order:

            /*
             * We can't assume that the items are always read from the system
             * catalogs in the same order; so use the adnum field to identify
             * the matching item to compare.
             */

And in any case the code out of date since the code no longer ties
missing values to default values. So I've removed the comment and the
error check.

There are probably a few optimisations we can make if we can assume
that the missing values are in the Tuple Descriptor are in attnum
order. But it's unclear to me why they should be and the attrdef
entries not.



>
> 15. Why not: slot->tts_values[missattnum] = (Datum) 0;
>
> + slot->tts_values[missattnum] = PointerGetDatum(NULL);
>
> There are a few other places you've done this. I'm unsure if there's
> any sort of standard to follow.
>

This is a pretty widely used idiom in the source.


>
> I'll await the next version before looking again.
>


Thanks

attached.

cheers

andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v11.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 09:42:58

Hi,

On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote:
> Anyway, I consider the performance to be OK. But perhaps Andres could
> comment on this too, as he requested the benchmarks.

My performance concerns were less about CREATE TABLE related things than
about analytics workloads or such, where deforming is the primary
bottleneck.  I think it should be ok, but doing a before/after tpc-h of
scale 5-10 or so wouldn't be a bad thing to verify.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

20 февраля 2018 г., 09:59:04

On Tue, Feb 20, 2018 at 5:03 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2


Wow, my c&p was working overtime. Good thing this won't do anyone any good.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

20 февраля 2018 г., 13:10:15

Thanks for making those updates
.
On 20 February 2018 at 19:33, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Mon, Feb 19, 2018 at 1:18 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> Since the attnum order in the missing values appears to be well
>> defined in ascending order, you can also simplify the comparison logic
>> in equalTupleDescs(). The inner-most nested loop shouldn't be
>> required.
>
> Well, this comment in the existing code in equalTupleDescs() suggests
> that in fact we can't rely on the attrdef items being in attnum order:
>
>             /*
>              * We can't assume that the items are always read from the system
>              * catalogs in the same order; so use the adnum field to identify
>              * the matching item to compare.
>              */

On closer look, perhaps the reason that comment claims the order is
not well defined is that the use of the index depends on the value of
criticalRelcachesBuilt in the following:

pg_attribute_scan = systable_beginscan(pg_attribute_desc,
   AttributeRelidNumIndexId,
   criticalRelcachesBuilt,
   NULL,
   2, skey);

Otherwise, I see no reason for them to be out of order, as if I grep
for instances of "->missing = " I only see the place where the missing
attr details are set in RelationBuildTupleDesc() and one other place
in CreateTupleDescCopyConstr(), where it's simply just copied via
memcpy().

Nevertheless, it would be interesting to see how much a bsearch in
getmissingattr would help Tomas' case. Though, perhaps you're happy
enough with the performance already.

I'll make another pass over the updated code soon to see if I can see
anything else.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

20 февраля 2018 г., 14:38:27

On 20 February 2018 at 23:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Nevertheless, it would be interesting to see how much a bsearch in
> getmissingattr would help Tomas' case. Though, perhaps you're happy
> enough with the performance already.

I thought I'd give this a quick test using the attached (incomplete) patch.

My findings made me realise that Tomas' case actually exploited a best
case for the patch.

Tomas' query.sql is performing a sum(c1000), which is the final column
in the table. No doubt he's done this since it's the typical thing to
do to get a worst case for tuple deforming, but he might not have
realised it was the best case for your patch due to the way you're
performing the for loop in getmissingattr starting with the final
missing value first.

I noticed this when I tested the mockup bsearch code. I was surprised
to see that it actually slowed performance a little with the
sum(c1000) case.

The entire results for those using:

pgbench -n -T 60 -j 1 -c 1 -f query.sql postgres

is:

Using: select sum(c1000) from t;

v11 + bsearch + create.sql: tps = 1479.267518
v11 + bsearch + create-alter.sql: tps = 1366.885968
v11 + create.sql: tps = 1544.378091
v11 + create-alter.sql: tps = 1416.608320

(both the create.sql results should be about the same here since
they're not really exercising any new code.)

Notice that the bsearch version is slightly slower, 1366 tps vs 1416.

If I use sum(c10) instead, I get.

Using: select sum(c10) from t;

v11 + bsearch + create-alter.sql: tps = 1445.940061
v11 + bsearch + create.sql: tps = 3320.102543
v11 + create.sql: tps = 3330.131437
v11 + create-alter.sql: tps = 1398.635251

The bsearch here does help recover some performance, but it's a small
improvement on what is quite a big drop from the create.sql case.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

bsearch_getmissingattr.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Petr Jelinek

Дата:

20 февраля 2018 г., 16:03:58

On 20/02/18 07:42, Andres Freund wrote:
> Hi,
> 
> On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote:
>> Anyway, I consider the performance to be OK. But perhaps Andres could
>> comment on this too, as he requested the benchmarks.
> 
> My performance concerns were less about CREATE TABLE related things than
> about analytics workloads or such, where deforming is the primary
> bottleneck.  I think it should be ok, but doing a before/after tpc-h of
> scale 5-10 or so wouldn't be a bad thing to verify.
> 

The test Tomas is doing is analytical query, it's running sum on the new
fast default column.

He uses create and create-alter names as comparison between when the
table was created with the columns and when the columns were added using
fast default.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 20:43:45


On February 20, 2018 5:03:58 AM PST, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>On 20/02/18 07:42, Andres Freund wrote:
>> Hi,
>>
>> On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote:
>>> Anyway, I consider the performance to be OK. But perhaps Andres
>could
>>> comment on this too, as he requested the benchmarks.
>>
>> My performance concerns were less about CREATE TABLE related things
>than
>> about analytics workloads or such, where deforming is the primary
>> bottleneck.  I think it should be ok, but doing a before/after tpc-h
>of
>> scale 5-10 or so wouldn't be a bad thing to verify.
>>
>
>The test Tomas is doing is analytical query, it's running sum on the
>new
>fast default column.
>
>He uses create and create-alter names as comparison between when the
>table was created with the columns and when the columns were added
>using
>fast default.

It's still a fairly simplistic test case. Running some queries with reasonably well known characteristics seems like a
goodidea regardless.  It's not like a scale 5 run takes that long. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

20 февраля 2018 г., 21:07:30

... btw, I've not read this patch in any detail, but the recent thread
about toast tables for system catalogs prompts me to wonder what happens
if a "fast default" value is large enough to require out-of-line toasting.

I can easily think of problems that will ensue if we try to support that
case, because right now the toast mechanisms assume that OOL toasted
values can only be referenced from the associated table.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 21:20:23

Hi,

On 2018-02-20 13:07:30 -0500, Tom Lane wrote:
> ... btw, I've not read this patch in any detail, but the recent thread
> about toast tables for system catalogs prompts me to wonder what happens
> if a "fast default" value is large enough to require out-of-line toasting.

Hm, interesting.

> I can easily think of problems that will ensue if we try to support that
> case, because right now the toast mechanisms assume that OOL toasted
> values can only be referenced from the associated table.

What problem are you seeing with treating the toasted value to be from
pg_attribute? I'm only drinking my first coffee just now, so I might be
missing something...

Now we certainly would need to make sure that the corresponding
pg_attribute row containing the default value doesn't go away too early,
but that shouldn't be much of a problem given that we never remove
them. I wondered for a second if there's problematic cases where the
default value is referenced by an index, and then the default-adding
transaction rolls back. But I can't construct anything realistically
problematic.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

20 февраля 2018 г., 21:50:54

Andres Freund <andres@anarazel.de> writes:
> On 2018-02-20 13:07:30 -0500, Tom Lane wrote:
>> I can easily think of problems that will ensue if we try to support that
>> case, because right now the toast mechanisms assume that OOL toasted
>> values can only be referenced from the associated table.

> What problem are you seeing with treating the toasted value to be from
> pg_attribute? I'm only drinking my first coffee just now, so I might be
> missing something...

Locking/vacuuming is exactly what I'm worried about.  Right now, a
transaction cannot reference a toast value without holding at least
AccessShare on the parent table.  That would not be true of OOL fast
defaults that are living in pg_attribute's toast table (if it had one).
If you think this isn't potentially problematic, see the problems that
forced us into hacks like 08e261cbc.

I guess the fix equivalent to 08e261cbc would be to require any toasted
fast-default value to be detoasted into line whenever it's copied into
a tupledesc, rather than possibly being fetched later.

> Now we certainly would need to make sure that the corresponding
> pg_attribute row containing the default value doesn't go away too early,
> but that shouldn't be much of a problem given that we never remove
> them. I wondered for a second if there's problematic cases where the
> default value is referenced by an index, and then the default-adding
> transaction rolls back. But I can't construct anything realistically
> problematic.

Oooh ... but no, because we don't allow toasted values to be copied into
indexes, for (I think) exactly this reason.  See index_form_tuple.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 22:18:30

Hi,

On 2018-02-20 13:50:54 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-02-20 13:07:30 -0500, Tom Lane wrote:
> >> I can easily think of problems that will ensue if we try to support that
> >> case, because right now the toast mechanisms assume that OOL toasted
> >> values can only be referenced from the associated table.
> 
> > What problem are you seeing with treating the toasted value to be from
> > pg_attribute? I'm only drinking my first coffee just now, so I might be
> > missing something...
> 
> Locking/vacuuming is exactly what I'm worried about.  Right now, a
> transaction cannot reference a toast value without holding at least
> AccessShare on the parent table.

Is that actually reliably true? Doesn't e.g. use a variable from a
rolled-back subtransaction in plpgsql circumvent this already?  There
are a few bugs around this though, so that's not really a convincing
argument of there not being any danger :/

> That would not be true of OOL fast defaults that are living in
> pg_attribute's toast table (if it had one).  If you think this isn't
> potentially problematic, see the problems that forced us into hacks
> like 08e261cbc.

I don't think the equivalent problem quite applies here, the OOL default
should afaics be immutable. But yea, it's a concern.

> I guess the fix equivalent to 08e261cbc would be to require any toasted
> fast-default value to be detoasted into line whenever it's copied into
> a tupledesc, rather than possibly being fetched later.

Yea, thats probably a good idea. Correctness aside, it's probably better
to detoast once rather than do so for every single row for performance
reasons (detoasting the same row over and over in multiple backends
would lead to quite the contention on used buffer headers and relation
locks).

> > Now we certainly would need to make sure that the corresponding
> > pg_attribute row containing the default value doesn't go away too early,
> > but that shouldn't be much of a problem given that we never remove
> > them. I wondered for a second if there's problematic cases where the
> > default value is referenced by an index, and then the default-adding
> > transaction rolls back. But I can't construct anything realistically
> > problematic.
> 
> Oooh ... but no, because we don't allow toasted values to be copied into
> indexes, for (I think) exactly this reason.  See index_form_tuple.

I don't think that'd necessarily provide full protection - what about a
index recheck during a lossy bitmap index scan for example? But given
that any such index creation would also be rolled back I'm not too
concerned.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

20 февраля 2018 г., 22:57:36

On 02/20/2018 06:43 PM, Andres Freund wrote:
> 
> 
> On February 20, 2018 5:03:58 AM PST, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>> On 20/02/18 07:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote:
>>>> Anyway, I consider the performance to be OK. But perhaps Andres
>> could
>>>> comment on this too, as he requested the benchmarks.
>>>
>>> My performance concerns were less about CREATE TABLE related things
>> than
>>> about analytics workloads or such, where deforming is the primary
>>> bottleneck.  I think it should be ok, but doing a before/after tpc-h
>> of
>>> scale 5-10 or so wouldn't be a bad thing to verify.
>>>
>>
>> The test Tomas is doing is analytical query, it's running sum on the
>> new
>> fast default column.
>>
>> He uses create and create-alter names as comparison between when
>> the table was created with the columns and when the columns were
>> added using fast default.
> 
> It's still a fairly simplistic test case. Running some queries with 
> reasonably well known characteristics seems like a good idea
> regardless. It's not like a scale 5 run takes that long.
>

I can do that, although I don't really expect any surprises there.

The simple tests I did were supposed to test two corner cases - best
case, when there are no columns with "fast default", and worst case when
all columns (and tuples) have "fast default". Per David's findings, the
second test was not perfectly right, but that's a separate issue.

The question is how should the schema for TPC-H look like. Because if
you just do the usual test without any ALTER TABLE ADD COLUMN, then you
really won't get any difference at all. The fast default stuff will be
completely "inactive".

So, we need to tweak the TPC-H schema somehow ... We need (a) to add
some new columns with default values, (b) tweak some of the queries to
use those new columns. Suggestions?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 23:06:18

Hi,

On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote:
> The question is how should the schema for TPC-H look like. Because if
> you just do the usual test without any ALTER TABLE ADD COLUMN, then you
> really won't get any difference at all. The fast default stuff will be
> completely "inactive".

Well, there'll still be changed behaviour due to the changed sizes of
structures and new out of line function calls. The deforming code is
quite sensitive to such changes.

I think just verifying that there's no meaningful effect with/without
the patch is good enough. As soon as there's actual new columns that
take advantage of the new functionality I think some degradation is fair
game, you got some benefit for it.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

20 февраля 2018 г., 23:14:26

Andres Freund <andres@anarazel.de> writes:
> On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote:
>> The question is how should the schema for TPC-H look like. Because if
>> you just do the usual test without any ALTER TABLE ADD COLUMN, then you
>> really won't get any difference at all. The fast default stuff will be
>> completely "inactive".

> I think just verifying that there's no meaningful effect with/without
> the patch is good enough. As soon as there's actual new columns that
> take advantage of the new functionality I think some degradation is fair
> game, you got some benefit for it.

Yeah, I think what we want to know is how much penalty people are paying
for the feature to exist even though they aren't using it.  That seems
to break down into two cases:

(1) use of a table that's never had any added columns;

(2) use of a table that has had columns added, but they just default
to null the same as before.  Is there more relcache setup time anyway?
Is deforming slower even if there are no fast defaults?

Case (1) could be answered by a straight TPC-H test with/without patch.
Case (2) seems a bit harder, since presumably you want to measure accesses
to tuples that are "short" compared to the current table tupdesc, but for
which the extra columns are still default-null.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

20 февраля 2018 г., 23:28:40

On 02/20/2018 09:14 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-02-20 20:57:36 +0100, Tomas Vondra wrote:
>>> The question is how should the schema for TPC-H look like. Because if
>>> you just do the usual test without any ALTER TABLE ADD COLUMN, then you
>>> really won't get any difference at all. The fast default stuff will be
>>> completely "inactive".
> 
>> I think just verifying that there's no meaningful effect with/without
>> the patch is good enough. As soon as there's actual new columns that
>> take advantage of the new functionality I think some degradation is fair
>> game, you got some benefit for it.
> 
> Yeah, I think what we want to know is how much penalty people are paying
> for the feature to exist even though they aren't using it.  That seems
> to break down into two cases:
> 
> (1) use of a table that's never had any added columns;
> 
> (2) use of a table that has had columns added, but they just default
> to null the same as before.  Is there more relcache setup time anyway?
> Is deforming slower even if there are no fast defaults?
> 
> Case (1) could be answered by a straight TPC-H test with/without patch.

I don't quite understand why would this case need the TPC-H tests, or
why would TPC-H give us more than the very focused tests we've already
done. The first test was testing a fairly short query where any such
additional overhead would be much more obvious, compared to the TPC-H
queries that usually do a lot of other expensive stuff.

I'm fine with doing the tests, of course.

> Case (2) seems a bit harder, since presumably you want to measure
> accesses to tuples that are "short" compared to the current table
> tupdesc, but for which the extra columns are still default-null.
> 

Yeah. I think we can add a column or two to the "fact" tables
(particularly lineitem), and then tweak the queries to also compute
aggregates on this new column.

Unless someone has better ideas, I'll do such tests once the machine
completes the stuff that's running now.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

20 февраля 2018 г., 23:36:16

Hi,

On 2018-02-20 21:28:40 +0100, Tomas Vondra wrote:
> I don't quite understand why would this case need the TPC-H tests, or
> why would TPC-H give us more than the very focused tests we've already
> done.

Because a more complex query shows the cost of changing cache access
costs better than a trivial query. Simplistic queries will often
e.g. not show cost of additional branch predictor usage, because the
branch history is large enough to fit the simple query. But once you go
to a more complex query, and that's not necessarily the case anymore.

> The first test was testing a fairly short query where any such
> additional overhead would be much more obvious, compared to the TPC-H
> queries that usually do a lot of other expensive stuff.

Unfortunately such reasoning IME doesn't work well with cpu-bound stuff.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

21 февраля 2018 г., 00:25:14


On 02/20/2018 09:36 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-20 21:28:40 +0100, Tomas Vondra wrote:
>> I don't quite understand why would this case need the TPC-H tests, or
>> why would TPC-H give us more than the very focused tests we've already
>> done.
> 
> Because a more complex query shows the cost of changing cache access
> costs better than a trivial query. Simplistic queries will often
> e.g. not show cost of additional branch predictor usage, because the
> branch history is large enough to fit the simple query. But once you go
> to a more complex query, and that's not necessarily the case anymore.
> 
> 
>> The first test was testing a fairly short query where any such
>> additional overhead would be much more obvious, compared to the TPC-H
>> queries that usually do a lot of other expensive stuff.
> 
> Unfortunately such reasoning IME doesn't work well with cpu-bound stuff.
> 

OK, point taken. I'll do the tests and report the results.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

21 февраля 2018 г., 02:32:55

On 21 February 2018 at 00:38, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Using: select sum(c10) from t;
>
...
> v11 + create.sql: tps = 3330.131437
> v11 + create-alter.sql: tps = 1398.635251

It seems the difference between these two cases is down to
slot_getsomeattrs being asked to deform up to attnum 1000 for the
create-alter.sql case, and only up to attnum 10 for the create.sql
case. Both plans are using physical tlists per EXPLAIN VERBOSE. I've
not managed to narrow down the reason for the difference yet. Looks
like it might take a bit of time with a debugger to find the point
where the code paths of the two cases diverge.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

21 февраля 2018 г., 02:47:59

David Rowley <david.rowley@2ndquadrant.com> writes:
> It seems the difference between these two cases is down to
> slot_getsomeattrs being asked to deform up to attnum 1000 for the
> create-alter.sql case, and only up to attnum 10 for the create.sql
> case. Both plans are using physical tlists per EXPLAIN VERBOSE. I've
> not managed to narrow down the reason for the difference yet.

There's logic in the execExpr compiler that detects the last attnum
we actually reference, if memory serves.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

21 февраля 2018 г., 12:18:06

Hi,

On 2018-02-20 17:03:07 +1030, Andrew Dunstan wrote:
> +     <row>
> +      <entry><structfield>attmissingval</structfield></entry>
> +      <entry><type>bytea</type></entry>
> +      <entry></entry>
> +      <entry>
> +       This column has a binary representation of the value used when the
> +       column is entirely missing from the row, as happens when the column is
> +       added after the row is created. The value is only used when
> +       <structname>atthasmissing</structname> is true.
> +      </entry>
> +     </row>

Hm. Why is this a bytea, and why is that a good idea? In pg_statistics,
which has to deal with something similar-ish, we deal with the ambiguity
of the storage by storing anyarrays, which then can display the contents
properly thanks to the embedded oid.  Now it'd be a bit sad to store a
one element array in the table. But that seems better than storing a
bytea that's typeless and can't be viewed reasonably. The best solution
would be to create a variadic any type that can actually be stored, but
I see why you'd not want to go there necessarily.

But also, a bit higher level, is it a good idea to store this
information directly into pg_attribute? That table is already one of the
hotter system catalog tables, and very frequently the largest. If we
suddenly end up storing a lot of default values in there, possibly ones
that aren't ever actually used because all the default values are long
since actually stored in the table, we'll spend more time doing cache
lookups.

I'm somewhat tempted to say that the default data should be in a
separate catalog table.  Or possibly use pg_attrdef.

>  
> +/*
> + * Return the missing value of an attribute, or NULL if there isn't one.
> + */
> +static Datum
> +getmissingattr(TupleDesc tupleDesc,
> +               int attnum, bool *isnull)
> +{
> +    int            missingnum;
> +    Form_pg_attribute att;
> +    AttrMissing *attrmiss;
> +
> +    Assert(attnum <= tupleDesc->natts);
> +    Assert(attnum > 0);
> +
> +    att = TupleDescAttr(tupleDesc, attnum - 1);
> +
> +    if (att->atthasmissing)
> +    {
> +        Assert(tupleDesc->constr);
> +        Assert(tupleDesc->constr->num_missing > 0);
> +
> +        attrmiss = tupleDesc->constr->missing;
> +
> +        /*
> +         * Look through the tupledesc's attribute missing values
> +         * for the one that corresponds to this attribute.
> +         */
> +        for (missingnum = tupleDesc->constr->num_missing - 1;
> +             missingnum >= 0; missingnum--)
> +        {
> +            if (attrmiss[missingnum].amnum == attnum)
> +            {
> +                if (attrmiss[missingnum].ammissingNull)
> +                {
> +                    *isnull = true;
> +                    return (Datum) 0;
> +                }
> +                else
> +                {
> +                    *isnull = false;
> +                    return attrmiss[missingnum].ammissing;
> +                }
> +            }
> +        }

I think requiring this is a bad idea. If, and only if, there's default
attributes with missing values we should build an array that can be
directly indexed by attribute number, accessing the missing value.  What
you're doing here is essentially O(n^2) afaict, with n being the number
of columns missing values.


> +/*
> + * Fill in missing values for a TupleTableSlot
> + */
> +static void
> +slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
> +{
> +    AttrMissing *attrmiss;
> +    int            missingnum;
> +    int         missattnum;
> +    int         i;
> +
> +    /* initialize all the missing attributes to null */
> +    for (missattnum = lastAttNum - 1; missattnum >= startAttNum; missattnum--)
> +    {
> +        slot->tts_values[missattnum] = PointerGetDatum(NULL);
> +        slot->tts_isnull[missattnum] = true;
> +    }

Any reason not to memset this in one (well two) go with memset? Also,
how come you're using PointerGetDatum()? Normally we just use (Datum) 0,
imo it's confusing to use PointerGetDatum(), as it implies things that
are quite possibly not the case.


> +/*
> + * Fill in one attribute, either a data value or a bit in the null bitmask
> + */

Imo this isn't sufficient documentation to explain what this function
does.  Even if you just add "per-attribute helper for heap_fill_tuple
and other routines building tuples" or such, I think it'd be clearer.

> +static inline void
> +fill_val(Form_pg_attribute att,
> +         bits8 **bit,
> +         int *bitmask,
> +         char **dataP,
> +         uint16 *infomask,
> +         Datum datum,
> +         bool isnull)
> +{
> +    Size        data_length;
> +    char       *data = *dataP;
> +
> +    /*
> +     * If we're building a null bitmap, set the appropriate bit for the
> +     * current column value here.
> +     */
> +    if (bit != NULL)
> +    {
> +        if (*bitmask != HIGHBIT)
> +            *bitmask <<= 1;
> +        else
> +        {
> +            *bit += 1;
> +            **bit = 0x0;
> +            *bitmask = 1;
> +        }
> +
> +        if (isnull)
> +        {
> +            *infomask |= HEAP_HASNULL;
> +            return;
> +        }
> +
> +        **bit |= *bitmask;
> +    }
> +
> +    Assert(att);

Why isn't this asserted at the beginning?



> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>   * ----------------
>   */
>  bool
> -heap_attisnull(HeapTuple tup, int attnum)
> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>  {
> +    /*
> +     * We allow a NULL tupledesc for relations not expected to have
> +     * missing values, such as catalog relations and indexes.
> +     */
> +    Assert(!tupleDesc || attnum <= tupleDesc->natts);

This seems fairly likely to hide bugs.


>      if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
> -        return true;
> +    {
> +        return !(tupleDesc &&
> +                 TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing);

Brrr.


> +/*
> + * Expand a tuple with missing values, or NULLs. The source tuple must have
> + * less attributes than the required number. Only one of targetHeapTuple
> + * and targetMinimalTuple may be supplied. The other argument must be NULL.
> + */

I can't quite make head or tails of this comment "with missing values, or NULLs"?


> +static void
> +expand_tuple(HeapTuple *targetHeapTuple,
> +             MinimalTuple *targetMinimalTuple,
> +             HeapTuple sourceTuple,
> +             TupleDesc tupleDesc)
> +{
> +    AttrMissing *attrmiss = NULL;
> +    int            attnum;
> +    int            missingnum;
> +    int            firstmissingnum = 0;
> +    int            num_missing = 0;
> +    bool        hasNulls = HeapTupleHasNulls(sourceTuple);
> +    HeapTupleHeader targetTHeader;
> +    HeapTupleHeader sourceTHeader = sourceTuple->t_data;
> +    int            sourceNatts = HeapTupleHeaderGetNatts(sourceTHeader);
> +    int            natts = tupleDesc->natts;
> +    int            sourceIndicatorLen;
> +    int            targetIndicatorLen;
> +    Size        sourceDataLen = sourceTuple->t_len - sourceTHeader->t_hoff;
> +    Size        targetDataLen;
> +    Size        len;
> +    int            hoff;
> +    bits8       *nullBits = NULL;
> +    int            bitMask = 0;
> +    char       *targetData;
> +    uint16       *infoMask;
> +
> +    Assert((targetHeapTuple && !targetMinimalTuple)
> +           || (!targetHeapTuple && targetMinimalTuple));
> +
> +    Assert(sourceNatts < natts);
> +
> +    sourceIndicatorLen = (hasNulls ? BITMAPLEN(sourceNatts) : 0);

Maybe I'm just grumpy right now (did response work on 2016 taxes just
now, brrrrr), but I don't find "indicatorLen" a descriptive term.


> +    targetDataLen = sourceDataLen;
> +
> +    if (tupleDesc->constr &&
> +        tupleDesc->constr->num_missing)
> +    {
> +        /*
> +         * If there are missing values we want to put them into the tuple.
> +         * Before that we have to compute the extra length for the values
> +         * array and the variable length data.
> +         */
> +        num_missing = tupleDesc->constr->num_missing;
> +        attrmiss = tupleDesc->constr->missing;

You're accessing a hell of a lot of expensive stuff outside outside of
this if block. E.g. HeapTupleHeaderGetNatts specifically is the prime
source of cache misses in a lot of workloads. Now a smart compiler might
be able to optimize this away, but...


> +        /*
> +         * Find the first item in attrmiss for which we don't have a value in
> +         * the source (i.e. where its amnum is > sourceNatts). We can ignore
> +         * all the missing entries before that.
> +         */
> +        for (firstmissingnum = 0;
> +             firstmissingnum < num_missing
> +             && attrmiss[firstmissingnum].amnum <= sourceNatts;
> +             firstmissingnum++)
> +        {                        /* empty */
> +        }
> +
> +        /*
> +         * If there are no more missing values everything else must be
> +         * NULL
> +         */
> +        if (firstmissingnum >= num_missing)
> +        {
> +            hasNulls = true;
> +        }
> +
> +        /*
> +         * Now walk the missing attributes one by one. If there is a
> +         * missing value make space for it. Otherwise, it's going to be NULL.
> +         */
> +        for (attnum = sourceNatts + 1;
> +             attnum <= natts;
> +             attnum++)
> +        {
> +            Form_pg_attribute att = TupleDescAttr(tupleDesc, attnum - 1);
> +
> +            for (missingnum = firstmissingnum; missingnum < num_missing; missingnum++)
> +            {
> +
> +                /*
> +                 * If there is a missing value for this attribute calculate
> +                 * the required space if it's not NULL.
> +                 */
> +                if (attrmiss[missingnum].amnum == attnum)
> +                {
> +                    if (attrmiss[missingnum].ammissingNull)
> +                        hasNulls = true;
> +                    else
> +                    {
> +                        targetDataLen = att_align_datum(targetDataLen,
> +                                                        att->attalign,
> +                                                        att->attlen,
> +                                                        attrmiss[missingnum].ammissing);
> +
> +                        targetDataLen = att_addlength_pointer(targetDataLen,
> +                                                              att->attlen,
> +                                                              attrmiss[missingnum].ammissing);
> +                    }
> +                    break;
> +                }
> +            }
> +            if (missingnum > num_missing)
> +            {
> +                /* there is no missing value for this attribute */
> +                hasNulls = true;
> +            }
> +        }
> +    }                            /* end if have missing values */
> +    else
> +    {
> +        /*
> +         * If there are no missing values at all then NULLS must be allowed,
> +         * since some of the attributes are known to be absent.
> +         */
> +        hasNulls = true;

Why are we doing anything at all in this branch? ISTM we're reforming
the tuple for absolutely no gain here, just making it larger by
including additionall null bits?


> +    /*
> +     * Allocate and zero the space needed.  Note that the tuple body and
> +     * HeapTupleData management structure are allocated in one chunk.
> +     */
> +    if (targetHeapTuple)
> +    {
> +        len += offsetof(HeapTupleHeaderData, t_bits);
> +        hoff = len = MAXALIGN(len); /* align user data safely */
> +        len += targetDataLen;

So len isn't the length of the tuple, but the null bitmap length? That
seems, uh, non-obvious.


> +        *targetHeapTuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
> +        (*targetHeapTuple)->t_data
> +            = targetTHeader
> +            = (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);

Yuck.


> +        (*targetHeapTuple)->t_len = len;
> +        (*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
> +        ItemPointerSetInvalid(&((*targetHeapTuple)->t_self));

Seems like a lot of this code would be more readable by storing
*targetHeapTuple in a local var.


> +        targetTHeader->t_infomask = sourceTHeader->t_infomask;
> +        targetTHeader->t_hoff = hoff;
> +        HeapTupleHeaderSetNatts(targetTHeader, natts);
> +        HeapTupleHeaderSetDatumLength(targetTHeader, len);
> +        HeapTupleHeaderSetTypeId(targetTHeader, tupleDesc->tdtypeid);
> +        HeapTupleHeaderSetTypMod(targetTHeader, tupleDesc->tdtypmod);
> +        /* We also make sure that t_ctid is invalid unless explicitly set */
> +        ItemPointerSetInvalid(&(targetTHeader->t_ctid));

This seems fairly expensive, why aren't we just memcpy'ing the old
header into the new one?




> +    if (targetIndicatorLen > 0)
> +    {
> +        if (sourceIndicatorLen > 0)
> +        {
> +            /* if bitmap pre-existed copy in - all is set */

can't parse.

> +            memcpy(nullBits,
> +                   ((char *) sourceTHeader)
> +                   + offsetof(HeapTupleHeaderData, t_bits),
> +                   sourceIndicatorLen);
> +            nullBits += sourceIndicatorLen - 1;
> +        }
> +        else
> +        {
> +            sourceIndicatorLen = BITMAPLEN(sourceNatts);
> +            /* Set NOT NULL for all existing attributes */
> +            memset(nullBits, 0xff, sourceIndicatorLen);
> +
> +            nullBits += sourceIndicatorLen - 1;
> +
> +            if (sourceNatts & 0x07)
> +            {
> +                /* build the mask (inverted!) */
> +                bitMask = 0xff << (sourceNatts & 0x07);
> +                /* Voila */
> +                *nullBits = ~bitMask;
> +            }

It's late, I'm tired. But how can this be correct?  Also, this code is
massively under-commented.



> +
> +/*
> + * Fill in the missing values for an ordinary HeapTuple
> + */
> +MinimalTuple
> +minimal_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc)
> +{
> +    MinimalTuple minimalTuple;
> +
> +    expand_tuple(NULL, &minimalTuple, sourceTuple, tupleDesc);
> +    return minimalTuple;
> +}
> +
> +/*
> + * Fill in the missing values for a minimal HeapTuple
> + */
> +HeapTuple
> +heap_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc)
> +{
> +    HeapTuple    heapTuple;
> +
> +    expand_tuple(&heapTuple, NULL, sourceTuple, tupleDesc);
> +    return heapTuple;
> +}

Comments appear to be swapped.



>  /* ----------------
>   *        heap_copy_tuple_as_datum
>   *
> @@ -1012,13 +1422,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
>  
>      /*
>       * If tuple doesn't have all the atts indicated by tupleDesc, read the
> -     * rest as null
> +     * rest as null, or a missing value if there is one.
>       */

"a missing value" isn't there quite possibly multiple ones?


>  /*
> @@ -1192,8 +1599,7 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
>      tup = tuple->t_data;
>      if (attnum > HeapTupleHeaderGetNatts(tup))
>      {
> -        *isnull = true;
> -        return (Datum) 0;
> +        return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
>      }

This turned a ~4 instruction case into considerably more. I've recently
removed one of the major slot_getattr sources (hashagg comparisons), but
we still call this super frequently for hashagg's hash computations.  I
suggest micro-benchmarking that case.



>  Oid
>  StoreAttrDefault(Relation rel, AttrNumber attnum,
> -                 Node *expr, bool is_internal)
> +                 Node *expr, bool is_internal, bool add_column_mode)
>  {
>      char       *adbin;
>      char       *adsrc;
> @@ -1939,9 +1945,20 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>      Relation    attrrel;
>      HeapTuple    atttup;
>      Form_pg_attribute attStruct;
> +    Form_pg_attribute defAttStruct;
>      Oid            attrdefOid;
>      ObjectAddress colobject,
>                  defobject;
> +    ExprState  *exprState;
> +    Expr       *expr2 = (Expr *) expr;
> +    EState       *estate = NULL;
> +    ExprContext *econtext;
> +    char       *missingBuf = NULL;
> +    Datum        valuesAtt[Natts_pg_attribute];
> +    bool        nullsAtt[Natts_pg_attribute];
> +    bool        replacesAtt[Natts_pg_attribute];
> +    Datum        missingval = (Datum) 0;
> +    bool        missingIsNull = true;
>  
>      /*
>       * Flatten expression to string form for storage.
> @@ -1956,6 +1973,44 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>                                                     RelationGetRelid(rel)),
>                                 false, false);
>  
> +    /*
> +     * Compute the missing value
> +     */
> +    expr2 = expression_planner(expr2);
> +
> +    exprState = ExecInitExpr(expr2, NULL);

This should instead use ExecPrepareExpr()


> +    if (add_column_mode)
> +    {
> +        missingval = ExecEvalExpr(exprState, econtext,
> +                                    &missingIsNull);
> +
> +        defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1);
> +
> +        if (missingIsNull)
> +        {
> +            missingval = PointerGetDatum(NULL);

(Datum) 0;


> +        }
> +        else if (defAttStruct->attbyval)
> +        {
> +            missingBuf = palloc(VARHDRSZ + sizeof(Datum));
> +            memcpy(VARDATA(missingBuf), &missingval, sizeof(Datum));
> +            SET_VARSIZE(missingBuf, VARHDRSZ + sizeof(Datum));
> +            missingval = PointerGetDatum(missingBuf);
> +        }
> +        else if (defAttStruct->attlen >= 0)
> +        {
> +            missingBuf = palloc(VARHDRSZ + defAttStruct->attlen);
> +            memcpy(VARDATA(missingBuf), DatumGetPointer(missingval),
> +                   defAttStruct->attlen);
> +            SET_VARSIZE(missingBuf,
> +                        VARHDRSZ + defAttStruct->attlen);
> +            missingval = PointerGetDatum(missingBuf);
> +        }

What if we get an extended datum back in the varlena case? Even if we
wouldn't change the storage from bytea to something else, you really
would need to force detoasting here.


> +    }
> +
>      /*
>       * Make the pg_attrdef entry.
>       */
> @@ -1996,7 +2051,22 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>      attStruct = (Form_pg_attribute) GETSTRUCT(atttup);
>      if (!attStruct->atthasdef)
>      {
> -        attStruct->atthasdef = true;
> +        MemSet(valuesAtt, 0, sizeof(valuesAtt));
> +        MemSet(nullsAtt, false, sizeof(nullsAtt));
> +        MemSet(replacesAtt, false, sizeof(replacesAtt));
> +        valuesAtt[Anum_pg_attribute_atthasdef - 1] = true;
> +        replacesAtt[Anum_pg_attribute_atthasdef - 1] = true;
> +        if (add_column_mode)
> +        {
> +            valuesAtt[Anum_pg_attribute_atthasmissing - 1] = true;
> +            replacesAtt[Anum_pg_attribute_atthasmissing - 1] = true;
> +            valuesAtt[Anum_pg_attribute_attmissingval - 1] = missingval;
> +            replacesAtt[Anum_pg_attribute_attmissingval - 1] = true;
> +            nullsAtt[Anum_pg_attribute_attmissingval - 1] = missingIsNull;
> +        }
> +        atttup = heap_modify_tuple(atttup, RelationGetDescr(attrrel),
> +                                   valuesAtt, nullsAtt, replacesAtt);
> +

This seems weird. We've above formed a tuple, now we just modify it
again?


> @@ -2296,7 +2377,14 @@ AddRelationNewConstraints(Relation rel,
>              (IsA(expr, Const) &&((Const *) expr)->constisnull))
>              continue;
>  
> -        defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal);
> +        /* If the default is volatile we cannot use a missing value */
> +        if (contain_volatile_functions((Node *) expr))
> +        {
> +            colDef->missingMode = false;
> +        }
> +        defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal,
> +                                  colDef->missingMode);

Doesn't this have to be immutable rather than just stable? I mean just
storing the value of now() doesn't seem like it'd do the trick?



> @@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot)
>       * If we have a regular physical tuple then just return it.
>       */
>      if (TTS_HAS_PHYSICAL_TUPLE(slot))
> -        return slot->tts_tuple;
> +    {
> +        if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <
> +            slot->tts_tupleDescriptor->natts)
> +        {
> +            MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
> +
> +            slot->tts_tuple = heap_expand_tuple(slot->tts_tuple,
> +                                                slot->tts_tupleDescriptor);
> +            slot->tts_shouldFree = true;
> +            MemoryContextSwitchTo(oldContext);
> +            return slot->tts_tuple;
> +        }
> +        else
> +        {
> +            return slot->tts_tuple;
> +        }
> +    }

I'm extremely dubious about this. HeapTupleHeaderGetNatts() commonly
causes cachemisses and therefore causes slowdowns when unnecessarily
done. Additionally here we're expanding the tuple to include columns
that aren't even likely to be accessed. This is a fairly hot codepath,
and made hugely more expensive by this.


> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>              return false;        /* out of order */
>          if (att_tup->attisdropped)
>              return false;        /* table contains dropped columns */
> +        if (att_tup->atthasmissing)
> +            return false;        /* table contains cols with missing values */

Hm, so incrementally building a table definitions with defaults, even if
there aren't ever any rows, or if there's a subsequent table rewrite,
will cause slowdowns. If so, shouldn't a rewrite remove all
atthasmissing values?




> @@ -493,7 +494,10 @@ RelationBuildTupleDesc(Relation relation)

> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>                      MemoryContextAllocZero(CacheMemoryContext,
>                                             relation->rd_rel->relnatts *
>                                             sizeof(AttrDefault));
> -            attrdef[ndef].adnum = attp->attnum;
> +            attrdef[ndef].adnum = attnum;
>              attrdef[ndef].adbin = NULL;
> +
>              ndef++;
>          }
> +
> +        /* Likewise for a missing value */
> +        if (attp->atthasmissing)
> +        {

As I've written somewhere above, I don't think it's a good idea to do
this preemptively. Tupledescs are very commonly copied, and the
missing attributes are quite likely never used. IOW, we should just
remember the attmissingattr value and fill in the corresponding
attmissingval on-demand.


> +            Datum        missingval;
> +            bool        missingNull;
> +
> +            if (attrmiss == NULL)
> +                attrmiss = (AttrMissing *)
> +                    MemoryContextAllocZero(CacheMemoryContext,
> +                                           relation->rd_rel->relnatts *
> +                                           sizeof(AttrMissing));
> +
> +            /* Do we have a missing value? */
> +            missingval = SysCacheGetAttr(ATTNUM, pg_attribute_tuple,
> +                                           Anum_pg_attribute_attmissingval,
> +                                           &missingNull);

Shouldn't this be a normal heap_getattr rather than a SysCacheGetAttr
access?

* SysCacheGetAttr
 *
 *        Given a tuple previously fetched by SearchSysCache(),
 *        extract a specific attribute.


> @@ -620,7 +691,19 @@ RelationBuildTupleDesc(Relation relation)
>              AttrDefaultFetch(relation);
>          }
>          else
> +        {
>              constr->num_defval = 0;
> +        }
> +
> +        if (nmissing > 0)
> +        {
> +            if (nmissing < relation->rd_rel->relnatts)
> +                constr->missing = (AttrMissing *)
> +                    repalloc(attrmiss, nmissing * sizeof(AttrMissing));
> +            else
> +                constr->missing = attrmiss;
> +        }
> +        constr->num_missing = nmissing;

Am I understand correctly that you're *downsizing* the allocation with
repalloc here? That seems likely to commonly lead to some wastage...

> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation)
>  
>      systable_endscan(adscan);
>      heap_close(adrel, AccessShareLock);
> -
> -    if (found != ndef)
> -        elog(WARNING, "%d attrdef record(s) missing for rel %s",
> -             ndef - found, RelationGetRelationName(relation));
>  }

Hm, it's not obvious why this is a good thing?



> diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
> index 415efbab97..ed30f98e33 100644
> --- a/src/include/access/tupdesc.h
> +++ b/src/include/access/tupdesc.h
> @@ -14,6 +14,7 @@
>  #ifndef TUPDESC_H
>  #define TUPDESC_H
>  
> +#include "postgres.h"
>  #include "access/attnum.h"

postgres.h should never be included in headers, why was it added here?


This doesn't seem ready yet.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

22 февраля 2018 г., 08:05:27

On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>

[ Long and useful review]

> This doesn't seem ready yet.
>


Thanks.

I'm working through the issues you raised. Will reply in a few days.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

27 февраля 2018 г., 06:59:44

On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,

Comments interspersed.


>
> On 2018-02-20 17:03:07 +1030, Andrew Dunstan wrote:
>> +     <row>
>> +      <entry><structfield>attmissingval</structfield></entry>
>> +      <entry><type>bytea</type></entry>
>> +      <entry></entry>
>> +      <entry>
>> +       This column has a binary representation of the value used when the
>> +       column is entirely missing from the row, as happens when the column is
>> +       added after the row is created. The value is only used when
>> +       <structname>atthasmissing</structname> is true.
>> +      </entry>
>> +     </row>
>
> Hm. Why is this a bytea, and why is that a good idea? In pg_statistics,
> which has to deal with something similar-ish, we deal with the ambiguity
> of the storage by storing anyarrays, which then can display the contents
> properly thanks to the embedded oid.  Now it'd be a bit sad to store a
> one element array in the table. But that seems better than storing a
> bytea that's typeless and can't be viewed reasonably. The best solution
> would be to create a variadic any type that can actually be stored, but
> I see why you'd not want to go there necessarily.

The one-element array idea is a slight hack, but it's not that bad,
and I agree a good deal better than using a bytea. I've made that
change.

>
> But also, a bit higher level, is it a good idea to store this
> information directly into pg_attribute? That table is already one of the
> hotter system catalog tables, and very frequently the largest. If we
> suddenly end up storing a lot of default values in there, possibly ones
> that aren't ever actually used because all the default values are long
> since actually stored in the table, we'll spend more time doing cache
> lookups.
>
> I'm somewhat tempted to say that the default data should be in a
> separate catalog table.  Or possibly use pg_attrdef.
>

This was debated quite some time ago. See for example
<https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
The consensus then seemed to be to store them in pg_attribute. If not,
I think we'd need a new catalog - pg_attrdef doesn't seem right. They
would have to go on separate rows, and we'd be storing values rather
than expressions, so it would get somewhat ugly.


>>
>> +/*
>> + * Return the missing value of an attribute, or NULL if there isn't one.
>> + */
>> +static Datum
>> +getmissingattr(TupleDesc tupleDesc,
>> +                        int attnum, bool *isnull)
>> +{
>> +     int                     missingnum;
>> +     Form_pg_attribute att;
>> +     AttrMissing *attrmiss;
>> +
>> +     Assert(attnum <= tupleDesc->natts);
>> +     Assert(attnum > 0);
>> +
>> +     att = TupleDescAttr(tupleDesc, attnum - 1);
>> +
>> +     if (att->atthasmissing)
>> +     {
>> +             Assert(tupleDesc->constr);
>> +             Assert(tupleDesc->constr->num_missing > 0);
>> +
>> +             attrmiss = tupleDesc->constr->missing;
>> +
>> +             /*
>> +              * Look through the tupledesc's attribute missing values
>> +              * for the one that corresponds to this attribute.
>> +              */
>> +             for (missingnum = tupleDesc->constr->num_missing - 1;
>> +                      missingnum >= 0; missingnum--)
>> +             {
>> +                     if (attrmiss[missingnum].amnum == attnum)
>> +                     {
>> +                             if (attrmiss[missingnum].ammissingNull)
>> +                             {
>> +                                     *isnull = true;
>> +                                     return (Datum) 0;
>> +                             }
>> +                             else
>> +                             {
>> +                                     *isnull = false;
>> +                                     return attrmiss[missingnum].ammissing;
>> +                             }
>> +                     }
>> +             }
>
> I think requiring this is a bad idea. If, and only if, there's default
> attributes with missing values we should build an array that can be
> directly indexed by attribute number, accessing the missing value.  What
> you're doing here is essentially O(n^2) afaict, with n being the number
> of columns missing values.
>
>


Done



>> +/*
>> + * Fill in missing values for a TupleTableSlot
>> + */
>> +static void
>> +slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
>> +{
>> +     AttrMissing *attrmiss;
>> +     int                     missingnum;
>> +     int         missattnum;
>> +     int         i;
>> +
>> +     /* initialize all the missing attributes to null */
>> +     for (missattnum = lastAttNum - 1; missattnum >= startAttNum; missattnum--)
>> +     {
>> +             slot->tts_values[missattnum] = PointerGetDatum(NULL);
>> +             slot->tts_isnull[missattnum] = true;
>> +     }
>
> Any reason not to memset this in one (well two) go with memset? Also,
> how come you're using PointerGetDatum()? Normally we just use (Datum) 0,
> imo it's confusing to use PointerGetDatum(), as it implies things that
> are quite possibly not the case.
>
>


rewritten.


>> +/*
>> + * Fill in one attribute, either a data value or a bit in the null bitmask
>> + */
>
> Imo this isn't sufficient documentation to explain what this function
> does.  Even if you just add "per-attribute helper for heap_fill_tuple
> and other routines building tuples" or such, I think it'd be clearer.

Changed that way.

>
>> +static inline void
>> +fill_val(Form_pg_attribute att,
>> +              bits8 **bit,
>> +              int *bitmask,
>> +              char **dataP,
>> +              uint16 *infomask,
>> +              Datum datum,
>> +              bool isnull)
>> +{
>> +     Size            data_length;
>> +     char       *data = *dataP;
>> +
>> +     /*
>> +      * If we're building a null bitmap, set the appropriate bit for the
>> +      * current column value here.
>> +      */
>> +     if (bit != NULL)
>> +     {
>> +             if (*bitmask != HIGHBIT)
>> +                     *bitmask <<= 1;
>> +             else
>> +             {
>> +                     *bit += 1;
>> +                     **bit = 0x0;
>> +                     *bitmask = 1;
>> +             }
>> +
>> +             if (isnull)
>> +             {
>> +                     *infomask |= HEAP_HASNULL;
>> +                     return;
>> +             }
>> +
>> +             **bit |= *bitmask;
>> +     }
>> +
>> +     Assert(att);
>
> Why isn't this asserted at the beginning?
>
>


I don't think it's worth very much anyway, so I just removed it.


>
>> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>>   * ----------------
>>   */
>>  bool
>> -heap_attisnull(HeapTuple tup, int attnum)
>> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>>  {
>> +     /*
>> +      * We allow a NULL tupledesc for relations not expected to have
>> +      * missing values, such as catalog relations and indexes.
>> +      */
>> +     Assert(!tupleDesc || attnum <= tupleDesc->natts);
>
> This seems fairly likely to hide bugs.
>


How? There are plenty of cases where we simply expect quite reasonably
that there won't be any missing attributes, and we don't need  a
tupleDesc at all in such cases.


>
>>       if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
>> -             return true;
>> +     {
>> +             return !(tupleDesc &&
>> +                              TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing);
>
> Brrr.
>
>

rewritten.


>> +/*
>> + * Expand a tuple with missing values, or NULLs. The source tuple must have
>> + * less attributes than the required number. Only one of targetHeapTuple
>> + * and targetMinimalTuple may be supplied. The other argument must be NULL.
>> + */
>
> I can't quite make head or tails of this comment "with missing values, or NULLs"?
>
>

Rewritten.


>> +static void
>> +expand_tuple(HeapTuple *targetHeapTuple,
>> +                      MinimalTuple *targetMinimalTuple,
>> +                      HeapTuple sourceTuple,
>> +                      TupleDesc tupleDesc)
>> +{
>> +     AttrMissing *attrmiss = NULL;
>> +     int                     attnum;
>> +     int                     missingnum;
>> +     int                     firstmissingnum = 0;
>> +     int                     num_missing = 0;
>> +     bool            hasNulls = HeapTupleHasNulls(sourceTuple);
>> +     HeapTupleHeader targetTHeader;
>> +     HeapTupleHeader sourceTHeader = sourceTuple->t_data;
>> +     int                     sourceNatts = HeapTupleHeaderGetNatts(sourceTHeader);
>> +     int                     natts = tupleDesc->natts;
>> +     int                     sourceIndicatorLen;
>> +     int                     targetIndicatorLen;
>> +     Size            sourceDataLen = sourceTuple->t_len - sourceTHeader->t_hoff;
>> +     Size            targetDataLen;
>> +     Size            len;
>> +     int                     hoff;
>> +     bits8      *nullBits = NULL;
>> +     int                     bitMask = 0;
>> +     char       *targetData;
>> +     uint16     *infoMask;
>> +
>> +     Assert((targetHeapTuple && !targetMinimalTuple)
>> +                || (!targetHeapTuple && targetMinimalTuple));
>> +
>> +     Assert(sourceNatts < natts);
>> +
>> +     sourceIndicatorLen = (hasNulls ? BITMAPLEN(sourceNatts) : 0);
>
> Maybe I'm just grumpy right now (did response work on 2016 taxes just
> now, brrrrr), but I don't find "indicatorLen" a descriptive term.
>
>


changed


>> +     targetDataLen = sourceDataLen;
>> +
>> +     if (tupleDesc->constr &&
>> +             tupleDesc->constr->num_missing)
>> +     {
>> +             /*
>> +              * If there are missing values we want to put them into the tuple.
>> +              * Before that we have to compute the extra length for the values
>> +              * array and the variable length data.
>> +              */
>> +             num_missing = tupleDesc->constr->num_missing;
>> +             attrmiss = tupleDesc->constr->missing;
>
> You're accessing a hell of a lot of expensive stuff outside outside of
> this if block. E.g. HeapTupleHeaderGetNatts specifically is the prime
> source of cache misses in a lot of workloads. Now a smart compiler might
> be able to optimize this away, but...
>
>
>> +             /*
>> +              * Find the first item in attrmiss for which we don't have a value in
>> +              * the source (i.e. where its amnum is > sourceNatts). We can ignore
>> +              * all the missing entries before that.
>> +              */
>> +             for (firstmissingnum = 0;
>> +                      firstmissingnum < num_missing
>> +                      && attrmiss[firstmissingnum].amnum <= sourceNatts;
>> +                      firstmissingnum++)
>> +             {                                               /* empty */
>> +             }
>> +
>> +             /*
>> +              * If there are no more missing values everything else must be
>> +              * NULL
>> +              */
>> +             if (firstmissingnum >= num_missing)
>> +             {
>> +                     hasNulls = true;
>> +             }
>> +
>> +             /*
>> +              * Now walk the missing attributes one by one. If there is a
>> +              * missing value make space for it. Otherwise, it's going to be NULL.
>> +              */
>> +             for (attnum = sourceNatts + 1;
>> +                      attnum <= natts;
>> +                      attnum++)
>> +             {
>> +                     Form_pg_attribute att = TupleDescAttr(tupleDesc, attnum - 1);
>> +
>> +                     for (missingnum = firstmissingnum; missingnum < num_missing; missingnum++)
>> +                     {
>> +
>> +                             /*
>> +                              * If there is a missing value for this attribute calculate
>> +                              * the required space if it's not NULL.
>> +                              */
>> +                             if (attrmiss[missingnum].amnum == attnum)
>> +                             {
>> +                                     if (attrmiss[missingnum].ammissingNull)
>> +                                             hasNulls = true;
>> +                                     else
>> +                                     {
>> +                                             targetDataLen = att_align_datum(targetDataLen,
>> +
att->attalign,
>> +
att->attlen,
>> +
attrmiss[missingnum].ammissing);
>> +
>> +                                             targetDataLen = att_addlength_pointer(targetDataLen,
>> +
  att->attlen,
 
>> +
  attrmiss[missingnum].ammissing);
 
>> +                                     }
>> +                                     break;
>> +                             }
>> +                     }
>> +                     if (missingnum > num_missing)
>> +                     {
>> +                             /* there is no missing value for this attribute */
>> +                             hasNulls = true;
>> +                     }
>> +             }
>> +     }                                                       /* end if have missing values */
>> +     else
>> +     {
>> +             /*
>> +              * If there are no missing values at all then NULLS must be allowed,
>> +              * since some of the attributes are known to be absent.
>> +              */
>> +             hasNulls = true;
>
> Why are we doing anything at all in this branch? ISTM we're reforming
> the tuple for absolutely no gain here, just making it larger by
> including additionall null bits?
>
>
>> +     /*
>> +      * Allocate and zero the space needed.  Note that the tuple body and
>> +      * HeapTupleData management structure are allocated in one chunk.
>> +      */
>> +     if (targetHeapTuple)
>> +     {
>> +             len += offsetof(HeapTupleHeaderData, t_bits);
>> +             hoff = len = MAXALIGN(len); /* align user data safely */
>> +             len += targetDataLen;
>
> So len isn't the length of the tuple, but the null bitmap length? That
> seems, uh, non-obvious.
>
>
>> +             *targetHeapTuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
>> +             (*targetHeapTuple)->t_data
>> +                     = targetTHeader
>> +                     = (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
>
> Yuck.
>
>
>> +             (*targetHeapTuple)->t_len = len;
>> +             (*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
>> +             ItemPointerSetInvalid(&((*targetHeapTuple)->t_self));
>
> Seems like a lot of this code would be more readable by storing
> *targetHeapTuple in a local var.
>
>
>> +             targetTHeader->t_infomask = sourceTHeader->t_infomask;
>> +             targetTHeader->t_hoff = hoff;
>> +             HeapTupleHeaderSetNatts(targetTHeader, natts);
>> +             HeapTupleHeaderSetDatumLength(targetTHeader, len);
>> +             HeapTupleHeaderSetTypeId(targetTHeader, tupleDesc->tdtypeid);
>> +             HeapTupleHeaderSetTypMod(targetTHeader, tupleDesc->tdtypmod);
>> +             /* We also make sure that t_ctid is invalid unless explicitly set */
>> +             ItemPointerSetInvalid(&(targetTHeader->t_ctid));
>
> This seems fairly expensive, why aren't we just memcpy'ing the old
> header into the new one?
>
>
>
>
>> +     if (targetIndicatorLen > 0)
>> +     {
>> +             if (sourceIndicatorLen > 0)
>> +             {
>> +                     /* if bitmap pre-existed copy in - all is set */
>
> can't parse.
>
>> +                     memcpy(nullBits,
>> +                                ((char *) sourceTHeader)
>> +                                + offsetof(HeapTupleHeaderData, t_bits),
>> +                                sourceIndicatorLen);
>> +                     nullBits += sourceIndicatorLen - 1;
>> +             }
>> +             else
>> +             {
>> +                     sourceIndicatorLen = BITMAPLEN(sourceNatts);
>> +                     /* Set NOT NULL for all existing attributes */
>> +                     memset(nullBits, 0xff, sourceIndicatorLen);
>> +
>> +                     nullBits += sourceIndicatorLen - 1;
>> +
>> +                     if (sourceNatts & 0x07)
>> +                     {
>> +                             /* build the mask (inverted!) */
>> +                             bitMask = 0xff << (sourceNatts & 0x07);
>> +                             /* Voila */
>> +                             *nullBits = ~bitMask;
>> +                     }
>
> It's late, I'm tired. But how can this be correct?  Also, this code is
> massively under-commented.
>
>


I'm going to revisit expand_tuple from scratch.



>
>> +
>> +/*
>> + * Fill in the missing values for an ordinary HeapTuple
>> + */
>> +MinimalTuple
>> +minimal_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc)
>> +{
>> +     MinimalTuple minimalTuple;
>> +
>> +     expand_tuple(NULL, &minimalTuple, sourceTuple, tupleDesc);
>> +     return minimalTuple;
>> +}
>> +
>> +/*
>> + * Fill in the missing values for a minimal HeapTuple
>> + */
>> +HeapTuple
>> +heap_expand_tuple(HeapTuple sourceTuple, TupleDesc tupleDesc)
>> +{
>> +     HeapTuple       heapTuple;
>> +
>> +     expand_tuple(&heapTuple, NULL, sourceTuple, tupleDesc);
>> +     return heapTuple;
>> +}
>
> Comments appear to be swapped.
>
>


Fixed.

>
>>  /* ----------------
>>   *           heap_copy_tuple_as_datum
>>   *
>> @@ -1012,13 +1422,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
>>
>>       /*
>>        * If tuple doesn't have all the atts indicated by tupleDesc, read the
>> -      * rest as null
>> +      * rest as null, or a missing value if there is one.
>>        */
>
> "a missing value" isn't there quite possibly multiple ones?
>
>

Rewritten


>>  /*
>> @@ -1192,8 +1599,7 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
>>       tup = tuple->t_data;
>>       if (attnum > HeapTupleHeaderGetNatts(tup))
>>       {
>> -             *isnull = true;
>> -             return (Datum) 0;
>> +             return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
>>       }
>
> This turned a ~4 instruction case into considerably more. I've recently
> removed one of the major slot_getattr sources (hashagg comparisons), but
> we still call this super frequently for hashagg's hash computations.  I
> suggest micro-benchmarking that case.
>
>


I'll look at it. I have rewritten getmissingattr along the line you've
suggested, so it should be more efficient, and most compilers will
inline it.


>
>>  Oid
>>  StoreAttrDefault(Relation rel, AttrNumber attnum,
>> -                              Node *expr, bool is_internal)
>> +                              Node *expr, bool is_internal, bool add_column_mode)
>>  {
>>       char       *adbin;
>>       char       *adsrc;
>> @@ -1939,9 +1945,20 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>>       Relation        attrrel;
>>       HeapTuple       atttup;
>>       Form_pg_attribute attStruct;
>> +     Form_pg_attribute defAttStruct;
>>       Oid                     attrdefOid;
>>       ObjectAddress colobject,
>>                               defobject;
>> +     ExprState  *exprState;
>> +     Expr       *expr2 = (Expr *) expr;
>> +     EState     *estate = NULL;
>> +     ExprContext *econtext;
>> +     char       *missingBuf = NULL;
>> +     Datum           valuesAtt[Natts_pg_attribute];
>> +     bool            nullsAtt[Natts_pg_attribute];
>> +     bool            replacesAtt[Natts_pg_attribute];
>> +     Datum           missingval = (Datum) 0;
>> +     bool            missingIsNull = true;
>>
>>       /*
>>        * Flatten expression to string form for storage.
>> @@ -1956,6 +1973,44 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>>
RelationGetRelid(rel)),
>>                                                          false, false);
>>
>> +     /*
>> +      * Compute the missing value
>> +      */
>> +     expr2 = expression_planner(expr2);
>> +
>> +     exprState = ExecInitExpr(expr2, NULL);
>
> This should instead use ExecPrepareExpr()
>

Fixed


>
>> +     if (add_column_mode)
>> +     {
>> +             missingval = ExecEvalExpr(exprState, econtext,
>> +                                                                     &missingIsNull);
>> +
>> +             defAttStruct = TupleDescAttr(rel->rd_att, attnum - 1);
>> +
>> +             if (missingIsNull)
>> +             {
>> +                     missingval = PointerGetDatum(NULL);
>
> (Datum) 0;
>
>
>> +             }
>> +             else if (defAttStruct->attbyval)
>> +             {
>> +                     missingBuf = palloc(VARHDRSZ + sizeof(Datum));
>> +                     memcpy(VARDATA(missingBuf), &missingval, sizeof(Datum));
>> +                     SET_VARSIZE(missingBuf, VARHDRSZ + sizeof(Datum));
>> +                     missingval = PointerGetDatum(missingBuf);
>> +             }
>> +             else if (defAttStruct->attlen >= 0)
>> +             {
>> +                     missingBuf = palloc(VARHDRSZ + defAttStruct->attlen);
>> +                     memcpy(VARDATA(missingBuf), DatumGetPointer(missingval),
>> +                                defAttStruct->attlen);
>> +                     SET_VARSIZE(missingBuf,
>> +                                             VARHDRSZ + defAttStruct->attlen);
>> +                     missingval = PointerGetDatum(missingBuf);
>> +             }
>
> What if we get an extended datum back in the varlena case? Even if we
> wouldn't change the storage from bytea to something else, you really
> would need to force detoasting here.
>


This is now rewritten.


>
>> +     }
>> +
>>       /*
>>        * Make the pg_attrdef entry.
>>        */
>> @@ -1996,7 +2051,22 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
>>       attStruct = (Form_pg_attribute) GETSTRUCT(atttup);
>>       if (!attStruct->atthasdef)
>>       {
>> -             attStruct->atthasdef = true;
>> +             MemSet(valuesAtt, 0, sizeof(valuesAtt));
>> +             MemSet(nullsAtt, false, sizeof(nullsAtt));
>> +             MemSet(replacesAtt, false, sizeof(replacesAtt));
>> +             valuesAtt[Anum_pg_attribute_atthasdef - 1] = true;
>> +             replacesAtt[Anum_pg_attribute_atthasdef - 1] = true;
>> +             if (add_column_mode)
>> +             {
>> +                     valuesAtt[Anum_pg_attribute_atthasmissing - 1] = true;
>> +                     replacesAtt[Anum_pg_attribute_atthasmissing - 1] = true;
>> +                     valuesAtt[Anum_pg_attribute_attmissingval - 1] = missingval;
>> +                     replacesAtt[Anum_pg_attribute_attmissingval - 1] = true;
>> +                     nullsAtt[Anum_pg_attribute_attmissingval - 1] = missingIsNull;
>> +             }
>> +             atttup = heap_modify_tuple(atttup, RelationGetDescr(attrrel),
>> +                                                                valuesAtt, nullsAtt, replacesAtt);
>> +
>
> This seems weird. We've above formed a tuple, now we just modify it
> again?
>
>

I'll take a look at making it more straightforward. But It's not as
simple as the atthasdef case, since attmissing is part of the variable
portion of pg_attribute.


>> @@ -2296,7 +2377,14 @@ AddRelationNewConstraints(Relation rel,
>>                       (IsA(expr, Const) &&((Const *) expr)->constisnull))
>>                       continue;
>>
>> -             defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal);
>> +             /* If the default is volatile we cannot use a missing value */
>> +             if (contain_volatile_functions((Node *) expr))
>> +             {
>> +                     colDef->missingMode = false;
>> +             }
>> +             defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal,
>> +                                                               colDef->missingMode);
>
> Doesn't this have to be immutable rather than just stable? I mean just
> storing the value of now() doesn't seem like it'd do the trick?
>
>


I don't see why. We've stated explicitly in the docs that the existing
rows get the value of the default expression as evaluated at the time
the column is created. I don't think there is any need to restrict
that to an immutable expression.


>
>> @@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot)
>>        * If we have a regular physical tuple then just return it.
>>        */
>>       if (TTS_HAS_PHYSICAL_TUPLE(slot))
>> -             return slot->tts_tuple;
>> +     {
>> +             if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <
>> +                     slot->tts_tupleDescriptor->natts)
>> +             {
>> +                     MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
>> +
>> +                     slot->tts_tuple = heap_expand_tuple(slot->tts_tuple,
>> +
slot->tts_tupleDescriptor);
>> +                     slot->tts_shouldFree = true;
>> +                     MemoryContextSwitchTo(oldContext);
>> +                     return slot->tts_tuple;
>> +             }
>> +             else
>> +             {
>> +                     return slot->tts_tuple;
>> +             }
>> +     }
>
> I'm extremely dubious about this. HeapTupleHeaderGetNatts() commonly
> causes cachemisses and therefore causes slowdowns when unnecessarily
> done. Additionally here we're expanding the tuple to include columns
> that aren't even likely to be accessed. This is a fairly hot codepath,
> and made hugely more expensive by this.


Will look further.

>
>
>> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>>                       return false;           /* out of order */
>>               if (att_tup->attisdropped)
>>                       return false;           /* table contains dropped columns */
>> +             if (att_tup->atthasmissing)
>> +                     return false;           /* table contains cols with missing values */
>
> Hm, so incrementally building a table definitions with defaults, even if
> there aren't ever any rows, or if there's a subsequent table rewrite,
> will cause slowdowns. If so, shouldn't a rewrite remove all
> atthasmissing values?
>
>


Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
had to make this change.  Still, it looks like dropping a column has
the same effect.


>
>
>> @@ -493,7 +494,10 @@ RelationBuildTupleDesc(Relation relation)
>
>> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>>                                       MemoryContextAllocZero(CacheMemoryContext,
>>                                                                                  relation->rd_rel->relnatts *
>>                                                                                  sizeof(AttrDefault));
>> -                     attrdef[ndef].adnum = attp->attnum;
>> +                     attrdef[ndef].adnum = attnum;
>>                       attrdef[ndef].adbin = NULL;
>> +
>>                       ndef++;
>>               }
>> +
>> +             /* Likewise for a missing value */
>> +             if (attp->atthasmissing)
>> +             {
>
> As I've written somewhere above, I don't think it's a good idea to do
> this preemptively. Tupledescs are very commonly copied, and the
> missing attributes are quite likely never used. IOW, we should just
> remember the attmissingattr value and fill in the corresponding
> attmissingval on-demand.
>
>

Defaults are frequently not used either, yet this function fills those
in regardless. I don't see why we need to treat missing values
differently.



>> +                     Datum           missingval;
>> +                     bool            missingNull;
>> +
>> +                     if (attrmiss == NULL)
>> +                             attrmiss = (AttrMissing *)
>> +                                     MemoryContextAllocZero(CacheMemoryContext,
>> +                                                                                relation->rd_rel->relnatts *
>> +                                                                                sizeof(AttrMissing));
>> +
>> +                     /* Do we have a missing value? */
>> +                     missingval = SysCacheGetAttr(ATTNUM, pg_attribute_tuple,
>> +                                                                                Anum_pg_attribute_attmissingval,
>> +                                                                                &missingNull);
>
> Shouldn't this be a normal heap_getattr rather than a SysCacheGetAttr
> access?
>
> * SysCacheGetAttr
>  *
>  *              Given a tuple previously fetched by SearchSysCache(),
>  *              extract a specific attribute.
>
>


Fixed.


>> @@ -620,7 +691,19 @@ RelationBuildTupleDesc(Relation relation)
>>                       AttrDefaultFetch(relation);
>>               }
>>               else
>> +             {
>>                       constr->num_defval = 0;
>> +             }
>> +
>> +             if (nmissing > 0)
>> +             {
>> +                     if (nmissing < relation->rd_rel->relnatts)
>> +                             constr->missing = (AttrMissing *)
>> +                                     repalloc(attrmiss, nmissing * sizeof(AttrMissing));
>> +                     else
>> +                             constr->missing = attrmiss;
>> +             }
>> +             constr->num_missing = nmissing;
>
> Am I understand correctly that you're *downsizing* the allocation with
> repalloc here? That seems likely to commonly lead to some wastage...
>


This code is gone. But note the similar code nearby that is already there.


>> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation)
>>
>>       systable_endscan(adscan);
>>       heap_close(adrel, AccessShareLock);
>> -
>> -     if (found != ndef)
>> -             elog(WARNING, "%d attrdef record(s) missing for rel %s",
>> -                      ndef - found, RelationGetRelationName(relation));
>>  }
>
> Hm, it's not obvious why this is a good thing?
>
>
>
>> diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
>> index 415efbab97..ed30f98e33 100644
>> --- a/src/include/access/tupdesc.h
>> +++ b/src/include/access/tupdesc.h
>> @@ -14,6 +14,7 @@
>>  #ifndef TUPDESC_H
>>  #define TUPDESC_H
>>
>> +#include "postgres.h"
>>  #include "access/attnum.h"
>
> postgres.h should never be included in headers, why was it added here?
>
>


Fixed.


> This doesn't seem ready yet.
>

I hope we're a lot closer now.

I'm working on rejigging expand_tuple and on clearing the missing
attributes on a table rewrite.

Attached is the current state of things, with the fixes indicated
above, as well as some things pointed out by David Rowley.

None of this seems to have had much effect on performance.

Here's what oprofile reports from a run of pgbench with
create-alter.sql and the query "select sum(c1000) from t":

Profiling through timer interrupt
samples  %        image name               symbol name
22584    28.5982  postgres                 ExecInterpExpr
11950    15.1323  postgres                 slot_getmissingattrs
5471      6.9279  postgres                 AllocSetAlloc
3018      3.8217  postgres                 TupleDescInitEntry
2042      2.5858  postgres                 MemoryContextAllocZeroAligned
1807      2.2882  postgres                 SearchCatCache1
1638      2.0742  postgres                 expression_tree_walker


That's very different from what we see on the master branch. The time
spent in slot_getmissingattrs is perhaps not unexpected, but I don't
(at least yet) understand why we're getting so much time spent in
ExecInterpExpr, which doesn't show up at all when the same benchmark
is run on master.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v12.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

27 февраля 2018 г., 22:09:23

Hi,

On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
> This was debated quite some time ago. See for example
> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
> The consensus then seemed to be to store them in pg_attribute. If not,
> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
> would have to go on separate rows, and we'd be storing values rather
> than expressions, so it would get somewhat ugly.

I know that I'm concerned with storing a number of large values in
pg_attributes, and I haven't seen that argument addressed much in a
quick scan of the discussion.


> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
> >>   * ----------------
> >>   */
> >>  bool
> >> -heap_attisnull(HeapTuple tup, int attnum)
> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
> >>  {
> >> +     /*
> >> +      * We allow a NULL tupledesc for relations not expected to have
> >> +      * missing values, such as catalog relations and indexes.
> >> +      */
> >> +     Assert(!tupleDesc || attnum <= tupleDesc->natts);
> >
> > This seems fairly likely to hide bugs.

> How? There are plenty of cases where we simply expect quite reasonably
> that there won't be any missing attributes, and we don't need  a
> tupleDesc at all in such cases.

Missing to pass a tupledesc for tables that can have defaults with not
have directly visible issues, you'll just erroneously get back a null.


> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
> >>                       return false;           /* out of order */
> >>               if (att_tup->attisdropped)
> >>                       return false;           /* table contains dropped columns */
> >> +             if (att_tup->atthasmissing)
> >> +                     return false;           /* table contains cols with missing values */
> >
> > Hm, so incrementally building a table definitions with defaults, even if
> > there aren't ever any rows, or if there's a subsequent table rewrite,
> > will cause slowdowns. If so, shouldn't a rewrite remove all
> > atthasmissing values?

> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
> had to make this change.  Still, it looks like dropping a column has
> the same effect.

What exactly is mystifying you? That the new default stuff doesn't work
when the physical tlist optimization is in use?


> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
> >>                                       MemoryContextAllocZero(CacheMemoryContext,
> >>                                                                                  relation->rd_rel->relnatts *
> >>                                                                                  sizeof(AttrDefault));
> >> -                     attrdef[ndef].adnum = attp->attnum;
> >> +                     attrdef[ndef].adnum = attnum;
> >>                       attrdef[ndef].adbin = NULL;
> >> +
> >>                       ndef++;
> >>               }
> >> +
> >> +             /* Likewise for a missing value */
> >> +             if (attp->atthasmissing)
> >> +             {
> >
> > As I've written somewhere above, I don't think it's a good idea to do
> > this preemptively. Tupledescs are very commonly copied, and the
> > missing attributes are quite likely never used. IOW, we should just
> > remember the attmissingattr value and fill in the corresponding
> > attmissingval on-demand.

> Defaults are frequently not used either, yet this function fills those
> in regardless. I don't see why we need to treat missing values
> differently.

It's already hurting us quite a bit in workloads with a lot of trivial
queries. Adding more makes it worse.


> Attached is the current state of things, with the fixes indicated
> above, as well as some things pointed out by David Rowley.
> 
> None of this seems to have had much effect on performance.
> 
> Here's what oprofile reports from a run of pgbench with
> create-alter.sql and the query "select sum(c1000) from t":
> 
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 22584    28.5982  postgres                 ExecInterpExpr
> 11950    15.1323  postgres                 slot_getmissingattrs
> 5471      6.9279  postgres                 AllocSetAlloc
> 3018      3.8217  postgres                 TupleDescInitEntry
> 2042      2.5858  postgres                 MemoryContextAllocZeroAligned
> 1807      2.2882  postgres                 SearchCatCache1
> 1638      2.0742  postgres                 expression_tree_walker
> 
> 
> That's very different from what we see on the master branch. The time
> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
> (at least yet) understand why we're getting so much time spent in
> ExecInterpExpr, which doesn't show up at all when the same benchmark
> is run on master.

I'd guess that's because the physical tlist optimization is disabled. I
assume you'd see something similar on master if you dropped a column.

- Andres

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

01 марта 2018 г., 01:49:43

On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>> This was debated quite some time ago. See for example
>> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
>> The consensus then seemed to be to store them in pg_attribute. If not,
>> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
>> would have to go on separate rows, and we'd be storing values rather
>> than expressions, so it would get somewhat ugly.
>
> I know that I'm concerned with storing a number of large values in
> pg_attributes, and I haven't seen that argument addressed much in a
> quick scan of the discussion.
>
>


I'd like to have a consensus on this before I start making such a
change as adding a new catalog table.


>> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>> >>   * ----------------
>> >>   */
>> >>  bool
>> >> -heap_attisnull(HeapTuple tup, int attnum)
>> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>> >>  {
>> >> +     /*
>> >> +      * We allow a NULL tupledesc for relations not expected to have
>> >> +      * missing values, such as catalog relations and indexes.
>> >> +      */
>> >> +     Assert(!tupleDesc || attnum <= tupleDesc->natts);
>> >
>> > This seems fairly likely to hide bugs.
>
>> How? There are plenty of cases where we simply expect quite reasonably
>> that there won't be any missing attributes, and we don't need  a
>> tupleDesc at all in such cases.
>
> Missing to pass a tupledesc for tables that can have defaults with not
> have directly visible issues, you'll just erroneously get back a null.
>
>


What would you prefer? Two versions of the function?


>> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>> >>                       return false;           /* out of order */
>> >>               if (att_tup->attisdropped)
>> >>                       return false;           /* table contains dropped columns */
>> >> +             if (att_tup->atthasmissing)
>> >> +                     return false;           /* table contains cols with missing values */
>> >
>> > Hm, so incrementally building a table definitions with defaults, even if
>> > there aren't ever any rows, or if there's a subsequent table rewrite,
>> > will cause slowdowns. If so, shouldn't a rewrite remove all
>> > atthasmissing values?
>
>> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
>> had to make this change.  Still, it looks like dropping a column has
>> the same effect.
>
> What exactly is mystifying you? That the new default stuff doesn't work
> when the physical tlist optimization is in use?
>


I think I have a better handle on that now. I'm trying to think of a
way around it, no massive inspiration yet.


>
>> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>> >>                                       MemoryContextAllocZero(CacheMemoryContext,
>> >>                                                                                  relation->rd_rel->relnatts *
>> >>                                                                                  sizeof(AttrDefault));
>> >> -                     attrdef[ndef].adnum = attp->attnum;
>> >> +                     attrdef[ndef].adnum = attnum;
>> >>                       attrdef[ndef].adbin = NULL;
>> >> +
>> >>                       ndef++;
>> >>               }
>> >> +
>> >> +             /* Likewise for a missing value */
>> >> +             if (attp->atthasmissing)
>> >> +             {
>> >
>> > As I've written somewhere above, I don't think it's a good idea to do
>> > this preemptively. Tupledescs are very commonly copied, and the
>> > missing attributes are quite likely never used. IOW, we should just
>> > remember the attmissingattr value and fill in the corresponding
>> > attmissingval on-demand.
>
>> Defaults are frequently not used either, yet this function fills those
>> in regardless. I don't see why we need to treat missing values
>> differently.
>
> It's already hurting us quite a bit in workloads with a lot of trivial
> queries. Adding more makes it worse.
>
>


Please expand on your view of how we should "just remember the
attmissingattr value and fill in the corresponding attmissingval
on-demand."

I don't mind changing how we do things, but I need a bit more detail than this.

If copying defaults around has been hurting us so much it seems
surprising nobody has put forward an improvement.


>> Attached is the current state of things, with the fixes indicated
>> above, as well as some things pointed out by David Rowley.
>>
>> None of this seems to have had much effect on performance.
>>
>> Here's what oprofile reports from a run of pgbench with
>> create-alter.sql and the query "select sum(c1000) from t":
>>
>> Profiling through timer interrupt
>> samples  %        image name               symbol name
>> 22584    28.5982  postgres                 ExecInterpExpr
>> 11950    15.1323  postgres                 slot_getmissingattrs
>> 5471      6.9279  postgres                 AllocSetAlloc
>> 3018      3.8217  postgres                 TupleDescInitEntry
>> 2042      2.5858  postgres                 MemoryContextAllocZeroAligned
>> 1807      2.2882  postgres                 SearchCatCache1
>> 1638      2.0742  postgres                 expression_tree_walker
>>
>>
>> That's very different from what we see on the master branch. The time
>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>> (at least yet) understand why we're getting so much time spent in
>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>> is run on master.
>
> I'd guess that's because the physical tlist optimization is disabled. I
> assume you'd see something similar on master if you dropped a column.
>


Yeah. That's kinda ugly in fact.

Here's a version of the patch which cleans up the missing attribute
settings on a table rewrite.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v13.patch

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

06 марта 2018 г., 12:40:36

On 1 March 2018 at 11:49, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>>> Profiling through timer interrupt
>>> samples  %        image name               symbol name
>>> 22584    28.5982  postgres                 ExecInterpExpr
>>> 11950    15.1323  postgres                 slot_getmissingattrs
>>> 5471      6.9279  postgres                 AllocSetAlloc
>>> 3018      3.8217  postgres                 TupleDescInitEntry
>>> 2042      2.5858  postgres                 MemoryContextAllocZeroAligned
>>> 1807      2.2882  postgres                 SearchCatCache1
>>> 1638      2.0742  postgres                 expression_tree_walker
>>>
>>>
>>> That's very different from what we see on the master branch. The time
>>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>>> (at least yet) understand why we're getting so much time spent in
>>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>>> is run on master.
>>
>> I'd guess that's because the physical tlist optimization is disabled. I
>> assume you'd see something similar on master if you dropped a column.

I put that to the test and there's still something fishy going on.

I created some benchmark scripts to help out a bit. Basically, they'll
allow you to choose how many columns you want in the test tables and
how many rows to put in them.

The following test is a 60 second single threaded pgbench test with
1000 columns, 400 rows, testing select sum(c10) from <table>

*** Benchmarking normal table...
tps = 1972.985639 (excluding connections establishing)

*** Benchmarking missing table...
tps = 433.779008 (excluding connections establishing)

*** Benchmarking dropped table...
tps = 3661.063986 (excluding connections establishing)

The dropped table had 1001 columns, but the script drops the first.
The missing table has all 1000 columns missing. In the normal table
all tuples have all attrs.

I imagine it should be possible to make the missing table case faster
than either of the others. The reason it's slower is down to
slot_getsomeattrs being called to get all 1000 attributes in all but
the 2nd call to the function... ? whereas only the 10th attr is
deformed in the Normal table case.

Here's some perf output for each case:

Normal:

  15.88%  postgres  postgres            [.] expression_tree_walker
   7.82%  postgres  postgres            [.] fix_expr_common
   6.12%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   5.45%  postgres  postgres            [.] SearchCatCache3
   3.98%  postgres  postgres            [.] TupleDescInitEntry

Missing:

  47.80%  postgres  postgres            [.] ExecInterpExpr
  25.65%  postgres  postgres            [.] slot_getmissingattrs
   6.86%  postgres  postgres            [.] build_tlist_index
   4.43%  postgres  libc-2.17.so        [.] __memset_sse2
   2.57%  postgres  postgres            [.] expression_tree_walker

Dropped:

  19.59%  postgres  postgres            [.] slot_deform_tuple
   6.63%  postgres  postgres            [.] hash_search_with_hash_value
   6.55%  postgres  postgres            [.] heap_getnext
   5.35%  postgres  postgres            [.] ExecInterpExpr
   4.79%  postgres  postgres            [.] heap_page_prune_opt

/me studies the code for a bit...

Okay, it looks like the patch should disable physical tlists when
there's a missing column the same way as we do for dropped columns.
Patch attached.

TPS looks much better now;

*** Benchmarking missing table...

tps = 5666.117627 (excluding connections establishing)

everybody's going to want missing columns in their tables now...

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

06 марта 2018 г., 12:45:43

On 6 March 2018 at 22:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Okay, it looks like the patch should disable physical tlists when
> there's a missing column the same way as we do for dropped columns.
> Patch attached.

Please disregard the previous patch in favour of the attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

fast_default_disable_physical_tlists_v2.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

08 марта 2018 г., 08:40:06

On Tue, Mar 6, 2018 at 8:15 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 6 March 2018 at 22:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> Okay, it looks like the patch should disable physical tlists when
>> there's a missing column the same way as we do for dropped columns.
>> Patch attached.
>
> Please disregard the previous patch in favour of the attached.
>

OK, nice work, thanks. We're making good progress. Two of the cases I
was testing have gone from worse than master to better than master.
Here are the results I got, all against Tomas' 100-col 64-row table.
using pgbench -T 100:

 select sum(c1000) from t;
 fastdef tps = 4724.988619
 master  tps = 1590.843085
 ============
 select c1000 from t;
 fastdef tps = 5093.667203
 master  tps = 2437.613368
 ============
 select sum(c1000) from (select c1000 from t offset 0) x;
 fastdef tps = 3315.900091
 master  tps = 2067.574581
 ============
 select * from t;
 fastdef tps = 107.145811
 master  tps = 150.207957
 ============
 select sum(c1), count(c500), avg(c1000) from t;
 fastdef tps = 2304.636410
 master  tps = 1409.791975
 ============
 select sum(c10) from t;
 fastdef tps = 4332.625917
 master  tps = 2208.757119

"select * from t" used to be about a wash, but with this patch it's
got worse. The last two queries were worse and are now better, so
that's a win. I'm going to do a test to see if I can find the
break-even point between the second query and the fourth. If it turns
out to be at quite a large number of columns selected then I think it
might be something we can live with.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

08 марта 2018 г., 16:11:33

On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>  select * from t;
>  fastdef tps = 107.145811
>  master  tps = 150.207957
>
> "select * from t" used to be about a wash, but with this patch it's
> got worse. The last two queries were worse and are now better, so
> that's a win.

How does it compare to master if you drop a column out the table?
Physical tlists will be disabled in that case too. I imagine the
performance of master will drop much lower than the all columns
missing case.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

11 марта 2018 г., 17:59:36

On 9 March 2018 at 02:11, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>  select * from t;
>>  fastdef tps = 107.145811
>>  master  tps = 150.207957
>>
>> "select * from t" used to be about a wash, but with this patch it's
>> got worse. The last two queries were worse and are now better, so
>> that's a win.
>
> How does it compare to master if you drop a column out the table?
> Physical tlists will be disabled in that case too. I imagine the
> performance of master will drop much lower than the all columns
> missing case.

I decided to test this for myself, and the missing version is still
slightly slower than the dropped column version, but not by much. I'm
not personally concerned about this.

The following results are with 1000 column tables with 64 rows each.

*** Benchmarking normal table...
tps = 232.794734 (excluding connections establishing)

*** Benchmarking missing table...
tps = 202.827754 (excluding connections establishing)

*** Benchmarking dropped table...
tps = 217.339255 (excluding connections establishing)

There's nothing particularly interesting in the profiles for the
missing and normal case either:

-- Missing case for select * from missing;

  12.87%  postgres  postgres            [.] AllocSetAlloc
  10.09%  postgres  libc-2.17.so        [.] __strlen_sse2_pminub
   6.35%  postgres  postgres            [.] pq_sendcountedtext
   5.97%  postgres  postgres            [.] enlargeStringInfo
   5.47%  postgres  postgres            [.] ExecInterpExpr
   4.45%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
   4.39%  postgres  postgres            [.] palloc
   4.31%  postgres  postgres            [.] printtup
   3.81%  postgres  postgres            [.] pg_ltoa
   3.72%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   3.38%  postgres  postgres            [.] FunctionCall1Coll
   3.11%  postgres  postgres            [.] int4out
   2.97%  postgres  postgres            [.] appendBinaryStringInfoNT
   2.49%  postgres  postgres            [.] slot_getmissingattrs
   1.66%  postgres  postgres            [.] pg_server_to_any
   0.99%  postgres  postgres            [.] MemoryContextAllocZeroAligned
   0.94%  postgres  postgres            [.] expression_tree_walker
   0.93%  postgres  postgres            [.] lappend
   0.89%  postgres  [kernel.kallsyms]   [k] __do_page_fault
   0.72%  postgres  [kernel.kallsyms]   [k] clear_page_c_e
   0.72%  postgres  postgres            [.] SendRowDescriptionMessage
   0.71%  postgres  postgres            [.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]   [k] get_page_from_freelist
   0.64%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   0.62%  postgres  postgres            [.] pg_server_to_client
   0.59%  postgres  libc-2.17.so        [.] __memset_sse2
   0.58%  postgres  postgres            [.] set_pathtarget_cost_width
   0.56%  postgres  postgres            [.] OutputFunctionCall
   0.56%  postgres  postgres            [.] memcpy@plt
   0.54%  postgres  postgres            [.] makeTargetEntry
   0.50%  postgres  postgres            [.] SearchCatCache3

-- Normal case for select * from normal;

  12.57%  postgres  postgres            [.] AllocSetAlloc
  10.50%  postgres  libc-2.17.so        [.] __strlen_sse2_pminub
   7.65%  postgres  postgres            [.] slot_deform_tuple
   6.52%  postgres  postgres            [.] pq_sendcountedtext
   6.03%  postgres  postgres            [.] enlargeStringInfo
   4.51%  postgres  postgres            [.] printtup
   4.35%  postgres  postgres            [.] palloc
   4.34%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
   3.90%  postgres  postgres            [.] pg_ltoa
   3.49%  postgres  [kernel.kallsyms]   [k] __lock_text_start
   3.35%  postgres  postgres            [.] int4out
   3.31%  postgres  postgres            [.] FunctionCall1Coll
   2.82%  postgres  postgres            [.] appendBinaryStringInfoNT
   1.68%  postgres  postgres            [.] pg_server_to_any
   1.30%  postgres  postgres            [.] SearchCatCache3
   1.01%  postgres  postgres            [.] SendRowDescriptionMessage
   0.89%  postgres  postgres            [.] lappend
   0.88%  postgres  postgres            [.] MemoryContextAllocZeroAligned
   0.79%  postgres  postgres            [.] expression_tree_walker
   0.67%  postgres  postgres            [.] SearchCatCache1
   0.67%  postgres  postgres            [.] expandRelAttrs
   0.64%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
   0.60%  postgres  postgres            [.] pg_server_to_client
   0.57%  postgres  [kernel.kallsyms]   [k] __do_page_fault
   0.56%  postgres  postgres            [.] OutputFunctionCall
   0.53%  postgres  postgres            [.] memcpy@plt
   0.53%  postgres  postgres            [.] makeTargetEntry
   0.51%  postgres  [kernel.kallsyms]   [k] get_page_from_freelist

The difference appears to be in ExecInterpExpr, although, that only
accounts for about 5.5%, and we're missing 15% in performance.

Going by the commitfest app, the patch still does appear to be waiting
on Author. Never-the-less, I've made another pass over it and found a
few mistakes and a couple of ways to improve things:

1. Should <structname> not be <structfield>?

      <entry>
       This column has a value which is used where the column is entirely
       missing from the row, as happens when a column is added after the
       row is created. The actual value used is stored in the
       <structname>attmissingval</structname>  column.
      </entry>

Also additional surplus space before "column".

The same mixup appears in:

       <structname>atthasmissing</structname> is true. If there is no value

2. Instead of slot_getmissingattrs() having a memset() to set all the
missing attributes to NULL before resetting them to the missing value,
if one is found. Would it not be better to just have the missing array
store both the NULLs and the DEFAULTs together?

With that, the code could become:

static void
slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum)
{
    AttrMissing *attrmiss;
    int attnum;

    if (slot->tts_tupleDescriptor->constr &&
        slot->tts_tupleDescriptor->constr->missing)
        attrmiss = slot->tts_tupleDescriptor->constr->missing;
    else
       return;

    for (attnum = startAttNum; attnum < lastAttNum; attnum++)
    {
        slot->tts_values[attnum] = attrmiss[attnum].ammissing;
        slot->tts_isnull[attnum] = false;
   }
}

You may even be able to just:

Assert(slot->tts_tupleDescriptor->constr);
Assert(slot->tts_tupleDescriptor->constr->missing);

instead of the run-time check.

This would also simplify expand_tuple() and getmissingattr()

I think this change makes sense as slot_getmissingattrs() is getting
NULLs and the missing values, not just the missing values as the name
indicates. The name would become more fitting if you class a NULL as a
missing value.

3. Not sure what amnum means.  Is it meant to be adnum?

* the source (i.e. where its amnum is > sourceNatts). We can ignore

4. In StoreAttrDefault(), is there any point in creating the executor
state when add_column_mode == false?

It's probably also worth commenting what add_column_mode does. If any
caller mistakenly passes it as 'true' then you'll overwrite the
original missing value which would appear to update tuples that
already exist in the table.

It also looks like you could move the bulk of the new code in that
function, including the new variables into where you're modifying the
pg_attribute tuple.

This function also incorrectly sets atthasmissing to true even when
the DEFAULT expression evaluates to NULL. This seems to contradict
what you're doing in RelationBuildTupleDesc(), which skips building
the missing array unless it sees a non-null attmissingval.

This will cause getmissingattr() to Assert fail with the following case:

drop table if exists t1;
create table t1 (a int);
insert into t1 values(1);
alter table t1 add column b text default 'test' || null;
alter table t1 add column c text null;
select attname,atthasmissing,attmissingval from pg_attribute where
attrelid = 't1'::regclass and attnum>0;
select b from t1;
TRAP: FailedAssertion("!(tupleDesc->constr->missing)", File:
"src/backend/access/common/heaptuple.c", Line: 98)

Probably another good reason to do the refactor work mentioned in #2.

It might be worth adding a regression test to ensure atthasmissing
remains false for a column where the default expression evaluated to
NULL;

5. It's not really required, but in AddRelationNewConstraints() you
could get away with only performing the volatile function check if
missingMode is true. That would save calling
contain_volatile_functions() needlessly.

6. The comment above RelationClearMissing() should probably mention
that the caller must hold an AccessExclusiveLock on rel. There is also
a surplus newline between the header comment and the function.

7. Can you pg_ident the patch? There's quite a few places where the
whitespace is messed up, and also quite a few places where you've
changed lines so they're longer than 80 chars.

8. In RelationBuildTupleDesc() it looks like attnum can be declared in
the scope of the while loop.

An updated patch should probably also included the patch I sent to
disable the physical tlists when there are missing values mentioned in
the TupleDesc. Although I admit to not having properly updated the
comments even in my v2 attempt.

 * NIL.  Ideally we would like to handle the dropped-column case too.  However

should read:

 * NIL.  Ideally, we would like to handle these cases too.  However

Thanks again for working on this feature. I hope we can get this into PG11.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

13 марта 2018 г., 07:10:16

On Mon, Mar 12, 2018 at 1:29 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 9 March 2018 at 02:11, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
>>>  select * from t;
>>>  fastdef tps = 107.145811
>>>  master  tps = 150.207957
>>>
>>> "select * from t" used to be about a wash, but with this patch it's
>>> got worse. The last two queries were worse and are now better, so
>>> that's a win.
>>
>> How does it compare to master if you drop a column out the table?
>> Physical tlists will be disabled in that case too. I imagine the
>> performance of master will drop much lower than the all columns
>> missing case.
>
> I decided to test this for myself, and the missing version is still
> slightly slower than the dropped column version, but not by much. I'm
> not personally concerned about this.
>
> The following results are with 1000 column tables with 64 rows each.

I've done some more extensive benchmarking now. Here are some fairly
typical results from pgbench runs done on standard scale 100 pgbench
data:

[andrew@foo tests]$ PATH=$HOME/pg_fast_def/root/HEAD/inst/bin:$PATH
pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2235601
latency average = 0.268 ms
tps = 37256.886332 (including connections establishing)
tps = 37258.562925 (excluding connections establishing)
[andrew@foo tests]$ PATH=$HOME/pg_head/root/HEAD/inst/bin:$PATH
pgbench -S -c 10 -j 5 -T 60 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
duration: 60 s
number of transactions actually processed: 2230085
latency average = 0.269 ms
tps = 37164.696271 (including connections establishing)
tps = 37166.647971 (excluding connections establishing)

So generally the patched code and master are pretty much on a par.

I have also done some testing on cases meant to stress-test the
feature a bit - two 1000 column, all columns having defaults, one with
a dropped column. For the fast_default case I then also copied the
tables (and again dropped a column) so that the data files and table
definitions would match fairly closely what was being tested in the
master branch. The scripts in the attached tests.tgz. The test
platform is an Amazon r4.2xlarge instance running RHEL7.

There are two sets of results attached, one for 64 row tables and one
for 50k row tables.

The 50k row results are fairly unambiguous, the patched code performs
as well as or better (in some cases spectacularly better) than master.
In a few cases the patched code performs slightly worse than master in
the last (fdnmiss) case with the copied tables.

>
> Going by the commitfest app, the patch still does appear to be waiting
> on Author. Never-the-less, I've made another pass over it and found a
> few mistakes and a couple of ways to improve things:
>

working on these. Should have a new patch tomorrow.

> Thanks again for working on this feature. I hope we can get this into PG11.
>

Thanks for you help. I hope so too.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

13 марта 2018 г., 15:28:03

On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:

>>
>> Going by the commitfest app, the patch still does appear to be waiting
>> on Author. Never-the-less, I've made another pass over it and found a
>> few mistakes and a couple of ways to improve things:
>>
>
> working on these. Should have a new patch tomorrow.
>

Here is a patch that attends to most of these, except that I haven't
re-indented it.

Your proposed changes to slot_getmissingattrs() wouldn't work, but I
have rewritten it in a way that avoids the double work you disliked.

I'll rerun the benchmark tests I posted upthread and let you know the results.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v15.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

14 марта 2018 г., 01:36:54

On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>
>>>
>>> Going by the commitfest app, the patch still does appear to be waiting
>>> on Author. Never-the-less, I've made another pass over it and found a
>>> few mistakes and a couple of ways to improve things:
>>>
>>
>> working on these. Should have a new patch tomorrow.
>>
>
>
> Here is a patch that attends to most of these, except that I haven't
> re-indented it.
>
> Your proposed changes to slot_getmissingattrs() wouldn't work, but I
> have rewritten it in a way that avoids the double work you disliked.
>
> I'll rerun the benchmark tests I posted upthread and let you know the results.
>

Here are the benchmark results from the v15 patch. Fairly similar to
previous results. I'm going to run some profiling again to see if I
can identify any glaring hotspots. I do suspect that the "physical
tlist" optimization sometimes turns out not to be one. It seems
perverse to be able to improve a query's performance by dropping a
column.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

14 марта 2018 г., 03:28:35

On 14 March 2018 at 11:36, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> Here are the benchmark results from the v15 patch. Fairly similar to
> previous results. I'm going to run some profiling again to see if I
> can identify any glaring hotspots. I do suspect that the "physical
> tlist" optimization sometimes turns out not to be one. It seems
> perverse to be able to improve a query's performance by dropping a
> column.

Can you explain what "fdnmiss" is that appears in the results?


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

14 марта 2018 г., 03:31:40

Hi,

On 2018-03-14 09:06:54 +1030, Andrew Dunstan wrote:
> I do suspect that the "physical tlist" optimization sometimes turns
> out not to be one. It seems perverse to be able to improve a query's
> performance by dropping a column.

Yea, it's definitely not always a boon. Especially w/ the newer v10+
project code.  I suspect we should largely get rid of it in v12, which
presumably will see a storage layer abstraction...

It'll be even more worthwhile to get rid of it if we manage to avoid
deforming columns that aren't needed but are lower than the currently
required columns. I still think we should build bitmasks of required
columns during planning.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

14 марта 2018 г., 05:10:40




> On Mar 14, 2018, at 10:58 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 14 March 2018 at 11:36, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> Here are the benchmark results from the v15 patch. Fairly similar to
>> previous results. I'm going to run some profiling again to see if I
>> can identify any glaring hotspots. I do suspect that the "physical
>> tlist" optimization sometimes turns out not to be one. It seems
>> perverse to be able to improve a query's performance by dropping a
>> column.
>
> Can you explain what "fdnmiss" is that appears in the results?
>
>

It’s the patched code run against a materialized version of the table, i.e. one with no missing attributes.

Cheers

Andrew

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

15 марта 2018 г., 11:33:44

On Tue, Mar 13, 2018 at 10:58 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Tue, Mar 13, 2018 at 2:40 PM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>
>>>
>>> Going by the commitfest app, the patch still does appear to be waiting
>>> on Author. Never-the-less, I've made another pass over it and found a
>>> few mistakes and a couple of ways to improve things:
>>>
>>
>> working on these. Should have a new patch tomorrow.
>>
>
>
> Here is a patch that attends to most of these, except that I haven't
> re-indented it.
>
> Your proposed changes to slot_getmissingattrs() wouldn't work, but I
> have rewritten it in a way that avoids the double work you disliked.
>


rebased and mostly indented patch version attached. It's actually
moderately difficult to produce a nicely indented patch that is
against non-pgindented code. We should look at that a bit. Another
item for my TODO list.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

fast_default-v16.patch

Re: Re: ALTER TABLE ADD COLUMN fast default

От

David Steele

Дата:

21 марта 2018 г., 22:31:43

On 3/15/18 4:33 AM, Andrew Dunstan wrote:
> 
> rebased and mostly indented patch version attached. It's actually
> moderately difficult to produce a nicely indented patch that is
> against non-pgindented code. We should look at that a bit. Another
> item for my TODO list.
It looks like this should be marked Needs Review so I have done so.  If
that's not right please change it back or let me know and I will.

Regards,
-- 
-David
david@pgmasters.net

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

25 марта 2018 г., 10:09:36

On 15 March 2018 at 21:33, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> rebased and mostly indented patch version attached.

Thanks. I've attached a version of this which applies, builds and
passes the regression tests on current master.

Some conflicts were caused by 325f2ec555 and there was a new call to
heap_attisnull which needed to be updated.

I'll look over this now.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

fast_default-v17.patch

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

25 марта 2018 г., 13:43:40

On 25 March 2018 at 20:09, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 15 March 2018 at 21:33, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> rebased and mostly indented patch version attached.
>
> Thanks. I've attached a version of this which applies, builds and
> passes the regression tests on current master.
>
> Some conflicts were caused by 325f2ec555 and there was a new call to
> heap_attisnull which needed to be updated.
>
> I'll look over this now.

I've attached a delta patch against the v17 patch that I attached
earlier.  I didn't change much, but there did seem to be a few places
where the patch was not properly setting atthasmissing to false. Most
of the rest is just cosmetic stuff

With the attached applied, I'm happy to mark the patch as ready for
committer, however, Petr is also signed up to review, so will defer to
him to see if he has any comments before altering the commitfest app's
state.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

fast_default-v17_fixes.patch

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

26 марта 2018 г., 01:32:10

On Sun, Mar 25, 2018 at 9:13 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 25 March 2018 at 20:09, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 15 March 2018 at 21:33, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> rebased and mostly indented patch version attached.
>>
>> Thanks. I've attached a version of this which applies, builds and
>> passes the regression tests on current master.
>>
>> Some conflicts were caused by 325f2ec555 and there was a new call to
>> heap_attisnull which needed to be updated.
>>
>> I'll look over this now.
>
> I've attached a delta patch against the v17 patch that I attached
> earlier.  I didn't change much, but there did seem to be a few places
> where the patch was not properly setting atthasmissing to false. Most
> of the rest is just cosmetic stuff
>
> With the attached applied, I'm happy to mark the patch as ready for
> committer, however, Petr is also signed up to review, so will defer to
> him to see if he has any comments before altering the commitfest app's
> state.


Thanks for this, all looks good. Here is the consolidate patch
rebased. If there are no further comments I propose to commit this in
a few days time.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

David Rowley

Дата:

26 марта 2018 г., 02:04:33

On 25 March 2018 at 23:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
> With the attached applied, I'm happy to mark the patch as ready for
> committer, however, Petr is also signed up to review, so will defer to
> him to see if he has any comments before altering the commitfest app's
> state.

Petr won't have time to look so I'll adjust the commitfest app's state now.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

28 марта 2018 г., 10:00:41

Hi,

On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
> Thanks for this, all looks good. Here is the consolidate patch
> rebased. If there are no further comments I propose to commit this in
> a few days time.

Here's a bit of post commit review:

@@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
 
     /*
      * If tuple doesn't have all the atts indicated by tupleDesc, read the
-     * rest as null
+     * rest as NULLs or missing values
      */
-    for (; attno < attnum; attno++)
-    {
-        slot->tts_values[attno] = (Datum) 0;
-        slot->tts_isnull[attno] = true;
-    }
+    if (attno < attnum)
+        slot_getmissingattrs(slot, attno, attnum);
+
     slot->tts_nvalid = attnum;
 }

It's worthwhile to note that this'll re-process *all* missing values,
even if they've already been deformed. I.e. if
slot_getmissingattrs(.., first-missing)
slot_getmissingattrs(.., first-missing + 1)
slot_getmissingattrs(.., first-missing + 2)
is called, all three missing values will be copied every time. That's
because tts_nvalid isn't taken into account.  I wonder if slot_getmissingattrs
could take tts_nvalid into account?

I also wonder if this doesn't deserve an unlikely(), to avoid the cost
of spilling registers in the hot branch of not missing any values.


+    else
+    {
+        /* if there is a missing values array we must process them one by one */
+        for (missattnum = lastAttNum - 1;
+             missattnum >= startAttNum;
+             missattnum--)
+        {
+            slot->tts_values[missattnum] = attrmiss[missattnum].ammissing;
+            slot->tts_isnull[missattnum] =
+                !attrmiss[missattnum].ammissingPresent;
+        }
+    }
+}

Why is this done backwards? It's noticeably slower to walk arrays
backwards on some CPU microarchitectures.



@@ -176,6 +179,23 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
             }
         }
 


@@ -469,6 +503,29 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
         }
+        if (constr1->missing)
+        {
+            if (!constr2->missing)
+                return false;
+            for (i = 0; i < tupdesc1->natts; i++)
+            {
+                AttrMissing *missval1 = constr1->missing + i;
+                AttrMissing *missval2 = constr2->missing + i;

It's a bit odd to not use array indexing here?


@@ -625,7 +625,15 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
     if (slot->tts_mintuple)
         return heap_copy_minimal_tuple(slot->tts_mintuple);
     if (slot->tts_tuple)
-        return minimal_tuple_from_heap_tuple(slot->tts_tuple);
+    {
+        if (TTS_HAS_PHYSICAL_TUPLE(slot) &&
+            HeapTupleHeaderGetNatts(slot->tts_tuple->t_data)
+            < slot->tts_tupleDescriptor->natts)
+            return minimal_expand_tuple(slot->tts_tuple,
+                                        slot->tts_tupleDescriptor);
+        else
+            return minimal_tuple_from_heap_tuple(slot->tts_tuple);
+    }
 

What's the TTS_HAS_PHYSICAL_TUPLE doing here? How can this be relevant
given the previous tts_mintuple check? Am I missing something?



@@ -563,10 +569,63 @@ RelationBuildTupleDesc(Relation relation)
                     MemoryContextAllocZero(CacheMemoryContext,
                                            relation->rd_rel->relnatts *
                                            sizeof(AttrDefault));
-            attrdef[ndef].adnum = attp->attnum;
+            attrdef[ndef].adnum = attnum;
             attrdef[ndef].adbin = NULL;
+
             ndef++;
         }
+
+        /* Likewise for a missing value */
+        if (attp->atthasmissing)
+        {
+            Datum        missingval;
+            bool        missingNull;
+
+            /* Do we have a missing value? */
+            missingval = heap_getattr(pg_attribute_tuple,
+                                      Anum_pg_attribute_attmissingval,
+                                      pg_attribute_desc->rd_att,
+                                      &missingNull);
+            if (!missingNull)
+            {
+                /* Yes, fetch from the array */
+                MemoryContext oldcxt;
+                bool        is_null;
+                int            one = 1;
+                Datum        missval;
+
+                if (attrmiss == NULL)
+                    attrmiss = (AttrMissing *)
+                        MemoryContextAllocZero(CacheMemoryContext,
+                                               relation->rd_rel->relnatts *
+                                               sizeof(AttrMissing));
+
+                missval = array_get_element(missingval,
+                                            1,
+                                            &one,
+                                            -1,
+                                            attp->attlen,
+                                            attp->attbyval,
+                                            attp->attalign,
+                                            &is_null);
+                Assert(!is_null);
+                if (attp->attbyval)
+                {
+                    /* for copy by val just copy the datum direct */
+                    attrmiss[attnum - 1].ammissing = missval;
+                }
+                else
+                {
+                    /* otherwise copy in the correct context */
+                    oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+                    attrmiss[attnum - 1].ammissing = datumCopy(missval,
+                                                               attp->attbyval,
+                                                               attp->attlen);
+                    MemoryContextSwitchTo(oldcxt);
+                }
+                attrmiss[attnum - 1].ammissingPresent = true;
+            }
+        }
         need--;
         if (need == 0)
             break;

I'm still strongly object to do doing this unconditionally for queries
that never need any of this.  We're can end up with a significant number
of large tuples in memory here, and then copy that through dozens of
tupledesc copies in queries.


@@ -167,6 +170,12 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
 
     /* Column-level FDW options */
     text        attfdwoptions[1] BKI_DEFAULT(_null_);
+
+    /*
+     * Missing value for added columns. This is a one element array which lets
+     * us store a value of the attribute type here.
+     */
+    anyarray    attmissingval BKI_DEFAULT(_null_);
 #endif
 } FormData_pg_attribute;
 

Still think this is a bad location, and it'll reduce cache hit ratio for
catalog lookups.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

29 марта 2018 г., 02:46:23

On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
>> Thanks for this, all looks good. Here is the consolidate patch
>> rebased. If there are no further comments I propose to commit this in
>> a few days time.
>
> Here's a bit of post commit review:
>
> @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
>
>         /*
>          * If tuple doesn't have all the atts indicated by tupleDesc, read the
> -        * rest as null
> +        * rest as NULLs or missing values
>          */
> -       for (; attno < attnum; attno++)
> -       {
> -               slot->tts_values[attno] = (Datum) 0;
> -               slot->tts_isnull[attno] = true;
> -       }
> +       if (attno < attnum)
> +               slot_getmissingattrs(slot, attno, attnum);
> +
>         slot->tts_nvalid = attnum;
>  }
>
> It's worthwhile to note that this'll re-process *all* missing values,
> even if they've already been deformed. I.e. if
> slot_getmissingattrs(.., first-missing)
> slot_getmissingattrs(.., first-missing + 1)
> slot_getmissingattrs(.., first-missing + 2)
> is called, all three missing values will be copied every time. That's
> because tts_nvalid isn't taken into account.  I wonder if slot_getmissingattrs
> could take tts_nvalid into account?
>
> I also wonder if this doesn't deserve an unlikely(), to avoid the cost
> of spilling registers in the hot branch of not missing any values.
>
>


One of us at least is very confused about this function.
slot_getmissingattrs() gets called at most once by slot_getsomeattrs
and never for any attribute that's already been deformed.



> +       else
> +       {
> +               /* if there is a missing values array we must process them one by one */
> +               for (missattnum = lastAttNum - 1;
> +                        missattnum >= startAttNum;
> +                        missattnum--)
> +               {
> +                       slot->tts_values[missattnum] = attrmiss[missattnum].ammissing;
> +                       slot->tts_isnull[missattnum] =
> +                               !attrmiss[missattnum].ammissingPresent;
> +               }
> +       }
> +}
>
> Why is this done backwards? It's noticeably slower to walk arrays
> backwards on some CPU microarchitectures.
>
>

I'll change it.


>
> @@ -176,6 +179,23 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
>                         }
>                 }
>
>
>
> @@ -469,6 +503,29 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
>                         if (strcmp(defval1->adbin, defval2->adbin) != 0)
>                                 return false;
>                 }
> +               if (constr1->missing)
> +               {
> +                       if (!constr2->missing)
> +                               return false;
> +                       for (i = 0; i < tupdesc1->natts; i++)
> +                       {
> +                               AttrMissing *missval1 = constr1->missing + i;
> +                               AttrMissing *missval2 = constr2->missing + i;
>
> It's a bit odd to not use array indexing here?
>


*shrug* Maybe. I'll change it if you like, doesn't seem that important or odd.

>
> @@ -625,7 +625,15 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
>         if (slot->tts_mintuple)
>                 return heap_copy_minimal_tuple(slot->tts_mintuple);
>         if (slot->tts_tuple)
> -               return minimal_tuple_from_heap_tuple(slot->tts_tuple);
> +       {
> +               if (TTS_HAS_PHYSICAL_TUPLE(slot) &&
> +                       HeapTupleHeaderGetNatts(slot->tts_tuple->t_data)
> +                       < slot->tts_tupleDescriptor->natts)
> +                       return minimal_expand_tuple(slot->tts_tuple,
> +                                                                               slot->tts_tupleDescriptor);
> +               else
> +                       return minimal_tuple_from_heap_tuple(slot->tts_tuple);
> +       }
>
>
> What's the TTS_HAS_PHYSICAL_TUPLE doing here? How can this be relevant
> given the previous tts_mintuple check? Am I missing something?
>
>

Hmm, that dates back to the original patch. My bad, I should have
picked that up. I'll just remove it.


>
> @@ -563,10 +569,63 @@ RelationBuildTupleDesc(Relation relation)
>                                         MemoryContextAllocZero(CacheMemoryContext,
>                                                                                    relation->rd_rel->relnatts *
>                                                                                    sizeof(AttrDefault));
> -                       attrdef[ndef].adnum = attp->attnum;
> +                       attrdef[ndef].adnum = attnum;
>                         attrdef[ndef].adbin = NULL;
> +
>                         ndef++;
>                 }
> +
> +               /* Likewise for a missing value */
> +               if (attp->atthasmissing)
> +               {
> +                       Datum           missingval;
> +                       bool            missingNull;
> +
> +                       /* Do we have a missing value? */
> +                       missingval = heap_getattr(pg_attribute_tuple,
> +                                                                         Anum_pg_attribute_attmissingval,
> +                                                                         pg_attribute_desc->rd_att,
> +                                                                         &missingNull);
> +                       if (!missingNull)
> +                       {
> +                               /* Yes, fetch from the array */
> +                               MemoryContext oldcxt;
> +                               bool            is_null;
> +                               int                     one = 1;
> +                               Datum           missval;
> +
> +                               if (attrmiss == NULL)
> +                                       attrmiss = (AttrMissing *)
> +                                               MemoryContextAllocZero(CacheMemoryContext,
> +                                                                                          relation->rd_rel->relnatts
*
> +                                                                                          sizeof(AttrMissing));
> +
> +                               missval = array_get_element(missingval,
> +                                                                                       1,
> +                                                                                       &one,
> +                                                                                       -1,
> +                                                                                       attp->attlen,
> +                                                                                       attp->attbyval,
> +                                                                                       attp->attalign,
> +                                                                                       &is_null);
> +                               Assert(!is_null);
> +                               if (attp->attbyval)
> +                               {
> +                                       /* for copy by val just copy the datum direct */
> +                                       attrmiss[attnum - 1].ammissing = missval;
> +                               }
> +                               else
> +                               {
> +                                       /* otherwise copy in the correct context */
> +                                       oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> +                                       attrmiss[attnum - 1].ammissing = datumCopy(missval,
> +
    attp->attbyval, 
> +
    attp->attlen);
 
> +                                       MemoryContextSwitchTo(oldcxt);
> +                               }
> +                               attrmiss[attnum - 1].ammissingPresent = true;
> +                       }
> +               }
>                 need--;
>                 if (need == 0)
>                         break;
>
> I'm still strongly object to do doing this unconditionally for queries
> that never need any of this.  We're can end up with a significant number
> of large tuples in memory here, and then copy that through dozens of
> tupledesc copies in queries.
>
>

We're just doing the same thing we do for default values.

None of the tests I did with large numbers of missing values seemed to
show performance impacts of the kind you describe. Now, none of the
queries were particularly complex, but the worst case was from
actually using very large numbers of attributes with missing values,
where we'd need this data anyway. If we just selected a few attributes
performance went up rather than down.


> @@ -167,6 +170,12 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
>
>         /* Column-level FDW options */
>         text            attfdwoptions[1] BKI_DEFAULT(_null_);
> +
> +       /*
> +        * Missing value for added columns. This is a one element array which lets
> +        * us store a value of the attribute type here.
> +        */
> +       anyarray        attmissingval BKI_DEFAULT(_null_);
>  #endif
>  } FormData_pg_attribute;
>
>
> Still think this is a bad location, and it'll reduce cache hit ratio for
> catalog lookups.
>


As I think I mentioned before, this was discussed previously and as I
understood it this was the consensus location for it.

pg_attrdef isn't really a good place for it (what if they drop the
default?). So the only alternative then would be a completely new
catalog. I'd need a deal of convincing that that was justified.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

29 марта 2018 г., 02:49:59

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
>> +       /*
>> +        * Missing value for added columns. This is a one element array which lets
>> +        * us store a value of the attribute type here.
>> +        */
>> +       anyarray        attmissingval BKI_DEFAULT(_null_);
>> #endif
>> } FormData_pg_attribute;
>>
>> Still think this is a bad location, and it'll reduce cache hit ratio for
>> catalog lookups.

> As I think I mentioned before, this was discussed previously and as I
> understood it this was the consensus location for it.

I don't have a problem with putting that in pg_attribute (and I certainly
agree with not putting it in pg_attrdef).  But "anyarray" seems like a
damn strange, and bulky, choice.  Why not just make it a bytea holding the
bits for the value, nothing more?

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

29 марта 2018 г., 04:27:55

On Thu, Mar 29, 2018 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
>>> +       /*
>>> +        * Missing value for added columns. This is a one element array which lets
>>> +        * us store a value of the attribute type here.
>>> +        */
>>> +       anyarray        attmissingval BKI_DEFAULT(_null_);
>>> #endif
>>> } FormData_pg_attribute;
>>>
>>> Still think this is a bad location, and it'll reduce cache hit ratio for
>>> catalog lookups.
>
>> As I think I mentioned before, this was discussed previously and as I
>> understood it this was the consensus location for it.
>
> I don't have a problem with putting that in pg_attribute (and I certainly
> agree with not putting it in pg_attrdef).  But "anyarray" seems like a
> damn strange, and bulky, choice.  Why not just make it a bytea holding the
> bits for the value, nothing more?
>


That's what we started with and Andres suggested we change it. One
virtue of the array is that is displays nicely when examining
pg_attribute. But changing it back would be easy enough.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Haribabu Kommi

Дата:

29 марта 2018 г., 05:07:09

On Mon, Mar 26, 2018 at 9:32 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Thanks for this, all looks good. Here is the consolidate patch
rebased. If there are no further comments I propose to commit this in
a few days time.

I have some comments with the committed patch.

@@ -663,7 +671,23 @@ ExecFetchSlotTuple(TupleTableSlot *slot)

* If we have a regular physical tuple then just return it.

if (TTS_HAS_PHYSICAL_TUPLE(slot))

- return slot->tts_tuple;

+ {

+ if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <

+ slot->tts_tupleDescriptor->natts)

+ {

+ MemoryContext oldContext = MemoryContextSwitchTo(slot->tts_mcxt);

+ slot->tts_tuple = heap_expand_tuple(slot->tts_tuple,

+ slot->tts_tupleDescriptor);

+ slot->tts_shouldFree = true;

+ MemoryContextSwitchTo(oldContext);

+ return slot->tts_tuple;

+ }

+ else

+ {

+ return slot->tts_tuple;

+ }

In the above scenario, directly replacing the slot->tts_tuple without freeing the exisitng

tuple will unnecessarily increase the slot context memory size, this may lead to a problem

if the same slot is used for many tuples. Better to use ExecStoreTuple() function to update

the slot tuple.

Regards,

Hari Babu

Fujitsu Australia

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

29 марта 2018 г., 21:30:50

Hi,

On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
> >> Thanks for this, all looks good. Here is the consolidate patch
> >> rebased. If there are no further comments I propose to commit this in
> >> a few days time.
> >
> > Here's a bit of post commit review:
> >
> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
> >
> >         /*
> >          * If tuple doesn't have all the atts indicated by tupleDesc, read the
> > -        * rest as null
> > +        * rest as NULLs or missing values
> >          */
> > -       for (; attno < attnum; attno++)
> > -       {
> > -               slot->tts_values[attno] = (Datum) 0;
> > -               slot->tts_isnull[attno] = true;
> > -       }
> > +       if (attno < attnum)
> > +               slot_getmissingattrs(slot, attno, attnum);
> > +
> >         slot->tts_nvalid = attnum;
> >  }
> >
> > It's worthwhile to note that this'll re-process *all* missing values,
> > even if they've already been deformed. I.e. if
> > slot_getmissingattrs(.., first-missing)
> > slot_getmissingattrs(.., first-missing + 1)
> > slot_getmissingattrs(.., first-missing + 2)
> > is called, all three missing values will be copied every time. That's
> > because tts_nvalid isn't taken into account.  I wonder if slot_getmissingattrs
> > could take tts_nvalid into account?
> >
> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
> > of spilling registers in the hot branch of not missing any values.

> One of us at least is very confused about this function.
> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
> and never for any attribute that's already been deformed.

Imagine the same slot being used in two different expressions. The
typical case for that is e.g. something like
  SELECT a,b,c,d FROM foo WHERE b = 1;

this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
evaluation, and then a slot_getsomeattrs(slot, 4) from within the
projection code.  If you then imagine a tuple where only the first
column exists physically, we'd copy b twice, because attno is only going
to be one 1.  I think we might just want to check tts nvalid in
getmissingattrs?


> > I'm still strongly object to do doing this unconditionally for queries
> > that never need any of this.  We're can end up with a significant number
> > of large tuples in memory here, and then copy that through dozens of
> > tupledesc copies in queries.

> We're just doing the same thing we do for default values.

That's really not a defense. It's hurting us there too.


> None of the tests I did with large numbers of missing values seemed to
> show performance impacts of the kind you describe. Now, none of the
> queries were particularly complex, but the worst case was from
> actually using very large numbers of attributes with missing values,
> where we'd need this data anyway. If we just selected a few attributes
> performance went up rather than down.

Now imagine a partitioned workload with a few thousand partitions over a
few tables. The additional memory is going to start being noticeable.


> pg_attrdef isn't really a good place for it (what if they drop the
> default?). So the only alternative then would be a completely new
> catalog. I'd need a deal of convincing that that was justified.

There's plenty databases with pg_attribute being many gigabytes large,
and this is going to make that even worse.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

30 марта 2018 г., 00:27:47

Andres Freund <andres@anarazel.de> writes:
> There's plenty databases with pg_attribute being many gigabytes large,
> and this is going to make that even worse.

Only if you imagine that a sizable fraction of the columns have fast
default values, which seems somewhat unlikely.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

30 марта 2018 г., 00:31:29

Hi,

On 2018-03-29 17:27:47 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > There's plenty databases with pg_attribute being many gigabytes large,
> > and this is going to make that even worse.
> 
> Only if you imagine that a sizable fraction of the columns have fast
> default values, which seems somewhat unlikely.

Why is that unlikely?  In the field it's definitely not uncommon to
define default values for just about every column. And in a lot of cases
that'll mean we'll end up with pg_attribute containing default values
for most columns but the ones defined at table creation. A lot of
frameworks make it a habit to add columns near exclusively in
incremental steps.  You'd only get rid of them if you force an operation
that does a full table rewrite, which often enough is impractical.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Tomas Vondra

Дата:

30 марта 2018 г., 01:28:43


On 03/29/2018 11:31 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-03-29 17:27:47 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> There's plenty databases with pg_attribute being many gigabytes large,
>>> and this is going to make that even worse.
>>
>> Only if you imagine that a sizable fraction of the columns have fast
>> default values, which seems somewhat unlikely.
> 
> Why is that unlikely?  In the field it's definitely not uncommon to
> define default values for just about every column. And in a lot of cases
> that'll mean we'll end up with pg_attribute containing default values
> for most columns but the ones defined at table creation. A lot of
> frameworks make it a habit to add columns near exclusively in
> incremental steps.  You'd only get rid of them if you force an operation
> that does a full table rewrite, which often enough is impractical.
> 

I don't quite see how moving that gets solved by moving the info into a
different catalog? We would need to fetch it whenever attribute
meta-data from pg_attribute are loaded.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

30 марта 2018 г., 01:34:55

Hi,

On 2018-03-30 00:28:43 +0200, Tomas Vondra wrote:
> > Why is that unlikely?  In the field it's definitely not uncommon to
> > define default values for just about every column. And in a lot of cases
> > that'll mean we'll end up with pg_attribute containing default values
> > for most columns but the ones defined at table creation. A lot of
> > frameworks make it a habit to add columns near exclusively in
> > incremental steps.  You'd only get rid of them if you force an operation
> > that does a full table rewrite, which often enough is impractical.
> > 
> 
> I don't quite see how moving that gets solved by moving the info into a
> different catalog? We would need to fetch it whenever attribute
> meta-data from pg_attribute are loaded.

I'm also complaining nearby that the "stored default" values should only
be loaded on demand. We very regularly build relcache entries that'll
never have their default values queried. Even moreso with partitioned
tables.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

30 марта 2018 г., 02:38:54

On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
>> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
>> >> Thanks for this, all looks good. Here is the consolidate patch
>> >> rebased. If there are no further comments I propose to commit this in
>> >> a few days time.
>> >
>> > Here's a bit of post commit review:
>> >
>> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
>> >
>> >         /*
>> >          * If tuple doesn't have all the atts indicated by tupleDesc, read the
>> > -        * rest as null
>> > +        * rest as NULLs or missing values
>> >          */
>> > -       for (; attno < attnum; attno++)
>> > -       {
>> > -               slot->tts_values[attno] = (Datum) 0;
>> > -               slot->tts_isnull[attno] = true;
>> > -       }
>> > +       if (attno < attnum)
>> > +               slot_getmissingattrs(slot, attno, attnum);
>> > +
>> >         slot->tts_nvalid = attnum;
>> >  }
>> >
>> > It's worthwhile to note that this'll re-process *all* missing values,
>> > even if they've already been deformed. I.e. if
>> > slot_getmissingattrs(.., first-missing)
>> > slot_getmissingattrs(.., first-missing + 1)
>> > slot_getmissingattrs(.., first-missing + 2)
>> > is called, all three missing values will be copied every time. That's
>> > because tts_nvalid isn't taken into account.  I wonder if slot_getmissingattrs
>> > could take tts_nvalid into account?
>> >
>> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
>> > of spilling registers in the hot branch of not missing any values.
>
>> One of us at least is very confused about this function.
>> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
>> and never for any attribute that's already been deformed.
>
> Imagine the same slot being used in two different expressions. The
> typical case for that is e.g. something like
>   SELECT a,b,c,d FROM foo WHERE b = 1;
>
> this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
> evaluation, and then a slot_getsomeattrs(slot, 4) from within the
> projection code.  If you then imagine a tuple where only the first
> column exists physically, we'd copy b twice, because attno is only going
> to be one 1.  I think we might just want to check tts nvalid in
> getmissingattrs?
>
>


OK. so add something like this to the top of slot_getmissingattrs()?

    startAttNum = Max(startAttNum, slot->tts_nvalid);



>> > I'm still strongly object to do doing this unconditionally for queries
>> > that never need any of this.  We're can end up with a significant number
>> > of large tuples in memory here, and then copy that through dozens of
>> > tupledesc copies in queries.
>
>> We're just doing the same thing we do for default values.
>
> That's really not a defense. It's hurting us there too.
>


I will take a look and see if it can be avoided easily.


>
>> None of the tests I did with large numbers of missing values seemed to
>> show performance impacts of the kind you describe. Now, none of the
>> queries were particularly complex, but the worst case was from
>> actually using very large numbers of attributes with missing values,
>> where we'd need this data anyway. If we just selected a few attributes
>> performance went up rather than down.
>
> Now imagine a partitioned workload with a few thousand partitions over a
> few tables. The additional memory is going to start being noticeable.
>
>
>> pg_attrdef isn't really a good place for it (what if they drop the
>> default?). So the only alternative then would be a completely new
>> catalog. I'd need a deal of convincing that that was justified.
>
> There's plenty databases with pg_attribute being many gigabytes large,
> and this is going to make that even worse.
>


I'll change it if there's a consensus about it, but so far the only
other opinion has been from Tom who doesn't apparently see much of a
problem.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

30 марта 2018 г., 02:40:29

Hi,

On 2018-03-30 10:08:54 +1030, Andrew Dunstan wrote:
> On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote:
> > this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
> > evaluation, and then a slot_getsomeattrs(slot, 4) from within the
> > projection code.  If you then imagine a tuple where only the first
> > column exists physically, we'd copy b twice, because attno is only going
> > to be one 1.  I think we might just want to check tts nvalid in
> > getmissingattrs?

> OK. so add something like this to the top of slot_getmissingattrs()?
> 
>     startAttNum = Max(startAttNum, slot->tts_nvalid);

Yea, I think that should do the trick.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andres Freund

Дата:

30 марта 2018 г., 02:45:17

On 2018-03-29 16:40:29 -0700, Andres Freund wrote:
> Hi,
>
> On 2018-03-30 10:08:54 +1030, Andrew Dunstan wrote:
> > On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote:
> > > this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
> > > evaluation, and then a slot_getsomeattrs(slot, 4) from within the
> > > projection code.  If you then imagine a tuple where only the first
> > > column exists physically, we'd copy b twice, because attno is only going
> > > to be one 1.  I think we might just want to check tts nvalid in
> > > getmissingattrs?
>
> > OK. so add something like this to the top of slot_getmissingattrs()?
> >
> >     startAttNum = Max(startAttNum, slot->tts_nvalid);
>
> Yea, I think that should do the trick.

Hm, or if you want to microoptimize ;):

if (startAttNum < slot->tts_nvalid)
   startAttNum = slot->tts_nvalid

I think that can use cmov (i.e. no visible branch), which Max() probably
can't usefully. Don't think the compiler can figure out that
slot->tts_nvalid cannot be smaller than startAttNum.

Greetings,

Andres Freund

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

03 апреля 2018 г., 09:14:37

On Fri, Mar 30, 2018 at 10:08 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres@anarazel.de> wrote:

[ missing values are loaded in the TupleDesc by RelationBuildTupleDesc ]

>>> > I'm still strongly object to do doing this unconditionally for queries
>>> > that never need any of this.  We're can end up with a significant number
>>> > of large tuples in memory here, and then copy that through dozens of
>>> > tupledesc copies in queries.
>>
>>> We're just doing the same thing we do for default values.
>>
>> That's really not a defense. It's hurting us there too.
>>
>
>
> I will take a look and see if it can be avoided easily.
>
>

For missing values there are three places where we would need to load
them on demand: getmissingattr(), slot_getmissingattrs() and
expand_tuple(). However, none of those functions have obvious access
to the information required to load the missing values. We could
probably do something very ugly like getting the relid from the
TupleDesc's first attribute. Maybe someone else has a better option.
But I can't see a simple and clean way to do this.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Gierth

Дата:

04 апреля 2021 г., 00:41:24

[warning, reviving a thread from 2018]

>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 > On Wed, Feb 21, 2018 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
 >> Hi,

 Andrew> Comments interspersed.

 >>> @@ -4059,10 +4142,6 @@ AttrDefaultFetch(Relation relation)
 >>> 
 >>> systable_endscan(adscan);
 >>> heap_close(adrel, AccessShareLock);
 >>> -
 >>> -     if (found != ndef)
 >>> -             elog(WARNING, "%d attrdef record(s) missing for rel %s",
 >>> -                      ndef - found, RelationGetRelationName(relation));
 >>> }
 >> 
 >> Hm, it's not obvious why this is a good thing?

I didn't find an answer to this in the thread archive, and the above
change apparently did make it into the final patch.

I just got through diagnosing a SEGV crash with someone on IRC, and the
cause turned out to be exactly this - a table had (for some reason we
could not determine within the available resources) lost its pg_attrdef
record for the one column it had with a default (which was a serial
column, so none of the fast-default code is actually implicated). Any
attempt to alter the table resulted in a crash in equalTupleDesc on this
line:

            if (strcmp(defval1->adbin, defval2->adbin) != 0)

due to trying to compare adbin values which were NULL pointers.

So, questions: why was the check removed in the first place?

(Why was it previously only a warning when it causes a crash further
down the line on any alteration?)

Does equalTupleDesc need to be more defensive about this, or does the
above check need to be reinstated?

(The immediate issue was fixed by "update pg_attribute set
atthasdef=false ..." for the offending attribute and then adding it back
with ALTER TABLE, which seems to have cured the crash.)

-- 
Andrew (irc:RhodiumToad)

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

04 апреля 2021 г., 02:09:51

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I just got through diagnosing a SEGV crash with someone on IRC, and the
> cause turned out to be exactly this - a table had (for some reason we
> could not determine within the available resources) lost its pg_attrdef
> record for the one column it had with a default (which was a serial
> column, so none of the fast-default code is actually implicated). Any
> attempt to alter the table resulted in a crash in equalTupleDesc on this
> line:
>             if (strcmp(defval1->adbin, defval2->adbin) != 0)
> due to trying to compare adbin values which were NULL pointers.

Ouch.

> Does equalTupleDesc need to be more defensive about this, or does the
> above check need to be reinstated?

Looking around at the other touches of AttrDefault.adbin in the backend
(of which there are not that many), some of them are prepared for it to be
NULL and some are not.  I don't immediately have a strong opinion whether
that should be an allowed state; but if it is not allowed then it's not
okay for AttrDefaultFetch to leave such a state behind.  Alternatively,
if it is allowed then equalTupleDesc is in the wrong.  We should choose
one definition and make all the relevant code match it.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Justin Pryzby

Дата:

04 апреля 2021 г., 13:19:19

It reminded me of this thread, but nothing ever came of it.
https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com


https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

04 апреля 2021 г., 15:08:59

On 4/3/21 10:09 PM, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> I just got through diagnosing a SEGV crash with someone on IRC, and the
>> cause turned out to be exactly this - a table had (for some reason we
>> could not determine within the available resources) lost its pg_attrdef
>> record for the one column it had with a default (which was a serial
>> column, so none of the fast-default code is actually implicated). 


I don't recall why the check was removed. In general the fast default
code doesn't touch adbin.


I'd be curious to know how this state came about. From the description
it sounds like something removed the pg_attrdef entry without adjusting
pg_attribute, which shouldn't happen.


>> Any
>> attempt to alter the table resulted in a crash in equalTupleDesc on this
>> line:
>>             if (strcmp(defval1->adbin, defval2->adbin) != 0)
>> due to trying to compare adbin values which were NULL pointers.
> Ouch.
>
>> Does equalTupleDesc need to be more defensive about this, or does the
>> above check need to be reinstated?
> Looking around at the other touches of AttrDefault.adbin in the backend
> (of which there are not that many), some of them are prepared for it to be
> NULL and some are not.  I don't immediately have a strong opinion whether
> that should be an allowed state; but if it is not allowed then it's not
> okay for AttrDefaultFetch to leave such a state behind.  Alternatively,
> if it is allowed then equalTupleDesc is in the wrong.  We should choose
> one definition and make all the relevant code match it.
>
>             


There's already a warning if it sets an explicit NULL value, so I'm
inclined to say your suggestion "it's not okay for AttrDefaultFetch to
leave such a state behind" is what we should go with.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

04 апреля 2021 г., 15:21:59

On 4/4/21 9:19 AM, Justin Pryzby wrote:
> It reminded me of this thread, but nothing ever came of it.
> https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com
>
>
> https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk
>
>

I don't recall seeing this. Around that time I was busy returning from
Australia at the start of the pandemic, so my I might have missed it in
the hubbub.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

04 апреля 2021 г., 16:05:24

Andrew Dunstan <andrew@dunslane.net> writes:
> On 4/3/21 10:09 PM, Tom Lane wrote:
>> Looking around at the other touches of AttrDefault.adbin in the backend
>> (of which there are not that many), some of them are prepared for it to be
>> NULL and some are not.  I don't immediately have a strong opinion whether
>> that should be an allowed state; but if it is not allowed then it's not
>> okay for AttrDefaultFetch to leave such a state behind.  Alternatively,
>> if it is allowed then equalTupleDesc is in the wrong.  We should choose
>> one definition and make all the relevant code match it.

> There's already a warning if it sets an explicit NULL value, so I'm
> inclined to say your suggestion "it's not okay for AttrDefaultFetch to
> leave such a state behind" is what we should go with.

Yeah, I came to the same conclusion after looking around a bit more.
There are two places in tupdesc.c that defend against adbin being NULL,
which seems a bit silly when their sibling equalTupleDesc does not.
Every other touch in the backend will segfault on a NULL value,
if it doesn't Assert first :-(

Another thing that annoyed me while I was looking at this is the
potentially O(N^2) behavior in equalTupleDesc due to not wanting
to assume that the AttrDefault arrays are in the same order.
It seems far smarter to have AttrDefaultFetch sort the arrays.

So attached is a proposed patch to clean all this up.

* Don't link the AttrDefault array into the relcache entry until
it's fully built and valid.

* elog, don't just Assert, if we fail to find an expected default
value.  (Perhaps there's a case for ereport instead.)

* Remove now-useless null checks in tupdesc.c.

* Sort the AttrDefault array, remove double loops in equalTupleDesc.

I made CheckConstraintFetch likewise not install its array until
it's done.  I notice that it is throwing elog(ERROR) not WARNING
for the equivalent cases of not finding the right number of
entries.  I wonder if we ought to back that off to a WARNING too.
elog(ERROR) during relcache entry load is kinda nasty, because
it prevents essentially *all* use of the relation.  On the other
hand, failing to enforce check constraints isn't lovely either.
Anybody have opinions about that?

            regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index cb76465050..d81f6b8a60 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -174,10 +174,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
             cpy->defval = (AttrDefault *) palloc(cpy->num_defval * sizeof(AttrDefault));
             memcpy(cpy->defval, constr->defval, cpy->num_defval * sizeof(AttrDefault));
             for (i = cpy->num_defval - 1; i >= 0; i--)
-            {
-                if (constr->defval[i].adbin)
-                    cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
-            }
+                cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
         }

         if (constr->missing)
@@ -328,10 +325,7 @@ FreeTupleDesc(TupleDesc tupdesc)
             AttrDefault *attrdef = tupdesc->constr->defval;

             for (i = tupdesc->constr->num_defval - 1; i >= 0; i--)
-            {
-                if (attrdef[i].adbin)
-                    pfree(attrdef[i].adbin);
-            }
+                pfree(attrdef[i].adbin);
             pfree(attrdef);
         }
         if (tupdesc->constr->missing)
@@ -412,7 +406,6 @@ bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 {
     int            i,
-                j,
                 n;

     if (tupdesc1->natts != tupdesc2->natts)
@@ -488,22 +481,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_defval;
         if (n != (int) constr2->num_defval)
             return false;
+        /* We assume here that both AttrDefault arrays are in adnum order */
         for (i = 0; i < n; i++)
         {
             AttrDefault *defval1 = constr1->defval + i;
-            AttrDefault *defval2 = constr2->defval;
-
-            /*
-             * We can't assume that the items are always read from the system
-             * catalogs in the same order; so use the adnum field to identify
-             * the matching item to compare.
-             */
-            for (j = 0; j < n; defval2++, j++)
-            {
-                if (defval1->adnum == defval2->adnum)
-                    break;
-            }
-            if (j >= n)
+            AttrDefault *defval2 = constr2->defval + i;
+
+            if (defval1->adnum != defval2->adnum)
                 return false;
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
@@ -534,25 +518,21 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_check;
         if (n != (int) constr2->num_check)
             return false;
+
+        /*
+         * Similarly, we rely here on the ConstrCheck entries being sorted by
+         * name.  If there are duplicate names, the outcome of the comparison
+         * is uncertain, but that should not happen.
+         */
         for (i = 0; i < n; i++)
         {
             ConstrCheck *check1 = constr1->check + i;
-            ConstrCheck *check2 = constr2->check;
-
-            /*
-             * Similarly, don't assume that the checks are always read in the
-             * same order; match them up by name and contents. (The name
-             * *should* be unique, but...)
-             */
-            for (j = 0; j < n; check2++, j++)
-            {
-                if (strcmp(check1->ccname, check2->ccname) == 0 &&
-                    strcmp(check1->ccbin, check2->ccbin) == 0 &&
-                    check1->ccvalid == check2->ccvalid &&
-                    check1->ccnoinherit == check2->ccnoinherit)
-                    break;
-            }
-            if (j >= n)
+            ConstrCheck *check2 = constr2->check + i;
+
+            if (!(strcmp(check1->ccname, check2->ccname) == 0 &&
+                  strcmp(check1->ccbin, check2->ccbin) == 0 &&
+                  check1->ccvalid == check2->ccvalid &&
+                  check1->ccnoinherit == check2->ccnoinherit))
                 return false;
         }
     }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..87f9bdaef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2501,21 +2501,24 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
             if (attribute->atthasdef)
             {
                 Node       *this_default = NULL;
-                AttrDefault *attrdef;
-                int            i;

                 /* Find default in constraint structure */
-                Assert(constr != NULL);
-                attrdef = constr->defval;
-                for (i = 0; i < constr->num_defval; i++)
+                if (constr != NULL)
                 {
-                    if (attrdef[i].adnum == parent_attno)
+                    AttrDefault *attrdef = constr->defval;
+
+                    for (int i = 0; i < constr->num_defval; i++)
                     {
-                        this_default = stringToNode(attrdef[i].adbin);
-                        break;
+                        if (attrdef[i].adnum == parent_attno)
+                        {
+                            this_default = stringToNode(attrdef[i].adbin);
+                            break;
+                        }
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 /*
                  * If it's a GENERATED default, it might contain Vars that
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b968c25dd6..2c57b200f8 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1276,7 +1276,9 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                         break;
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 atsubcmd = makeNode(AlterTableCmd);
                 atsubcmd->subtype = AT_CookedColumnDefault;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 92661abae2..531edaacdb 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1247,6 +1247,9 @@ build_column_default(Relation rel, int attrno)
                 break;
             }
         }
+        if (expr == NULL)
+            elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                 attrno, RelationGetRelationName(rel));
     }

     /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ff7395c85b..b05026f1c5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -282,7 +282,8 @@ static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
 static TupleDesc GetPgClassDescriptor(void);
 static TupleDesc GetPgIndexDescriptor(void);
-static void AttrDefaultFetch(Relation relation);
+static void AttrDefaultFetch(Relation relation, int ndef);
+static int    AttrDefaultCmp(const void *a, const void *b);
 static void CheckConstraintFetch(Relation relation);
 static int    CheckConstraintCmp(const void *a, const void *b);
 static void InitIndexAmRoutine(Relation relation);
@@ -503,7 +504,6 @@ RelationBuildTupleDesc(Relation relation)
     ScanKeyData skey[2];
     int            need;
     TupleConstr *constr;
-    AttrDefault *attrdef = NULL;
     AttrMissing *attrmiss = NULL;
     int            ndef = 0;

@@ -512,8 +512,8 @@ RelationBuildTupleDesc(Relation relation)
         relation->rd_rel->reltype ? relation->rd_rel->reltype : RECORDOID;
     relation->rd_att->tdtypmod = -1;    /* just to be sure */

-    constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
-                                                sizeof(TupleConstr));
+    constr = (TupleConstr *) MemoryContextAllocZero(CacheMemoryContext,
+                                                    sizeof(TupleConstr));
     constr->has_not_null = false;
     constr->has_generated_stored = false;

@@ -560,7 +560,6 @@ RelationBuildTupleDesc(Relation relation)
             elog(ERROR, "invalid attribute number %d for %s",
                  attp->attnum, RelationGetRelationName(relation));

-
         memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
                attp,
                ATTRIBUTE_FIXED_PART_SIZE);
@@ -570,22 +569,10 @@ RelationBuildTupleDesc(Relation relation)
             constr->has_not_null = true;
         if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
             constr->has_generated_stored = true;
-
-        /* If the column has a default, fill it into the attrdef array */
         if (attp->atthasdef)
-        {
-            if (attrdef == NULL)
-                attrdef = (AttrDefault *)
-                    MemoryContextAllocZero(CacheMemoryContext,
-                                           RelationGetNumberOfAttributes(relation) *
-                                           sizeof(AttrDefault));
-            attrdef[ndef].adnum = attnum;
-            attrdef[ndef].adbin = NULL;
-
             ndef++;
-        }

-        /* Likewise for a missing value */
+        /* If the column has a "missing" value, put it in the attrmiss array */
         if (attp->atthasmissing)
         {
             Datum        missingval;
@@ -680,33 +667,19 @@ RelationBuildTupleDesc(Relation relation)
         constr->has_generated_stored ||
         ndef > 0 ||
         attrmiss ||
-        relation->rd_rel->relchecks)
+        relation->rd_rel->relchecks > 0)
     {
         relation->rd_att->constr = constr;

         if (ndef > 0)            /* DEFAULTs */
-        {
-            if (ndef < RelationGetNumberOfAttributes(relation))
-                constr->defval = (AttrDefault *)
-                    repalloc(attrdef, ndef * sizeof(AttrDefault));
-            else
-                constr->defval = attrdef;
-            constr->num_defval = ndef;
-            AttrDefaultFetch(relation);
-        }
+            AttrDefaultFetch(relation, ndef);
         else
             constr->num_defval = 0;

         constr->missing = attrmiss;

         if (relation->rd_rel->relchecks > 0)    /* CHECKs */
-        {
-            constr->num_check = relation->rd_rel->relchecks;
-            constr->check = (ConstrCheck *)
-                MemoryContextAllocZero(CacheMemoryContext,
-                                       constr->num_check * sizeof(ConstrCheck));
             CheckConstraintFetch(relation);
-        }
         else
             constr->num_check = 0;
     }
@@ -4251,21 +4224,28 @@ GetPgIndexDescriptor(void)

 /*
  * Load any default attribute value definitions for the relation.
+ *
+ * ndef is the number of attributes that were marked atthasdef.
+ *
+ * Note: we don't make it a hard error to be missing some pg_attrdef records.
+ * We can limp along as long as no INSERT or UPDATE needs to use the entry.
  */
 static void
-AttrDefaultFetch(Relation relation)
+AttrDefaultFetch(Relation relation, int ndef)
 {
-    AttrDefault *attrdef = relation->rd_att->constr->defval;
-    int            ndef = relation->rd_att->constr->num_defval;
+    AttrDefault *attrdef;
     Relation    adrel;
     SysScanDesc adscan;
     ScanKeyData skey;
     HeapTuple    htup;
-    Datum        val;
-    bool        isnull;
-    int            found;
-    int            i;
+    int            found = 0;
+
+    /* Allocate array with room for as many entries as expected */
+    attrdef = (AttrDefault *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ndef * sizeof(AttrDefault));

+    /* Search pg_attrdef for relevant entries */
     ScanKeyInit(&skey,
                 Anum_pg_attrdef_adrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4274,49 +4254,69 @@ AttrDefaultFetch(Relation relation)
     adrel = table_open(AttrDefaultRelationId, AccessShareLock);
     adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
                                 NULL, 1, &skey);
-    found = 0;

     while (HeapTupleIsValid(htup = systable_getnext(adscan)))
     {
         Form_pg_attrdef adform = (Form_pg_attrdef) GETSTRUCT(htup);
-        Form_pg_attribute attr = TupleDescAttr(relation->rd_att, adform->adnum - 1);
+        Datum        val;
+        bool        isnull;

-        for (i = 0; i < ndef; i++)
+        /* protect limited size of array */
+        if (found >= ndef)
         {
-            if (adform->adnum != attrdef[i].adnum)
-                continue;
-            if (attrdef[i].adbin != NULL)
-                elog(WARNING, "multiple attrdef records found for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-                found++;
-
-            val = fastgetattr(htup,
-                              Anum_pg_attrdef_adbin,
-                              adrel->rd_att, &isnull);
-            if (isnull)
-                elog(WARNING, "null adbin for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-            {
-                /* detoast and convert to cstring in caller's context */
-                char       *s = TextDatumGetCString(val);
-
-                attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
-                pfree(s);
-            }
+            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+                 adform->adnum, RelationGetRelationName(relation));
             break;
         }

-        if (i >= ndef)
-            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+        val = fastgetattr(htup,
+                          Anum_pg_attrdef_adbin,
+                          adrel->rd_att, &isnull);
+        if (isnull)
+            elog(WARNING, "null adbin for attr %d of rel %s",
                  adform->adnum, RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
+
+            attrdef[found].adnum = adform->adnum;
+            attrdef[found].adbin = MemoryContextStrdup(CacheMemoryContext, s);
+            found++;
+            pfree(s);
+        }
     }

     systable_endscan(adscan);
     table_close(adrel, AccessShareLock);
+
+    if (found != ndef)
+        elog(WARNING, "%d attrdef record(s) missing for rel %s",
+             ndef - found, RelationGetRelationName(relation));
+
+    /*
+     * Sort the AttrDefault entries by adnum, for the convenience of
+     * equalTupleDescs().  (Usually, they already will be in order, but this
+     * might not be so if systable_getnext isn't using an index.)
+     */
+    if (found > 1)
+        qsort(attrdef, found, sizeof(AttrDefault), AttrDefaultCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->defval = attrdef;
+    relation->rd_att->constr->num_defval = found;
+}
+
+/*
+ * qsort comparator to sort AttrDefault entries by adnum
+ */
+static int
+AttrDefaultCmp(const void *a, const void *b)
+{
+    const AttrDefault *ada = (const AttrDefault *) a;
+    const AttrDefault *adb = (const AttrDefault *) b;
+
+    return ada->adnum - adb->adnum;
 }

 /*
@@ -4325,14 +4325,20 @@ AttrDefaultFetch(Relation relation)
 static void
 CheckConstraintFetch(Relation relation)
 {
-    ConstrCheck *check = relation->rd_att->constr->check;
-    int            ncheck = relation->rd_att->constr->num_check;
+    ConstrCheck *check;
+    int            ncheck = relation->rd_rel->relchecks;
     Relation    conrel;
     SysScanDesc conscan;
     ScanKeyData skey[1];
     HeapTuple    htup;
     int            found = 0;

+    /* Allocate array with room for as many entries as expected */
+    check = (ConstrCheck *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ncheck * sizeof(ConstrCheck));
+
+    /* Search pg_constraint for relevant entries */
     ScanKeyInit(&skey[0],
                 Anum_pg_constraint_conrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4353,6 +4359,7 @@ CheckConstraintFetch(Relation relation)
         if (conform->contype != CONSTRAINT_CHECK)
             continue;

+        /* protect limited size of array */
         if (found >= ncheck)
             elog(ERROR, "unexpected constraint record found for rel %s",
                  RelationGetRelationName(relation));
@@ -4385,9 +4392,16 @@ CheckConstraintFetch(Relation relation)
         elog(ERROR, "%d constraint record(s) missing for rel %s",
              ncheck - found, RelationGetRelationName(relation));

-    /* Sort the records so that CHECKs are applied in a deterministic order */
-    if (ncheck > 1)
-        qsort(check, ncheck, sizeof(ConstrCheck), CheckConstraintCmp);
+    /*
+     * Sort the records by name.  This ensures that CHECKs are applied in a
+     * deterministic order, and it also makes equalTupleDescs() faster.
+     */
+    if (found > 1)
+        qsort(check, found, sizeof(ConstrCheck), CheckConstraintCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->check = check;
+    relation->rd_att->constr->num_check = found;
 }

 /*

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

04 апреля 2021 г., 17:01:16

On 4/4/21 12:05 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 4/3/21 10:09 PM, Tom Lane wrote:
>>> Looking around at the other touches of AttrDefault.adbin in the backend
>>> (of which there are not that many), some of them are prepared for it to be
>>> NULL and some are not.  I don't immediately have a strong opinion whether
>>> that should be an allowed state; but if it is not allowed then it's not
>>> okay for AttrDefaultFetch to leave such a state behind.  Alternatively,
>>> if it is allowed then equalTupleDesc is in the wrong.  We should choose
>>> one definition and make all the relevant code match it.
>> There's already a warning if it sets an explicit NULL value, so I'm
>> inclined to say your suggestion "it's not okay for AttrDefaultFetch to
>> leave such a state behind" is what we should go with.
> Yeah, I came to the same conclusion after looking around a bit more.
> There are two places in tupdesc.c that defend against adbin being NULL,
> which seems a bit silly when their sibling equalTupleDesc does not.
> Every other touch in the backend will segfault on a NULL value,
> if it doesn't Assert first :-(
>
> Another thing that annoyed me while I was looking at this is the
> potentially O(N^2) behavior in equalTupleDesc due to not wanting
> to assume that the AttrDefault arrays are in the same order.
> It seems far smarter to have AttrDefaultFetch sort the arrays.


+1 The O(N^2) loops also bothered me.


>
> So attached is a proposed patch to clean all this up.
>
> * Don't link the AttrDefault array into the relcache entry until
> it's fully built and valid.
>
> * elog, don't just Assert, if we fail to find an expected default
> value.  (Perhaps there's a case for ereport instead.)
>
> * Remove now-useless null checks in tupdesc.c.
>
> * Sort the AttrDefault array, remove double loops in equalTupleDesc.


This all looks a lot cleaner and simpler to follow. I like not
allocating the array until AttrDefaultFetch.


>
> I made CheckConstraintFetch likewise not install its array until
> it's done.  I notice that it is throwing elog(ERROR) not WARNING
> for the equivalent cases of not finding the right number of
> entries.  I wonder if we ought to back that off to a WARNING too.
> elog(ERROR) during relcache entry load is kinda nasty, because
> it prevents essentially *all* use of the relation.  On the other
> hand, failing to enforce check constraints isn't lovely either.
> Anybody have opinions about that?



None of this is supposed to happen, is it? If you can't fetch the
constraints (and I think of a default as almost a kind of constraint)
your database is already somehow corrupted. So maybe if any of these
things happen we should ERROR out. Anything else seems likely to corrupt
the database further.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Zhihong Yu

Дата:

04 апреля 2021 г., 17:31:06

Hi,

Thanks for the cleanup.

if (found != ncheck)
elog(ERROR, "%d constraint record(s) missing for rel %s",
ncheck - found, RelationGetRelationName(relation));

Since there is check on found being smaller than ncheck inside the loop, the if condition can be written as:
if (found < ncheck)

Actually the above is implied by the expression, ncheck - found.

One minor comment w.r.t. the variable name is that, found is normally a bool variable.
Maybe rename the variable nfound which would be clearer in its meaning.

+ if (found != ndef)
+ elog(WARNING, "%d attrdef record(s) missing for rel %s",
+ ndef - found, RelationGetRelationName(relation));

Since only warning is logged, there seems to be some wasted space in attrdef. Would such space accumulate, resulting in some memory leak ?
I think the attrdef should be copied to another array of the proper size so that there is no chance of memory leak.

Thanks

On Sun, Apr 4, 2021 at 9:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:
> On 4/3/21 10:09 PM, Tom Lane wrote:
>> Looking around at the other touches of AttrDefault.adbin in the backend
>> (of which there are not that many), some of them are prepared for it to be
>> NULL and some are not. I don't immediately have a strong opinion whether
>> that should be an allowed state; but if it is not allowed then it's not
>> okay for AttrDefaultFetch to leave such a state behind. Alternatively,
>> if it is allowed then equalTupleDesc is in the wrong. We should choose
>> one definition and make all the relevant code match it.

> There's already a warning if it sets an explicit NULL value, so I'm
> inclined to say your suggestion "it's not okay for AttrDefaultFetch to
> leave such a state behind" is what we should go with.

Yeah, I came to the same conclusion after looking around a bit more.
There are two places in tupdesc.c that defend against adbin being NULL,
which seems a bit silly when their sibling equalTupleDesc does not.
Every other touch in the backend will segfault on a NULL value,
if it doesn't Assert first :-(

Another thing that annoyed me while I was looking at this is the
potentially O(N^2) behavior in equalTupleDesc due to not wanting
to assume that the AttrDefault arrays are in the same order.
It seems far smarter to have AttrDefaultFetch sort the arrays.

So attached is a proposed patch to clean all this up.

* Don't link the AttrDefault array into the relcache entry until
it's fully built and valid.

* elog, don't just Assert, if we fail to find an expected default
value. (Perhaps there's a case for ereport instead.)

* Remove now-useless null checks in tupdesc.c.

* Sort the AttrDefault array, remove double loops in equalTupleDesc.

I made CheckConstraintFetch likewise not install its array until
it's done. I notice that it is throwing elog(ERROR) not WARNING
for the equivalent cases of not finding the right number of
entries. I wonder if we ought to back that off to a WARNING too.
elog(ERROR) during relcache entry load is kinda nasty, because
it prevents essentially *all* use of the relation. On the other
hand, failing to enforce check constraints isn't lovely either.
Anybody have opinions about that?

regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

04 апреля 2021 г., 18:18:34

On 4/4/21 11:21 AM, Andrew Dunstan wrote:
> On 4/4/21 9:19 AM, Justin Pryzby wrote:
>> It reminded me of this thread, but nothing ever came of it.
>> https://www.postgresql.org/message-id/20200328223052.GK20103%40telsasoft.com
>>
>>
>> https://www.postgresql.org/message-id/87pmzaq4gx.fsf%40news-spur.riddles.org.uk
>>
>>
> I don't recall seeing this. Around that time I was busy returning from
> Australia at the start of the pandemic, so my I might have missed it in
> the hubbub.
>
>

If this is still an issue, is it possible to get a self-contained test
to reproduce it?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

04 апреля 2021 г., 22:50:05

Andrew Dunstan <andrew@dunslane.net> writes:
> On 4/4/21 12:05 PM, Tom Lane wrote:
>> I made CheckConstraintFetch likewise not install its array until
>> it's done.  I notice that it is throwing elog(ERROR) not WARNING
>> for the equivalent cases of not finding the right number of
>> entries.  I wonder if we ought to back that off to a WARNING too.
>> elog(ERROR) during relcache entry load is kinda nasty, because
>> it prevents essentially *all* use of the relation.  On the other
>> hand, failing to enforce check constraints isn't lovely either.
>> Anybody have opinions about that?

> None of this is supposed to happen, is it? If you can't fetch the
> constraints (and I think of a default as almost a kind of constraint)
> your database is already somehow corrupted. So maybe if any of these
> things happen we should ERROR out. Anything else seems likely to corrupt
> the database further.

Meh.  "pg_class.relchecks is inconsistent with the number of entries
in pg_constraint" does not seem to me like a scary enough situation
to justify a panic response.  Maybe there's an argument for failing
at the point where we'd need to actually apply the CHECK constraints
(similarly to what my patch is doing for missing defaults).
But preventing the user from, say, dumping the data in the table
seems to me to be making the situation worse not better.

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Dunstan

Дата:

05 апреля 2021 г., 13:15:32

On 4/4/21 6:50 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 4/4/21 12:05 PM, Tom Lane wrote:
>>> I made CheckConstraintFetch likewise not install its array until
>>> it's done.  I notice that it is throwing elog(ERROR) not WARNING
>>> for the equivalent cases of not finding the right number of
>>> entries.  I wonder if we ought to back that off to a WARNING too.
>>> elog(ERROR) during relcache entry load is kinda nasty, because
>>> it prevents essentially *all* use of the relation.  On the other
>>> hand, failing to enforce check constraints isn't lovely either.
>>> Anybody have opinions about that?
>> None of this is supposed to happen, is it? If you can't fetch the
>> constraints (and I think of a default as almost a kind of constraint)
>> your database is already somehow corrupted. So maybe if any of these
>> things happen we should ERROR out. Anything else seems likely to corrupt
>> the database further.
> Meh.  "pg_class.relchecks is inconsistent with the number of entries
> in pg_constraint" does not seem to me like a scary enough situation
> to justify a panic response.  Maybe there's an argument for failing
> at the point where we'd need to actually apply the CHECK constraints
> (similarly to what my patch is doing for missing defaults).
> But preventing the user from, say, dumping the data in the table
> seems to me to be making the situation worse not better.
>
>             


OK, fair argument.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

05 апреля 2021 г., 16:10:48

Andrew Dunstan <andrew@dunslane.net> writes:
> On 4/4/21 6:50 PM, Tom Lane wrote:
>> Meh.  "pg_class.relchecks is inconsistent with the number of entries
>> in pg_constraint" does not seem to me like a scary enough situation
>> to justify a panic response.  Maybe there's an argument for failing
>> at the point where we'd need to actually apply the CHECK constraints
>> (similarly to what my patch is doing for missing defaults).
>> But preventing the user from, say, dumping the data in the table
>> seems to me to be making the situation worse not better.

> OK, fair argument.

Here's a v2 that applies the same principles to struct ConstrCheck
as AttrDefault, ie create only valid entries (no NULL strings), and
if there's not the right number, complain at point of use rather
than failing the relcache load.

There is a hole in this, which is that if pg_class.relchecks = 0
then we won't even look in pg_constraint, so we won't realize if
there is an inconsistency.  But that was true before, and I don't
want to expend an almost-always-useless catalog search to see if
relchecks = 0 is a lie.

I also tried to bring the related message texts up to something
approaching project standards, though I didn't convert them into
translatable ereports.

            regards, tom lane

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index cb76465050..ffb275afbe 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -174,10 +174,7 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
             cpy->defval = (AttrDefault *) palloc(cpy->num_defval * sizeof(AttrDefault));
             memcpy(cpy->defval, constr->defval, cpy->num_defval * sizeof(AttrDefault));
             for (i = cpy->num_defval - 1; i >= 0; i--)
-            {
-                if (constr->defval[i].adbin)
-                    cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
-            }
+                cpy->defval[i].adbin = pstrdup(constr->defval[i].adbin);
         }

         if (constr->missing)
@@ -203,10 +200,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
             memcpy(cpy->check, constr->check, cpy->num_check * sizeof(ConstrCheck));
             for (i = cpy->num_check - 1; i >= 0; i--)
             {
-                if (constr->check[i].ccname)
-                    cpy->check[i].ccname = pstrdup(constr->check[i].ccname);
-                if (constr->check[i].ccbin)
-                    cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
+                cpy->check[i].ccname = pstrdup(constr->check[i].ccname);
+                cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
                 cpy->check[i].ccvalid = constr->check[i].ccvalid;
                 cpy->check[i].ccnoinherit = constr->check[i].ccnoinherit;
             }
@@ -328,10 +323,7 @@ FreeTupleDesc(TupleDesc tupdesc)
             AttrDefault *attrdef = tupdesc->constr->defval;

             for (i = tupdesc->constr->num_defval - 1; i >= 0; i--)
-            {
-                if (attrdef[i].adbin)
-                    pfree(attrdef[i].adbin);
-            }
+                pfree(attrdef[i].adbin);
             pfree(attrdef);
         }
         if (tupdesc->constr->missing)
@@ -352,10 +344,8 @@ FreeTupleDesc(TupleDesc tupdesc)

             for (i = tupdesc->constr->num_check - 1; i >= 0; i--)
             {
-                if (check[i].ccname)
-                    pfree(check[i].ccname);
-                if (check[i].ccbin)
-                    pfree(check[i].ccbin);
+                pfree(check[i].ccname);
+                pfree(check[i].ccbin);
             }
             pfree(check);
         }
@@ -412,7 +402,6 @@ bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 {
     int            i,
-                j,
                 n;

     if (tupdesc1->natts != tupdesc2->natts)
@@ -488,22 +477,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_defval;
         if (n != (int) constr2->num_defval)
             return false;
+        /* We assume here that both AttrDefault arrays are in adnum order */
         for (i = 0; i < n; i++)
         {
             AttrDefault *defval1 = constr1->defval + i;
-            AttrDefault *defval2 = constr2->defval;
-
-            /*
-             * We can't assume that the items are always read from the system
-             * catalogs in the same order; so use the adnum field to identify
-             * the matching item to compare.
-             */
-            for (j = 0; j < n; defval2++, j++)
-            {
-                if (defval1->adnum == defval2->adnum)
-                    break;
-            }
-            if (j >= n)
+            AttrDefault *defval2 = constr2->defval + i;
+
+            if (defval1->adnum != defval2->adnum)
                 return false;
             if (strcmp(defval1->adbin, defval2->adbin) != 0)
                 return false;
@@ -534,25 +514,21 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
         n = constr1->num_check;
         if (n != (int) constr2->num_check)
             return false;
+
+        /*
+         * Similarly, we rely here on the ConstrCheck entries being sorted by
+         * name.  If there are duplicate names, the outcome of the comparison
+         * is uncertain, but that should not happen.
+         */
         for (i = 0; i < n; i++)
         {
             ConstrCheck *check1 = constr1->check + i;
-            ConstrCheck *check2 = constr2->check;
-
-            /*
-             * Similarly, don't assume that the checks are always read in the
-             * same order; match them up by name and contents. (The name
-             * *should* be unique, but...)
-             */
-            for (j = 0; j < n; check2++, j++)
-            {
-                if (strcmp(check1->ccname, check2->ccname) == 0 &&
-                    strcmp(check1->ccbin, check2->ccbin) == 0 &&
-                    check1->ccvalid == check2->ccvalid &&
-                    check1->ccnoinherit == check2->ccnoinherit)
-                    break;
-            }
-            if (j >= n)
+            ConstrCheck *check2 = constr2->check + i;
+
+            if (!(strcmp(check1->ccname, check2->ccname) == 0 &&
+                  strcmp(check1->ccbin, check2->ccbin) == 0 &&
+                  check1->ccvalid == check2->ccvalid &&
+                  check1->ccnoinherit == check2->ccnoinherit))
                 return false;
         }
     }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..87f9bdaef0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2501,21 +2501,24 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
             if (attribute->atthasdef)
             {
                 Node       *this_default = NULL;
-                AttrDefault *attrdef;
-                int            i;

                 /* Find default in constraint structure */
-                Assert(constr != NULL);
-                attrdef = constr->defval;
-                for (i = 0; i < constr->num_defval; i++)
+                if (constr != NULL)
                 {
-                    if (attrdef[i].adnum == parent_attno)
+                    AttrDefault *attrdef = constr->defval;
+
+                    for (int i = 0; i < constr->num_defval; i++)
                     {
-                        this_default = stringToNode(attrdef[i].adbin);
-                        break;
+                        if (attrdef[i].adnum == parent_attno)
+                        {
+                            this_default = stringToNode(attrdef[i].adbin);
+                            break;
+                        }
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 /*
                  * If it's a GENERATED default, it might contain Vars that
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 163242f54e..9f86910a6b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1609,6 +1609,15 @@ ExecRelCheck(ResultRelInfo *resultRelInfo,
     MemoryContext oldContext;
     int            i;

+    /*
+     * CheckConstraintFetch let this pass with only a warning, but now we
+     * should fail rather than possibly failing to enforce an important
+     * constraint.
+     */
+    if (ncheck != rel->rd_rel->relchecks)
+        elog(ERROR, "%d pg_constraint record(s) missing for relation \"%s\"",
+             rel->rd_rel->relchecks - ncheck, RelationGetRelationName(rel));
+
     /*
      * If first time through for this result relation, build expression
      * nodetrees for rel's constraint expressions.  Keep them in the per-query
@@ -1862,7 +1871,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
         }
     }

-    if (constr->num_check > 0)
+    if (rel->rd_rel->relchecks > 0)
     {
         const char *failed;

diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b968c25dd6..9dd30370da 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1239,8 +1239,6 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
          (CREATE_TABLE_LIKE_DEFAULTS | CREATE_TABLE_LIKE_GENERATED)) &&
         constr != NULL)
     {
-        AttrDefault *attrdef = constr->defval;
-
         for (parent_attno = 1; parent_attno <= tupleDesc->natts;
              parent_attno++)
         {
@@ -1264,6 +1262,7 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                  (table_like_clause->options & CREATE_TABLE_LIKE_DEFAULTS)))
             {
                 Node       *this_default = NULL;
+                AttrDefault *attrdef = constr->defval;
                 AlterTableCmd *atsubcmd;
                 bool        found_whole_row;

@@ -1276,7 +1275,9 @@ expandTableLikeClause(RangeVar *heapRel, TableLikeClause *table_like_clause)
                         break;
                     }
                 }
-                Assert(this_default != NULL);
+                if (this_default == NULL)
+                    elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                         parent_attno, RelationGetRelationName(relation));

                 atsubcmd = makeNode(AlterTableCmd);
                 atsubcmd->subtype = AT_CookedColumnDefault;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 92661abae2..da78f02775 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1228,25 +1228,28 @@ build_column_default(Relation rel, int attrno)
     }

     /*
-     * Scan to see if relation has a default for this column.
+     * If relation has a default for this column, fetch that expression.
      */
-    if (att_tup->atthasdef && rd_att->constr &&
-        rd_att->constr->num_defval > 0)
+    if (att_tup->atthasdef)
     {
-        AttrDefault *defval = rd_att->constr->defval;
-        int            ndef = rd_att->constr->num_defval;
-
-        while (--ndef >= 0)
+        if (rd_att->constr && rd_att->constr->num_defval > 0)
         {
-            if (attrno == defval[ndef].adnum)
+            AttrDefault *defval = rd_att->constr->defval;
+            int            ndef = rd_att->constr->num_defval;
+
+            while (--ndef >= 0)
             {
-                /*
-                 * Found it, convert string representation to node tree.
-                 */
-                expr = stringToNode(defval[ndef].adbin);
-                break;
+                if (attrno == defval[ndef].adnum)
+                {
+                    /* Found it, convert string representation to node tree. */
+                    expr = stringToNode(defval[ndef].adbin);
+                    break;
+                }
             }
         }
+        if (expr == NULL)
+            elog(ERROR, "default expression not found for attribute %d of relation \"%s\"",
+                 attrno, RelationGetRelationName(rel));
     }

     /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ff7395c85b..29702d6eab 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -282,7 +282,8 @@ static void RelationInitPhysicalAddr(Relation relation);
 static void load_critical_index(Oid indexoid, Oid heapoid);
 static TupleDesc GetPgClassDescriptor(void);
 static TupleDesc GetPgIndexDescriptor(void);
-static void AttrDefaultFetch(Relation relation);
+static void AttrDefaultFetch(Relation relation, int ndef);
+static int    AttrDefaultCmp(const void *a, const void *b);
 static void CheckConstraintFetch(Relation relation);
 static int    CheckConstraintCmp(const void *a, const void *b);
 static void InitIndexAmRoutine(Relation relation);
@@ -503,7 +504,6 @@ RelationBuildTupleDesc(Relation relation)
     ScanKeyData skey[2];
     int            need;
     TupleConstr *constr;
-    AttrDefault *attrdef = NULL;
     AttrMissing *attrmiss = NULL;
     int            ndef = 0;

@@ -512,8 +512,8 @@ RelationBuildTupleDesc(Relation relation)
         relation->rd_rel->reltype ? relation->rd_rel->reltype : RECORDOID;
     relation->rd_att->tdtypmod = -1;    /* just to be sure */

-    constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
-                                                sizeof(TupleConstr));
+    constr = (TupleConstr *) MemoryContextAllocZero(CacheMemoryContext,
+                                                    sizeof(TupleConstr));
     constr->has_not_null = false;
     constr->has_generated_stored = false;

@@ -557,10 +557,9 @@ RelationBuildTupleDesc(Relation relation)

         attnum = attp->attnum;
         if (attnum <= 0 || attnum > RelationGetNumberOfAttributes(relation))
-            elog(ERROR, "invalid attribute number %d for %s",
+            elog(ERROR, "invalid attribute number %d for relation \"%s\"",
                  attp->attnum, RelationGetRelationName(relation));

-
         memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
                attp,
                ATTRIBUTE_FIXED_PART_SIZE);
@@ -570,22 +569,10 @@ RelationBuildTupleDesc(Relation relation)
             constr->has_not_null = true;
         if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
             constr->has_generated_stored = true;
-
-        /* If the column has a default, fill it into the attrdef array */
         if (attp->atthasdef)
-        {
-            if (attrdef == NULL)
-                attrdef = (AttrDefault *)
-                    MemoryContextAllocZero(CacheMemoryContext,
-                                           RelationGetNumberOfAttributes(relation) *
-                                           sizeof(AttrDefault));
-            attrdef[ndef].adnum = attnum;
-            attrdef[ndef].adbin = NULL;
-
             ndef++;
-        }

-        /* Likewise for a missing value */
+        /* If the column has a "missing" value, put it in the attrmiss array */
         if (attp->atthasmissing)
         {
             Datum        missingval;
@@ -648,7 +635,7 @@ RelationBuildTupleDesc(Relation relation)
     table_close(pg_attribute_desc, AccessShareLock);

     if (need != 0)
-        elog(ERROR, "catalog is missing %d attribute(s) for relid %u",
+        elog(ERROR, "pg_attribute catalog is missing %d attribute(s) for relation OID %u",
              need, RelationGetRelid(relation));

     /*
@@ -680,33 +667,19 @@ RelationBuildTupleDesc(Relation relation)
         constr->has_generated_stored ||
         ndef > 0 ||
         attrmiss ||
-        relation->rd_rel->relchecks)
+        relation->rd_rel->relchecks > 0)
     {
         relation->rd_att->constr = constr;

         if (ndef > 0)            /* DEFAULTs */
-        {
-            if (ndef < RelationGetNumberOfAttributes(relation))
-                constr->defval = (AttrDefault *)
-                    repalloc(attrdef, ndef * sizeof(AttrDefault));
-            else
-                constr->defval = attrdef;
-            constr->num_defval = ndef;
-            AttrDefaultFetch(relation);
-        }
+            AttrDefaultFetch(relation, ndef);
         else
             constr->num_defval = 0;

         constr->missing = attrmiss;

         if (relation->rd_rel->relchecks > 0)    /* CHECKs */
-        {
-            constr->num_check = relation->rd_rel->relchecks;
-            constr->check = (ConstrCheck *)
-                MemoryContextAllocZero(CacheMemoryContext,
-                                       constr->num_check * sizeof(ConstrCheck));
             CheckConstraintFetch(relation);
-        }
         else
             constr->num_check = 0;
     }
@@ -4251,21 +4224,29 @@ GetPgIndexDescriptor(void)

 /*
  * Load any default attribute value definitions for the relation.
+ *
+ * ndef is the number of attributes that were marked atthasdef.
+ *
+ * Note: we don't make it a hard error to be missing some pg_attrdef records.
+ * We can limp along as long as nothing needs to use the default value.  Code
+ * that fails to find an expected AttrDefault record should throw an error.
  */
 static void
-AttrDefaultFetch(Relation relation)
+AttrDefaultFetch(Relation relation, int ndef)
 {
-    AttrDefault *attrdef = relation->rd_att->constr->defval;
-    int            ndef = relation->rd_att->constr->num_defval;
+    AttrDefault *attrdef;
     Relation    adrel;
     SysScanDesc adscan;
     ScanKeyData skey;
     HeapTuple    htup;
-    Datum        val;
-    bool        isnull;
-    int            found;
-    int            i;
+    int            found = 0;
+
+    /* Allocate array with room for as many entries as expected */
+    attrdef = (AttrDefault *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ndef * sizeof(AttrDefault));

+    /* Search pg_attrdef for relevant entries */
     ScanKeyInit(&skey,
                 Anum_pg_attrdef_adrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4274,65 +4255,94 @@ AttrDefaultFetch(Relation relation)
     adrel = table_open(AttrDefaultRelationId, AccessShareLock);
     adscan = systable_beginscan(adrel, AttrDefaultIndexId, true,
                                 NULL, 1, &skey);
-    found = 0;

     while (HeapTupleIsValid(htup = systable_getnext(adscan)))
     {
         Form_pg_attrdef adform = (Form_pg_attrdef) GETSTRUCT(htup);
-        Form_pg_attribute attr = TupleDescAttr(relation->rd_att, adform->adnum - 1);
+        Datum        val;
+        bool        isnull;

-        for (i = 0; i < ndef; i++)
+        /* protect limited size of array */
+        if (found >= ndef)
         {
-            if (adform->adnum != attrdef[i].adnum)
-                continue;
-            if (attrdef[i].adbin != NULL)
-                elog(WARNING, "multiple attrdef records found for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-                found++;
-
-            val = fastgetattr(htup,
-                              Anum_pg_attrdef_adbin,
-                              adrel->rd_att, &isnull);
-            if (isnull)
-                elog(WARNING, "null adbin for attr %s of rel %s",
-                     NameStr(attr->attname),
-                     RelationGetRelationName(relation));
-            else
-            {
-                /* detoast and convert to cstring in caller's context */
-                char       *s = TextDatumGetCString(val);
-
-                attrdef[i].adbin = MemoryContextStrdup(CacheMemoryContext, s);
-                pfree(s);
-            }
+            elog(WARNING, "unexpected pg_attrdef record found for attribute %d of relation \"%s\"",
+                 adform->adnum, RelationGetRelationName(relation));
             break;
         }

-        if (i >= ndef)
-            elog(WARNING, "unexpected attrdef record found for attr %d of rel %s",
+        val = fastgetattr(htup,
+                          Anum_pg_attrdef_adbin,
+                          adrel->rd_att, &isnull);
+        if (isnull)
+            elog(WARNING, "null adbin for attribute %d of relation \"%s\"",
                  adform->adnum, RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);
+
+            attrdef[found].adnum = adform->adnum;
+            attrdef[found].adbin = MemoryContextStrdup(CacheMemoryContext, s);
+            pfree(s);
+            found++;
+        }
     }

     systable_endscan(adscan);
     table_close(adrel, AccessShareLock);
+
+    if (found != ndef)
+        elog(WARNING, "%d pg_attrdef record(s) missing for relation \"%s\"",
+             ndef - found, RelationGetRelationName(relation));
+
+    /*
+     * Sort the AttrDefault entries by adnum, for the convenience of
+     * equalTupleDescs().  (Usually, they already will be in order, but this
+     * might not be so if systable_getnext isn't using an index.)
+     */
+    if (found > 1)
+        qsort(attrdef, found, sizeof(AttrDefault), AttrDefaultCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->defval = attrdef;
+    relation->rd_att->constr->num_defval = found;
+}
+
+/*
+ * qsort comparator to sort AttrDefault entries by adnum
+ */
+static int
+AttrDefaultCmp(const void *a, const void *b)
+{
+    const AttrDefault *ada = (const AttrDefault *) a;
+    const AttrDefault *adb = (const AttrDefault *) b;
+
+    return ada->adnum - adb->adnum;
 }

 /*
  * Load any check constraints for the relation.
+ *
+ * As with defaults, if we don't find the expected number of them, just warn
+ * here.  The executor should throw an error if an INSERT/UPDATE is attempted.
  */
 static void
 CheckConstraintFetch(Relation relation)
 {
-    ConstrCheck *check = relation->rd_att->constr->check;
-    int            ncheck = relation->rd_att->constr->num_check;
+    ConstrCheck *check;
+    int            ncheck = relation->rd_rel->relchecks;
     Relation    conrel;
     SysScanDesc conscan;
     ScanKeyData skey[1];
     HeapTuple    htup;
     int            found = 0;

+    /* Allocate array with room for as many entries as expected */
+    check = (ConstrCheck *)
+        MemoryContextAllocZero(CacheMemoryContext,
+                               ncheck * sizeof(ConstrCheck));
+
+    /* Search pg_constraint for relevant entries */
     ScanKeyInit(&skey[0],
                 Anum_pg_constraint_conrelid,
                 BTEqualStrategyNumber, F_OIDEQ,
@@ -4347,15 +4357,18 @@ CheckConstraintFetch(Relation relation)
         Form_pg_constraint conform = (Form_pg_constraint) GETSTRUCT(htup);
         Datum        val;
         bool        isnull;
-        char       *s;

         /* We want check constraints only */
         if (conform->contype != CONSTRAINT_CHECK)
             continue;

+        /* protect limited size of array */
         if (found >= ncheck)
-            elog(ERROR, "unexpected constraint record found for rel %s",
+        {
+            elog(WARNING, "unexpected pg_constraint record found for relation \"%s\"",
                  RelationGetRelationName(relation));
+            break;
+        }

         check[found].ccvalid = conform->convalidated;
         check[found].ccnoinherit = conform->connoinherit;
@@ -4367,27 +4380,36 @@ CheckConstraintFetch(Relation relation)
                           Anum_pg_constraint_conbin,
                           conrel->rd_att, &isnull);
         if (isnull)
-            elog(ERROR, "null conbin for rel %s",
+            elog(WARNING, "null conbin for relation \"%s\"",
                  RelationGetRelationName(relation));
+        else
+        {
+            /* detoast and convert to cstring in caller's context */
+            char       *s = TextDatumGetCString(val);

-        /* detoast and convert to cstring in caller's context */
-        s = TextDatumGetCString(val);
-        check[found].ccbin = MemoryContextStrdup(CacheMemoryContext, s);
-        pfree(s);
-
-        found++;
+            check[found].ccbin = MemoryContextStrdup(CacheMemoryContext, s);
+            pfree(s);
+            found++;
+        }
     }

     systable_endscan(conscan);
     table_close(conrel, AccessShareLock);

     if (found != ncheck)
-        elog(ERROR, "%d constraint record(s) missing for rel %s",
+        elog(WARNING, "%d pg_constraint record(s) missing for relation \"%s\"",
              ncheck - found, RelationGetRelationName(relation));

-    /* Sort the records so that CHECKs are applied in a deterministic order */
-    if (ncheck > 1)
-        qsort(check, ncheck, sizeof(ConstrCheck), CheckConstraintCmp);
+    /*
+     * Sort the records by name.  This ensures that CHECKs are applied in a
+     * deterministic order, and it also makes equalTupleDescs() faster.
+     */
+    if (found > 1)
+        qsort(check, found, sizeof(ConstrCheck), CheckConstraintCmp);
+
+    /* Install array only after it's fully valid */
+    relation->rd_att->constr->check = check;
+    relation->rd_att->constr->num_check = found;
 }

 /*

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

05 апреля 2021 г., 16:21:36

Zhihong Yu <zyu@yugabyte.com> writes:
>>     if (found != ncheck)

> Since there is check on found being smaller than ncheck inside the loop,
> the if condition can be written as:
>     if (found < ncheck)

Doesn't really seem like an improvement?  Nor am I excited about renaming
these "found" variables, considering that those names can be traced back
more than twenty years.

> +   if (found != ndef)
> +       elog(WARNING, "%d attrdef record(s) missing for rel %s",
> +            ndef - found, RelationGetRelationName(relation));

> Since only warning is logged, there seems to be some wasted space in
> attrdef. Would such space accumulate, resulting in some memory leak ?

No, it's just one palloc chunk either way.  I do not think there is
any value in adding more code to reclaim the unused array slots a
bit sooner.  (Because of aset.c's power-of-two allocation practices,
it's likely there would be zero actual space savings anyway.)

            regards, tom lane

Re: ALTER TABLE ADD COLUMN fast default

От

Andrew Gierth

Дата:

06 апреля 2021 г., 03:52:44

>>>>> "Andrew" == Andrew Dunstan <andrew@dunslane.net> writes:

 Andrew> I'd be curious to know how this state came about.

Me too, but available information is fairly sparse: PG 12.5, in a
container, backing a (personal) instance of Gitlab; the only database
admin operations should have been those done by Gitlab itself, but I
have not audited that codebase. No information on any history of
crashes. The missing pg_attrdef row appeared to be missing or not
visible in the heap, not just missing from indexes; it did not show up
in queries whether seqscan or indexscan was used. Available time did not
permit trying to use pageinspect on pg_attrdef.

This gitlab ticket refers to the same incident:

https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047

(which actually contains a new relevant fact that hadn't been mentioned
in the IRC discussion, which is that the problem affected multiple
tables, not just one.)

-- 
Andrew (irc:RhodiumToad)

Re: ALTER TABLE ADD COLUMN fast default

От

Tom Lane

Дата:

06 апреля 2021 г., 15:01:55

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> This gitlab ticket refers to the same incident:
> https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6047
> (which actually contains a new relevant fact that hadn't been mentioned
> in the IRC discussion, which is that the problem affected multiple
> tables, not just one.)

Hmm.  If you assume that the problem is either a corrupted index on
pg_attrdef, or some block(s) of pg_attrdef got zeroed out, then it
would be entirely unsurprising for multiple tables to be affected.

I've pushed the v2 patch to HEAD.  Not sure if we want to consider
back-patching.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: ALTER TABLE ADD COLUMN fast default

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения