Re: ALTER TABLE ADD COLUMN fast default

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: ALTER TABLE ADD COLUMN fast default
Дата
Msg-id 20180227190923.ku6pfkv53bnxu4fr@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: ALTER TABLE ADD COLUMN fast default  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Ответы Re: ALTER TABLE ADD COLUMN fast default
Список pgsql-hackers
Hi,

On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
> This was debated quite some time ago. See for example
> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
> The consensus then seemed to be to store them in pg_attribute. If not,
> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
> would have to go on separate rows, and we'd be storing values rather
> than expressions, so it would get somewhat ugly.

I know that I'm concerned with storing a number of large values in
pg_attributes, and I haven't seen that argument addressed much in a
quick scan of the discussion.


> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
> >>   * ----------------
> >>   */
> >>  bool
> >> -heap_attisnull(HeapTuple tup, int attnum)
> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
> >>  {
> >> +     /*
> >> +      * We allow a NULL tupledesc for relations not expected to have
> >> +      * missing values, such as catalog relations and indexes.
> >> +      */
> >> +     Assert(!tupleDesc || attnum <= tupleDesc->natts);
> >
> > This seems fairly likely to hide bugs.

> How? There are plenty of cases where we simply expect quite reasonably
> that there won't be any missing attributes, and we don't need  a
> tupleDesc at all in such cases.

Missing to pass a tupledesc for tables that can have defaults with not
have directly visible issues, you'll just erroneously get back a null.


> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
> >>                       return false;           /* out of order */
> >>               if (att_tup->attisdropped)
> >>                       return false;           /* table contains dropped columns */
> >> +             if (att_tup->atthasmissing)
> >> +                     return false;           /* table contains cols with missing values */
> >
> > Hm, so incrementally building a table definitions with defaults, even if
> > there aren't ever any rows, or if there's a subsequent table rewrite,
> > will cause slowdowns. If so, shouldn't a rewrite remove all
> > atthasmissing values?

> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
> had to make this change.  Still, it looks like dropping a column has
> the same effect.

What exactly is mystifying you? That the new default stuff doesn't work
when the physical tlist optimization is in use?


> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
> >>                                       MemoryContextAllocZero(CacheMemoryContext,
> >>                                                                                  relation->rd_rel->relnatts *
> >>                                                                                  sizeof(AttrDefault));
> >> -                     attrdef[ndef].adnum = attp->attnum;
> >> +                     attrdef[ndef].adnum = attnum;
> >>                       attrdef[ndef].adbin = NULL;
> >> +
> >>                       ndef++;
> >>               }
> >> +
> >> +             /* Likewise for a missing value */
> >> +             if (attp->atthasmissing)
> >> +             {
> >
> > As I've written somewhere above, I don't think it's a good idea to do
> > this preemptively. Tupledescs are very commonly copied, and the
> > missing attributes are quite likely never used. IOW, we should just
> > remember the attmissingattr value and fill in the corresponding
> > attmissingval on-demand.

> Defaults are frequently not used either, yet this function fills those
> in regardless. I don't see why we need to treat missing values
> differently.

It's already hurting us quite a bit in workloads with a lot of trivial
queries. Adding more makes it worse.


> Attached is the current state of things, with the fixes indicated
> above, as well as some things pointed out by David Rowley.
> 
> None of this seems to have had much effect on performance.
> 
> Here's what oprofile reports from a run of pgbench with
> create-alter.sql and the query "select sum(c1000) from t":
> 
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 22584    28.5982  postgres                 ExecInterpExpr
> 11950    15.1323  postgres                 slot_getmissingattrs
> 5471      6.9279  postgres                 AllocSetAlloc
> 3018      3.8217  postgres                 TupleDescInitEntry
> 2042      2.5858  postgres                 MemoryContextAllocZeroAligned
> 1807      2.2882  postgres                 SearchCatCache1
> 1638      2.0742  postgres                 expression_tree_walker
> 
> 
> That's very different from what we see on the master branch. The time
> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
> (at least yet) understand why we're getting so much time spent in
> ExecInterpExpr, which doesn't show up at all when the same benchmark
> is run on master.

I'd guess that's because the physical tlist optimization is disabled. I
assume you'd see something similar on master if you dropped a column.

- Andres


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Let's remove DSM_INPL_NONE.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)