Re: Fast AT ADD COLUMN with DEFAULTs
От | Vitaly Burovoy |
---|---|
Тема | Re: Fast AT ADD COLUMN with DEFAULTs |
Дата | |
Msg-id | CAKOSWNmF-ZZtntD4BASZxv8S86UWqZt-tEXER5WdrCcV=1Ok5g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Fast AT ADD COLUMN with DEFAULTs (Serge Rielau <serge@rielau.com>) |
Список | pgsql-hackers |
On 10/6/16, Serge Rielau <serge@rielau.com> wrote: >> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >>> But what I discover for myself is that we have pg_attrdef separately >>> from the pg_attribute. Why? >> >> The core reason for that is that the default expression needs to be >> a separate object from the column for purposes of dependency analysis. >> For example, if you have a column whose default is "foo()", then the >> default expression depends on the function foo(), but the column should >> not: if you drop the function, only the default expression ought to >> be dropped, not the column. >> >> Because of this, the default expression needs to have its own OID >> (to be stored in pg_depend) and it's convenient to store it in a >> separate catalog so that the classoid can identify it as being a >> default expression rather than some other kind of object. > > Good to know. > >> If we were going to allow these missing_values or creation_defaults >> or whatever they're called to be general expressions, then they would >> need >> to have their own OIDs for dependency purposes. That would lead me to >> think that the best representation is to put them in their own rows in >> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish >> regular defaults from these things. Or maybe they even need their own >> catalog, depending on whether you think dependency analysis would want >> to distinguish them from regular defaults using just the classed. >> >> Now, as I just pointed out in another mail, realistically we're probably >> going to restrict the feature to simple constants, which'd mean they will >> depend only on the column's type and can never need any dependencies of >> their own. So we could take the shortcut of just storing them in a new >> column in pg_attribute. I agree with you. >> But maybe that's shortsighted and we'll >> eventually wish we'd done them as full-fledged separate objects. I don't think so. If we try to implement non-blocking adding columns with volatile defaults (and for instance update old rows in the background), we can end up with the next situation: CREATE TABLE a(i bigint PRIMARY KEY); INSERT INTO a SELECT generate_series(1,10000000000); ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100); For indexes (even unique) created concurrently similar troubles are solved with a "not valid" mark, but what to do with a heap if we try to do it in the background? >> But on the third hand ... once one of these is in place, how could you >> drop it separately from the column? That would amount to a change in the >> column's stored data, which is not what one would expect from dropping >> a separate object. So maybe it's senseless to think that these things >> could ever be distinct objects. But that definitely leads to the >> conclusion that they're constants and nothing else. > I cannot follow this reasoning. > Let’s look past what PG does today: > For each row (whether that’s necessary or not) we evaluate the expression, > compute the value and > store it in the rewritten table. > We do not record dependencies on the “pedigree” of the value. > It happened to originate from the DEFAULT expression provided with the ADD > COLUMN, > but that is not remembered anywhere. > All we remember is the value - in each row. > So the only change that is proposed here - when it comes right down to it - > is to remember the value once only (IFF it is provably the same for each row) > and thus avoid the need to rewrite the table. > So I see no reason to impose any restriction other than “evaluated value is > provably the same for every row”. Tom says the same thing. The expression at the end should be a value if it allows to avoid rewriting table. > Regarding the location of storage. > I did start of using pg_attrdef, but ran into some snags. > My approach was to add the value as an extra column (rather than an extra > row). > That caused trouble since a SET DEFAULT operation is decomposed into a DROP > and a SET and > preserving the value across such operations did not come naturally. I'm sorry for making you be confused. The best way is to use an extra column in the pg_attribute to store serialized value. > If we were to use extra rows instead that issue would be solved, assuming we > add a “default kind” sort of column. > It would dictate the storage format though which may be considered overkill > for a a constant. -- Best regards, Vitaly Burovoy
В списке pgsql-hackers по дате отправления: