Обсуждение: BUG #10591: setting newly added columns to null is slow

Поиск
Список
Период
Сортировка

BUG #10591: setting newly added columns to null is slow

От
eike@inter.net
Дата:
The following bug has been logged on the website:

Bug reference:      10591
Logged by:          Eike Dierks
Email address:      eike@inter.net
PostgreSQL version: 9.2.1
Operating system:   OSX 10.9, centos on virtuozzo linux-2.6.18-028stab
Description:

- given demo_table with some amount of data
- add new_column (works instantely)
- add a unique contraint on new_column
- update demo_table set new_colum=null; (quite slow)
-> takes a long time: seems to rewrite the whole table

I guess that adding a new column does not yet reserve space for that new
column on disk, but that the on-disk layout is only changed as soon as I do
the update (aka making the column physically present on disk)

I fully understand that this is a sensible tradeoff and that this is the
intended behaviour.

However from my uneducated guess (not tested) I believe that a vacuum full
followed by the update would have been much faster.

I somehow believe that my first sequence of commands did make the new_column
virtual first and then physically for every row in turn, thereby taking a
long time to complete.

I'd like to suggest to add to the documentation (if it's not already in
there)
It should be stated that a vacuum full might speed up further access to a
newly introduced column.

But again I did not fully test that scenario, it's just my guess.

Re: BUG #10591: setting newly added columns to null is slow

От
Kevin Grittner
Дата:
"eike@inter.net" <eike@inter.net> wrote:=0A=0A>- given demo_table with some=
 amount of data=0A>- add new_column (works instantely)=0A>- add a unique co=
ntraint on new_column=0A>- update demo_table set new_colum=3Dnull; (quite s=
low)=0A>-> takes a long time: seems to rewrite the whole table=0A=0AThis is=
 not a bug.=A0 It has nothing to do with having added the=0Acolumn.=A0 An U=
PDATE statement with no WHERE clause will delete and=0Are-add every row in =
the table, even if your update is setting a=0Acolumn to the value it alread=
y has.=0A=0A> I guess that adding a new column does not yet reserve space f=
or=0A> that new column on disk, but that the on-disk layout is only=0A> cha=
nged as soon as I do the update (aka making the column=0A> physically prese=
nt on disk)=0A=0ANo, the table can immediately be used, and the new column =
will be=0ANULL.=A0 The UPDATE was completely unnecessary.=A0 When in doubt,=
 use a=0AWHERE clause that ensures you are not updating unnecessarily, like=
:=0A=0AUPDATE demo_table=0A=A0 SET new_column =3D NULL=0A=A0 WHERE new_colu=
mn IS NOT NULL;=0A=0A> However from my uneducated guess (not tested) I beli=
eve that a=0A> vacuum full followed by the update would have been much fast=
er.=0A=0AIt would not have been.=A0 VACUUM FULL is not often a good idea;=
=0Aalthough you are now in a state where this table is heavily=0Abloated by=
 the unnecessary update of every row, so it might=0Aactually be worth it no=
w.=0A=0Ahttp://rhaas.blogspot.com/2014/03/vacuum-full-doesnt-mean-vacuum-bu=
t.html=0A=0A--=0AKevin Grittner=0AEDB: http://www.enterprisedb.com=0AThe En=
terprise PostgreSQL Company

Re: BUG #10591: setting newly added columns to null is slow

От
David G Johnston
Дата:
Eike Dierks wrote
> The following bug has been logged on the website:
>
> Bug reference:      10591
> Logged by:          Eike Dierks
> Email address:

> eike@

> PostgreSQL version: 9.2.1
> Operating system:   OSX 10.9, centos on virtuozzo linux-2.6.18-028stab
> Description:
>
> - given demo_table with some amount of data
> - add new_column (works instantely)
> - add a unique contraint on new_column
> - update demo_table set new_colum=null; (quite slow)
> -> takes a long time: seems to rewrite the whole table
>
> I guess that adding a new column does not yet reserve space for that new
> column on disk, but that the on-disk layout is only changed as soon as I
> do
> the update (aka making the column physically present on disk)
>
> I fully understand that this is a sensible tradeoff and that this is the
> intended behaviour.

As you say - this is not a bug...


> However from my uneducated guess (not tested) I believe that a vacuum full
> followed by the update would have been much faster.

Doubtful and not without its own problems.  While the UPDATE could perform
faster the entire transaction will definately take longer as a whole.


> I somehow believe that my first sequence of commands did make the
> new_column
> virtual first and then physically for every row in turn, thereby taking a
> long time to complete.

Yes, this is how MVCC (multi-version concurrency control) works.


> I'd like to suggest to add to the documentation (if it's not already in
> there)
> It should be stated that a vacuum full might speed up further access to a
> newly introduced column.

This is not a bug and while I understand your confusion if you are going to
suggest that changes are needed is it very helpful to actually suggest what
you think needs to change.

Or, at least, tell us what parts of the documentation you did read so that
we may try and better understand how users are using the documentation and
also point to you anything you may have missed.


> But again I did not fully test that scenario, it's just my guess.

If you are not going to actually test your theories you should then at least
provide logical support for your reasoning.


As a final note, a better wording for the message would be: "setting newly
added columns to null is pointless" (or maybe- "is not optimized").  But any
optimization to make this pointless use-case faster would likely slow down
real-world use-cases...

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/BUG-10591-setting-newly-added-columns-to-null-is-slow-tp5806732p5806737.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.