Discussion: Updating pg_attribute to change field's data type from integer to bigint on very large table


Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Jeff Adams
Date:
Greetings,

I have a very large table (approximately 1 billion records). I need to change a field's data type from integer to bigint. I started up an ALTER TABLE approach yesterday and it is still running (trying to rewrite the whole table?). I remember seeing mention of being able to do this in the pg_attribute table. Has anybody ever used the latter approach? If so, instructions would be greatly appreciated. Thanks...

Jeff
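
For reference, the statement in question would have been something along these lines (the table and column names here are hypothetical, since the post does not give them); converting integer to bigint this way forces PostgreSQL to rewrite every row:

    -- Hypothetical names; int4 -> int8 changes the physical row width,
    -- so the whole billion-row table has to be rewritten.
    ALTER TABLE big_table ALTER COLUMN id TYPE bigint;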


Jeff Adams <jeff.adams@noaa.gov> writes:
> I have a very large table (approximately 1 billion records). I need to
> change a field's data type from integer to bigint. I started up an ALTER
> TABLE approach yesterday and it is still running (trying to rewrite the
> whole table?). I remember seeing mention of being able to do this in the
> pg_attribute table. Has anybody ever used the latter approach. If so,
> instructions would be greatly appreciated. Thanks...

No, that is an actual on-disk change (making the field physically
wider), so it's going to cost ya.  There are some cases where the
on-disk representation doesn't change and so a catalog update isn't
needed, but int4->int8 isn't one of them.

            regards, tom lane

I wrote:
> No, that is an actual on-disk change (making the field physically
> wider), so it's going to cost ya.  There are some cases where the
> on-disk representation doesn't change and so a catalog update isn't
> needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that.  Of course I meant "and so a
catalog update is all that's needed".

            regards, tom lane
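
To illustrate the kind of change where only a catalog update is needed (names are made up; varchar and text are binary-coercible, so on newer releases this conversion skips the rewrite entirely, unlike int4 -> int8):

    -- Binary-coercible change: a catalog-only update on newer releases,
    -- returning almost immediately with no table rewrite.
    ALTER TABLE big_table ALTER COLUMN note TYPE text;   -- note was varchar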

Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Jeff Adams
Date:
So I can? If so, how do I go about it? I should mention that, while I dabble in Postgres DBA activity, it is not my day job...

On Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> No, that is an actual on-disk change (making the field physically
> wider), so it's going to cost ya.  There are some cases where the
> on-disk representation doesn't change and so a catalog update isn't
> needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that.  Of course I meant "and so a
catalog update is all that's needed".

                       regards, tom lane



--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376

Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Bèrto ëd Sèra
Date:
Hi Jeff,

Read carefully: he said it does stuff ON DISK, which means that all of your fields must be physically remade (so it's a lot more than telling it "it's an int4, Sheila", which is what just modifying the catalog would do). Since you are not a DBA, be aware that you are producing a new record for each of the existing ones, and thus bloating things quite a bit. You may want to vacuum it asap :)

Bèrto

On 13 April 2012 16:30, Jeff Adams <jeff.adams@noaa.gov> wrote:
so i can? if so, how do i go about? i should mention that, while i dabble in postgres dba activity, it is not my day job...


On Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> No, that is an actual on-disk change (making the field physically
> wider), so it's going to cost ya.  There are some cases where the
> on-disk representation doesn't change and so a catalog update isn't
> needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that.  Of course I meant "and so a
catalog update is all that's needed".

                       regards, tom lane



--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376



--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.

Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Steve Crawford
Date:
On 04/13/2012 08:30 AM, Jeff Adams wrote:
> so i can? if so, how do i go about? i should mention that, while i
> dabble in postgres dba activity, it is not my day job...
>
That really depends on details and your concerns. Is the database used
for constant insert/update/select activity or is it a big table used for
analysis and can be taken offline for some period? Is the column you
want to update a primary or foreign key? How much available disk space
do you have? Is a large portion of the data static (historical logs)?

Some possible approaches:

1. Just let it run to completion if you can afford the maintenance time.

2. Add a new column of the appropriate type, copy the data into that
column then drop the old one and rename the new one. If you do the
update all at once you will have severe table bloat but you may be able
to do the updates of the new column in batches so that vacuum can
reclaim space between update batches. This approach may be useful if you
do not have enough maintenance time to do the change all at once.

3. Dump the table data. Truncate the table and modify the column
definition. Restore the data. This requires downtime but will probably
be faster than in-place modification. However, it's not something that
you can easily cancel part-way through, and it's not a friendly method
if there are foreign keys involved.

4. Rename the table and create a new table with the structure you want.
Copy the old data back into the new table - perhaps in batches. This
might be useful if you need to constantly keep collecting data but can
afford a delay in analysis of the data.

If partitioning the table would be beneficial, this might be a good time
to consider that as well.
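
A minimal sketch of approach 2 above, assuming a hypothetical table big_table whose integer column id is the one being widened (names and batch boundaries are made up and would need adjusting; any indexes, defaults or constraints on the old column also have to be recreated by hand):

    -- 1. Add the wide column; with no default this is catalog-only and cheap.
    ALTER TABLE big_table ADD COLUMN id_big bigint;

    -- 2. Backfill in batches so vacuum can reclaim dead rows between runs.
    UPDATE big_table
       SET id_big = id
     WHERE id BETWEEN 1 AND 10000000
       AND id_big IS NULL;
    -- ...repeat for the next id ranges, vacuuming as you go...

    -- 3. Swap the columns once the backfill is complete.
    ALTER TABLE big_table DROP COLUMN id;
    ALTER TABLE big_table RENAME COLUMN id_big TO id;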

Cheers,
Steve


Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Jeff Adams
Date:
Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...

On Fri, Apr 13, 2012 at 12:31 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 04/13/2012 08:30 AM, Jeff Adams wrote:
so i can? if so, how do i go about? i should mention that, while i dabble in postgres dba activity, it is not my day job...

That really depends on details and your concerns. Is the database used for constant insert/update/select activity or is it a big table used for analysis and can be taken offline for some period? Is the column you want to update a primary or foreign key? How much available disk space do you have? Is a large portion of the data static (historical logs)?

Some possible approaches:

1. Just let it run to completion if you can afford the maintenance time.

2. Add a new column of the appropriate type, copy the data into that column then drop the old one and rename the new one. If you do the update all at once you will have severe table bloat but you may be able to do the updates of the new column in batches so that vacuum can reclaim space between update batches. This approach may be useful if you do not have enough maintenance time to do the change all at once.

3. Dump the table data. Truncate the table and modify the column definition. Restore the data. This requires downtime but will probably be faster than in-place modification. However it's not something that you can easily cancel part-way through and not a friendly method if there are foreign-keys involved.

4. Rename the table and create a new table with the structure you want. Copy the old data back into the new table - perhaps in batches. This might be useful if you need to constantly keep collecting data but can afford a delay in analysis of the data.

If partitioning the table would be beneficial, this might be a good time to consider that as well.

Cheers,
Steve




--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376

Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Alban Hertroys
Date:
On 13 Apr 2012, at 18:43, Jeff Adams wrote:

> Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id
> field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my
> part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to
> alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...

The optimal solution most likely involves the table rewrite that you're trying to prevent.

Do you really need that id column though? It seems like it's a surrogate key - perhaps the real primary key for that
data is a combination of some other columns? You say you partitioned that table, so the column you partitioned on must
be part of that "new" PK in that case. That won't work if you depend on that id-column for other things, of course, or
if it's really the only thing that guarantees that your records are unique (then again, are duplicates useful in any
way?).


...If that doesn't work either, I just got a really dangerous idea for an alternate solution: I was just wondering
whether you might have a field with 4 bytes of spare length _before_ the id column?
If you alter the pg_attributes, reducing that field with 4 bytes, and increase the id-column to bigint (+4 bytes), you
might end up with a working solution.

The id's in the "resized" field would probably be really weird though and quite likely invalid for the data-type (those
last 4 bytes you'd borrow of the previous column better be empty!). Plus, I'm assuming there's nothing between those
fields on disk - in reality there's padding going on and such. As I said, it's a dangerous idea!
It will probably get shot down by people with more knowledge about the on-disk data structure right away. For example,
if there's any kind of pointer to the start of your id-column, well... Also, it would probably kill any kind of
replication on this table and will likely cause mayhem in the WAL logs as well, so you'd at least have to make sure
there are no transactions going on (including autovacuum).
Still, if it works in your case then it's a possible solution :P

If you're going to experiment with this, you should really try it on a separate database (preferably on a different
server) with some test data first. There's definitely a risk of losing everything in the database. It's probably a
_really_ bad idea :P

Alban Hertroys

--
The scale of a problem often equals the size of an ego.



Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Jasen Betts
Date:
On 2012-04-13, Jeff Adams <jeff.adams@noaa.gov> wrote:
>
> Thanks for the ideas Steve. I am actually working with a partitioned table
> and the field I am modifying is the id field (I have reached the cap on the
> integer data type and need to modify it to bigint - very poor planning on
> my part!), but no related tables exist. The id field in the partitioned
> tables is inherited, so I figured I needed to alter the column in the
> parent table. Does this information point towards an optimal solution?
> Thanks again...

You can drop the inheritance and alter the tables one at a time. I did
this on a partitioned table with only millions of rows when I saw
what direction it was going in.

As my data was partitioned by time, the system had to run without some
of the older data for a while, while my script altered and re-inherited
each partition.
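
A rough sketch of that route, with made-up names for the parent and its inheritance children (the parent usually holds no rows itself; note that INHERIT requires the child's column type to match the parent's, so each child is altered before it is re-attached):

    -- Detach the children so the parent can be altered on its own.
    ALTER TABLE big_table_2011 NO INHERIT big_table;
    ALTER TABLE big_table_2012 NO INHERIT big_table;
    -- ...and so on for each partition...

    -- Alter the (usually empty) parent; with no children attached this is quick.
    ALTER TABLE big_table ALTER COLUMN id TYPE bigint;

    -- Alter and re-attach each child in turn; each rewrite touches only that
    -- partition, and its data shows up through the parent again once it is
    -- re-inherited.
    ALTER TABLE big_table_2011 ALTER COLUMN id TYPE bigint;
    ALTER TABLE big_table_2011 INHERIT big_table;
    -- ...repeat for the remaining partitions...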

--
⚂⚃ 100% natural

Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

From:
Jeff Adams
Date:
Thanks, Alban. Perhaps I will go with the rewrite. I use the table for storage and analysis purposes only and am the only user, so I can take the table offline. I was wondering whether running the ALTER TABLE on the parent table makes the change in the inherited child tables? The fact that my initial stab at the ALTER TABLE lasted 14 hours or so before I cancelled it suggests that it was working on the child tables?

On Sat, Apr 14, 2012 at 6:29 AM, Alban Hertroys <haramrae@gmail.com> wrote:
On 13 Apr 2012, at 18:43, Jeff Adams wrote:

> Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...

The optimal solution most likely involves the table rewrite that you're trying to prevent.

Do you really need that id column though? It seems like it's a surrogate key - perhaps the real primary key for that data is a combination of some other columns? You say you partitioned that table, so the column you partitioned on must be part of that "new" PK in that case. That won't work if you depend on that id-column for other things, of course, or if it's really the only thing that guarantees that your records are unique (then again, are duplicates useful in any way?).


...If that doesn't work either, I just got a really dangerous idea for an alternate solution: I was just wondering whether you might have a field with 4 bytes of spare length _before_ the id column?
If you alter the pg_attributes, reducing that field with 4 bytes, and increase the id-column to bigint (+4 bytes), you might end up with a working solution.

The id's in the "resized" field would probably be really weird though and quite likely invalid for the data-type (those last 4 bytes you'd borrow of the previous column better be empty!). Plus, I'm assuming there's nothing between those fields on disk - in reality there's padding going on and such. As I said, it's a dangerous idea!
It will probably get shot down by people with more knowledge about the on-disk data structure right away. For example, if there's any kind of pointer to the start of your id-column, well... Also, it would probably kill any kind of replication on this table and will likely cause mayhem in the WAL logs as well, so you'd at least have to make sure there are no transactions going on (including autovacuum).
Still, if it works in your case then it's a possible solution :P

If you're going to experiment with this, you should really try it on a separate database (preferably on a different server) with some test data first. There's definitely a risk of losing everything in the database. It's probably a _really_ bad idea :P

Alban Hertroys

--
The scale of a problem often equals the size of an ego.





--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376