Обсуждение: Updating pg_attribute to change field's data type from integer to bigint on very large table
Greetings,
I have a very large table (approximately 1 billion records). I need to change a field's data type from integer to bigint. I started up an ALTER TABLE approach yesterday and it is still running (trying to rewrite the whole table?). I remember seeing mention of being able to do this in the pg_attribute table. Has anybody ever used the latter approach. If so, instructions would be greatly appreciated. Thanks...
Jeff
I have a very large table (approximately 1 billion records). I need to change a field's data type from integer to bigint. I started up an ALTER TABLE approach yesterday and it is still running (trying to rewrite the whole table?). I remember seeing mention of being able to do this in the pg_attribute table. Has anybody ever used the latter approach. If so, instructions would be greatly appreciated. Thanks...
Jeff
Jeff Adams <jeff.adams@noaa.gov> writes: > I have a very large table (approximately 1 billion records). I need to > change a field's data type from integer to bigint. I started up an ALTER > TABLE approach yesterday and it is still running (trying to rewrite the > whole table?). I remember seeing mention of being able to do this in the > pg_attribute table. Has anybody ever used the latter approach. If so, > instructions would be greatly appreciated. Thanks... No, that is an actual on-disk change (making the field physically wider), so it's going to cost ya. There are some cases where the on-disk representation doesn't change and so a catalog update isn't needed, but int4->int8 isn't one of them. regards, tom lane
I wrote: > No, that is an actual on-disk change (making the field physically > wider), so it's going to cost ya. There are some cases where the > on-disk representation doesn't change and so a catalog update isn't > needed, but int4->int8 isn't one of them. Sheesh, hit send too quickly on that. Of course I meant "and so a catalog update is all that's needed". regards, tom lane
so i can? if so, how do i go about? i should mention that, while i dabble in postgres dba activity, it is not my day job...
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
On Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:Sheesh, hit send too quickly on that. Of course I meant "and so a
> No, that is an actual on-disk change (making the field physically
> wider), so it's going to cost ya. There are some cases where the
> on-disk representation doesn't change and so a catalog update isn't
> needed, but int4->int8 isn't one of them.
catalog update is all that's needed".
regards, tom lane
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
Hi Jeff,
Bèrto
--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.
On 13 April 2012 16:30, Jeff Adams <jeff.adams@noaa.gov> wrote:
so i can? if so, how do i go about? i should mention that, while i dabble in postgres dba activity, it is not my day job...Jeffrey D. AdamsOn Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:I wrote:Sheesh, hit send too quickly on that. Of course I meant "and so a
> No, that is an actual on-disk change (making the field physically
> wider), so it's going to cost ya. There are some cases where the
> on-disk representation doesn't change and so a catalog update isn't
> needed, but int4->int8 isn't one of them.
catalog update is all that's needed".
regards, tom lane
--
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table
От
Steve Crawford
Дата:
On 04/13/2012 08:30 AM, Jeff Adams wrote: > so i can? if so, how do i go about? i should mention that, while i > dabble in postgres dba activity, it is not my day job... > That really depends on details and your concerns. Is the database used for constant insert/update/select activity or is it a big table used for analysis and can be taken offline for some period? Is the column you want to update a primary or foreign key? How much available disk space do you have? Is a large portion of the data static (historical logs)? Some possible approaches: 1. Just let it run to completion if you can afford the maintenance time. 2. Add a new column of the appropriate type, copy the data into that column then drop the old one and rename the new one. If you do the update all at once you will have severe table bloat but you may be able to do the updates of the new column in batches so that vacuum can reclaim space between update batches. This approach may be useful if you do not have enough maintenance time to do the change all at once. 3. Dump the table data. Truncate the table and modify the column definition. Restore the data. This requires downtime but will probably be faster than in-place modification. However it's not something that you can easily cancel part-way through and not a friendly method if there are foreign-keys involved. 4. Rename the table and create a new table with the structure you want. Copy the old data back into the new table - perhaps in batches. This might be useful if you need to constantly keep collecting data but can afford a delay in analysis of the data. If partitioning the table would be beneficial, this might be a good time to consider that as well. Cheers, Steve
Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
On Fri, Apr 13, 2012 at 12:31 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 04/13/2012 08:30 AM, Jeff Adams wrote:That really depends on details and your concerns. Is the database used for constant insert/update/select activity or is it a big table used for analysis and can be taken offline for some period? Is the column you want to update a primary or foreign key? How much available disk space do you have? Is a large portion of the data static (historical logs)?so i can? if so, how do i go about? i should mention that, while i dabble in postgres dba activity, it is not my day job...
Some possible approaches:
1. Just let it run to completion if you can afford the maintenance time.
2. Add a new column of the appropriate type, copy the data into that column then drop the old one and rename the new one. If you do the update all at once you will have severe table bloat but you may be able to do the updates of the new column in batches so that vacuum can reclaim space between update batches. This approach may be useful if you do not have enough maintenance time to do the change all at once.
3. Dump the table data. Truncate the table and modify the column definition. Restore the data. This requires downtime but will probably be faster than in-place modification. However it's not something that you can easily cancel part-way through and not a friendly method if there are foreign-keys involved.
4. Rename the table and create a new table with the structure you want. Copy the old data back into the new table - perhaps in batches. This might be useful if you need to constantly keep collecting data but can afford a delay in analysis of the data.
If partitioning the table would be beneficial, this might be a good time to consider that as well.
Cheers,
Steve
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table
От
Alban Hertroys
Дата:
On 13 Apr 2012, at 18:43, Jeff Adams wrote: > Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field(I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!),but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter thecolumn in the parent table. Does this information point towards an optimal solution? Thanks again... The optimal solution most likely involves the table rewrite that you're trying to prevent. Do you really need that id column though? It seems like it's a surrogate key - perhaps the real primary key for that datais a combination of some other columns? You say you partitioned that table, so the column you partitioned on must bepart of that "new" PK in that case. That won't work if you depend on that id-column for other things, of course, or ifit's really the only thing that guarantees that your records are unique (then again, are duplicates useful in any way?). ...If that doesn't work either, I just got a really dangerous idea for an alternate solution: I was just wondering whetheryou might have a field with 4 bytes of spare length _before_ the id column? If you alter the pg_attributes, reducing that field with 4 bytes, and increase the id-column to bigint (+4 bytes), you mightend up with a working solution. The id's in the "resized" field would probably be really weird though and quite likely invalid for the data-type (those last4 bytes you'd borrow of the previous column better be empty!). Plus, I'm assuming there's nothing between those fieldson disk - in reality there's padding going on and such. As I said, it's a dangerous idea! It will probably get shot down by people with more knowledge about the on-disk data structure right away. For example, ifthere's any kind of pointer to the start of your id-column, well... Also, it would probably kill any kind of replicationon this table and will likely cause mayhem in the WAL logs as well, so you'd at least have to make sure thereare no transactions going on (including autovacuum). Still, if it works in your case then it's a possible solution :P If you're going to experiment with this, you should really try it on a separate database (preferably on a different server)with some test data first. There's definitely a risk of losing everything in the database. It's probably a _really_bad idea :P Alban Hertroys -- The scale of a problem often equals the size of an ego.
On 2012-04-13, Jeff Adams <jeff.adams@noaa.gov> wrote: > --e89a8ff1c1e8d37deb04bd922a84 > Content-Type: text/plain; charset=ISO-8859-1 > > Thanks for the ideas Steve. I am actually working with a partitioned table > and the field I am modifying is the id field (I have reached the cap on the > integer data type and need to modify it to bigint - very poor planning on > my part!), but no related tables exist. The id field in the partitioned > tables is inherited, so I figured I needed to alter the column in the > parent table. Does this information point towards an optimal solution? > Thanks again... you can drop the inheritance and alter the tables one at a time, I did this on a partitioned table with only millions of rows when I saw what direction it was going in. As my data was partitioned by time the system had to run without some of the older data for a while while my script altered and re-inherited each partition. -- ⚂⚃ 100% natural
Thanks Alban. Perhaps I will go with the rewrite. I use the table for storage and analysis purposes only and am the only user, so I can take the table offline, I was wondering whether running the ALTER TABLE on the parent table makes the change in the inherited child tables? The fact that my initial stab at the ALTER TABLE lasted 14 hours or so before I cancelled suggests that it was working on the child tables?
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
On Sat, Apr 14, 2012 at 6:29 AM, Alban Hertroys <haramrae@gmail.com> wrote:
On 13 Apr 2012, at 18:43, Jeff Adams wrote:The optimal solution most likely involves the table rewrite that you're trying to prevent.
> Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...
Do you really need that id column though? It seems like it's a surrogate key - perhaps the real primary key for that data is a combination of some other columns? You say you partitioned that table, so the column you partitioned on must be part of that "new" PK in that case. That won't work if you depend on that id-column for other things, of course, or if it's really the only thing that guarantees that your records are unique (then again, are duplicates useful in any way?).
...If that doesn't work either, I just got a really dangerous idea for an alternate solution: I was just wondering whether you might have a field with 4 bytes of spare length _before_ the id column?
If you alter the pg_attributes, reducing that field with 4 bytes, and increase the id-column to bigint (+4 bytes), you might end up with a working solution.
The id's in the "resized" field would probably be really weird though and quite likely invalid for the data-type (those last 4 bytes you'd borrow of the previous column better be empty!). Plus, I'm assuming there's nothing between those fields on disk - in reality there's padding going on and such. As I said, it's a dangerous idea!
It will probably get shot down by people with more knowledge about the on-disk data structure right away. For example, if there's any kind of pointer to the start of your id-column, well... Also, it would probably kill any kind of replication on this table and will likely cause mayhem in the WAL logs as well, so you'd at least have to make sure there are no transactions going on (including autovacuum).
Still, if it works in your case then it's a possible solution :P
If you're going to experiment with this, you should really try it on a separate database (preferably on a different server) with some test data first. There's definitely a risk of losing everything in the database. It's probably a _really_ bad idea :P
Alban Hertroys
--
The scale of a problem often equals the size of an ego.
--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376