Discussion: Alter the column data type of the large data volume table.


Alter the column data type of the large data volume table.

From:
charles meng
Date:
Hi all,

I have a table with 1.6 billion records. The data type of the primary key column was incorrectly defined as integer. I need to change the type of the column to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding a temporary column was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table's data to the temporary table.

Thanks in advance.

Re: Alter the column data type of the large data volume table.

From:
Michael Lewis
Date:
On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:
Hi all,

I have a table with 1.6 billion records. The data type of the primary key column was incorrectly defined as integer. I need to change the type of the column to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding a temporary column was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table's data to the temporary table.

Thanks in advance.

You can add a new column with NO default value and null as default and have it be very fast. Then you can gradually update rows in batches (if on PG11+, perhaps use do script with a loop to commit after X rows) to set the new column the same as the primary key. Lastly, in a transaction, update any new rows where the bigint column is null, and change which column is the primary key & drop the old one. This should keep each transaction reasonably sized to not hold up other processes.
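In outline, that could look something like this (table, column, and constraint names below are placeholders, not from the original post; the COMMIT inside the DO block needs PG11+ and must be run outside an explicit transaction):

ALTER TABLE my_table ADD COLUMN id_new bigint;   -- no default, so this is a catalog-only change

DO $$
DECLARE
    lo   bigint := 0;
    hi   bigint;
    step constant bigint := 100000;
BEGIN
    SELECT max(id) INTO hi FROM my_table;
    WHILE lo < hi LOOP
        UPDATE my_table SET id_new = id
         WHERE id > lo AND id <= lo + step;   -- walk the table in PK ranges
        lo := lo + step;
        COMMIT;                               -- each batch is its own transaction
    END LOOP;
END $$;

-- final swap in one short transaction
BEGIN;
UPDATE my_table SET id_new = id WHERE id_new IS NULL;   -- rows inserted since the backfill
ALTER TABLE my_table DROP CONSTRAINT my_table_pkey;
ALTER TABLE my_table DROP COLUMN id;
ALTER TABLE my_table RENAME COLUMN id_new TO id;
ALTER TABLE my_table ADD PRIMARY KEY (id);   -- or pre-build a unique index CONCURRENTLY and
COMMIT;                                      -- use ADD CONSTRAINT ... PRIMARY KEY USING INDEX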

Re: Alter the column data type of the large data volume table.

From:
Rich Shepard
Date:
On Thu, 3 Dec 2020, Michael Lewis wrote:

> On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

>> I have a table with 1.6 billion records. The data type of the primary key
>> column was incorrectly defined as integer. I need to change the type of
>> the column to bigint. Are there any ideas for this?

> You can add a new column with NO default value and null as default and have
> it be very fast. Then you can gradually update rows in batches (if on
> PG11+, perhaps use do script with a loop to commit after X rows) to set the
> new column the same as the primary key. Lastly, in a transaction, update
> any new rows where the bigint column is null, and change which column is
> the primary key & drop the old one. This should keep each transaction
> reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich



Re: Alter the column data type of the large data volume table.

From:
Michael Lewis
Date:
On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Thu, 3 Dec 2020, Michael Lewis wrote:

> On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

>> I have a table with 1.6 billion records. The data type of the primary key
>> column was incorrectly defined as integer. I need to change the type of
>> the column to bigint. Are there any ideas for this?

> You can add a new column with NO default value and null as default and have
> it be very fast. Then you can gradually update rows in batches (if on
> PG11+, perhaps use do script with a loop to commit after X rows) to set the
> new column the same as the primary key. Lastly, in a transaction, update
> any new rows where the bigint column is null, and change which column is
> the primary key & drop the old one. This should keep each transaction
> reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich

Afaik, it will require an access exclusive lock for the entire time it takes to re-write the 1.6 billion rows and update all indexes. That sort of lock out time doesn't seem workable in many production systems.

Re: Alter the column data type of the large data volume table.

From:
Rich Shepard
Date:
On Thu, 3 Dec 2020, Michael Lewis wrote:

> Afaik, it will require an access exclusive lock for the entire time it
> takes to re-write the 1.6 billion rows and update all indexes. That sort
> of lock out time doesn't seem workable in many production systems.

Michael,

Okay. I hadn't thought of that.

Stay well,

Rich



Re: Alter the column data type of the large data volume table.

From:
Ron
Date:
On 12/3/20 11:26 AM, Michael Lewis wrote:
On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Thu, 3 Dec 2020, Michael Lewis wrote:

> On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

>> I have a table with 1.6 billion records. The data type of the primary key
>> column was incorrectly defined as integer. I need to change the type of
>> the column to bigint. Are there any ideas for this?

> You can add a new column with NO default value and null as default and have
> it be very fast. Then you can gradually update rows in batches (if on
> PG11+, perhaps use do script with a loop to commit after X rows) to set the
> new column the same as the primary key. Lastly, in a transaction, update
> any new rows where the bigint column is null, and change which column is
> the primary key & drop the old one. This should keep each transaction
> reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich

Afaik, it will require an access exclusive lock for the entire time it takes to re-write the 1.6 billion rows and update all indexes. That sort of lock out time doesn't seem workable in many production systems.

Yet another argument for partitioning!

1. Split all the partitions from the main table,
2. drop the PK,
3. do all the ALTER statements in parallel,
4. recreate the PK indices, then
5. join them back to the main table.

Not instant, but faster than updating 1.6Bn rows in one single giant statement.

(Of course, that doesn't help OP with his current problem.)
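With declarative partitioning the mechanics could look roughly like this (assuming the table is already range-partitioned on a date column; the table and partition names and the bounds are invented for the example):

ALTER TABLE big DETACH PARTITION big_2019;
ALTER TABLE big DETACH PARTITION big_2020;
-- ...and so on for the remaining partitions...

ALTER TABLE big ALTER COLUMN id TYPE bigint;    -- the parent now holds no rows, so this is quick

-- these can run in parallel, one session per partition:
ALTER TABLE big_2019 ALTER COLUMN id TYPE bigint;
ALTER TABLE big_2020 ALTER COLUMN id TYPE bigint;

-- reattach; without a matching CHECK constraint each ATTACH scans the partition to validate it
ALTER TABLE big ATTACH PARTITION big_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
ALTER TABLE big ATTACH PARTITION big_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');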

--
Angular momentum makes the world go 'round.

Re: Alter the column data type of the large data volume table.

From:
Michael Lewis
Date:


On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.


Something else noteworthy is that with varchar, there is no rewrite of the table. You are just removing or loosening the length restriction on a variable width column type. I believe you could change all columns from VARCHAR(n) to TEXT or VARCHAR(n+X) and have it take almost no time at all since you are only impacting the catalog tables (pretty sure about that at least). With a fixed width column like int4 to int8, all the rows need to be actually re-written.
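One way to see the difference is to watch the table's filenode, which only changes when the table is rewritten (the table and column names here are just an example):

SELECT pg_relation_filenode('t');                    -- note the value

ALTER TABLE t ALTER COLUMN name TYPE varchar(200);   -- loosen a varchar length limit
SELECT pg_relation_filenode('t');                    -- same value: catalog-only change

ALTER TABLE t ALTER COLUMN id TYPE bigint;           -- int4 -> int8
SELECT pg_relation_filenode('t');                    -- new value: the table was rewritten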

Re: Alter the column data type of the large data volume table.

From:
Ron
Date:
On 12/3/20 11:53 AM, Michael Lewis wrote:


On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.


Something else noteworthy is that with varchar, there is no rewrite of the table. You are just removing or loosening the length restriction on a variable width column type. I believe you could change all columns from VARCHAR(n) to TEXT or VARCHAR(n+X) and have it take almost no time at all since you are only impacting the catalog tables (pretty sure about that at least). With a fixed width column like int4 to int8, all the rows need to be actually re-written.

And in this case it's the PK, so indexed and thus even slower. Lots slower.

--
Angular momentum makes the world go 'round.

Re: Alter the column data type of the large data volume table.

From:
charles meng
Date:
What I mean is that it will take a lot of time to complete all the data processing. I have to say, it is a good solution to adjust the column type without affecting online users.

I found a tool on GitHub (see the link below); unfortunately, it is for MySQL...


regards.

On Fri, Dec 4, 2020 at 1:04 PM, Michael Lewis <mlewis@entrata.com> wrote:
Please do reply all.

Not sure what you mean about it taking too much time. It's rewriting a bunch of data. It's going to take a while. The best you can do is break the work up into small pieces and commit each piece.

On Thu, Dec 3, 2020, 7:11 PM charles meng <xlyybz@gmail.com> wrote:
Thanks for your help. I think the first method I tried (adding a temporary column) is similar to what you said, but it takes too much time for me.


Thanks again.

On Fri, Dec 4, 2020 at 1:11 AM, Michael Lewis <mlewis@entrata.com> wrote:
On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:
Hi all,

I have a table with 1.6 billion records. The data type of the primary key column was incorrectly defined as integer. I need to change the type of the column to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding a temporary column was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table's data to the temporary table.

Thanks in advance.

You can add a new column with NO default value and null as default and have it be very fast. Then you can gradually update rows in batches (if on PG11+, perhaps use do script with a loop to commit after X rows) to set the new column the same as the primary key. Lastly, in a transaction, update any new rows where the bigint column is null, and change which column is the primary key & drop the old one. This should keep each transaction reasonably sized to not hold up other processes.

Re: Alter the column data type of the large data volume table.

From:
Olivier Gautherot
Date:
Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com> wrote:
What I mean is that it will take a lot of time to complete all the data processing. I have to say, it is a good solution to adjust the column type without affecting online users.

I found a tool on GitHub (see the link below); unfortunately, it is for MySQL...


MySQL has its own strategy with regard to column handling so what works there does not necessarily fit here. 

There are some good ideas in this thread but I would start with a few checks:

1) What version of PG are you using?

2) If you can, try on your current setup (not necessarily in production) to add a column typed bigint - if the version is recent enough it will be a simple catalog update, and Michael's suggestion is viable.

3) Given the massive number of rows, I would update as suggested, progressively in batches of a few tens of thousands. Make sure you commit and vacuum after each batch to reclaim the space (or you may end up with a storage space issue in addition to all the rest). In the meantime, add a trigger to set the new column to the index value (a minimal sketch of such a trigger follows after point 5). Once the new column is complete, drop the old column and set the new one as primary key (it may take a while to recalculate the index).

4) If your table is still growing, I would definitely look into partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table into a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.
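A minimal version of the trigger mentioned in point 3 could look like this (table and column names are placeholders; the new bigint column is assumed to be called id_new):

CREATE OR REPLACE FUNCTION copy_pk_to_new() RETURNS trigger AS $$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER keep_id_new_in_sync
    BEFORE INSERT OR UPDATE ON my_table
    FOR EACH ROW
    EXECUTE PROCEDURE copy_pk_to_new();   -- EXECUTE FUNCTION on PG11+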

 
regards.

On Fri, Dec 4, 2020 at 1:04 PM, Michael Lewis <mlewis@entrata.com> wrote:
Please do reply all.

Not sure what you mean about it taking too much time. It's rewriting a bunch of data. It's going to take a while. The best you can do is break the work up into small pieces and commit each piece.

On Thu, Dec 3, 2020, 7:11 PM charles meng <xlyybz@gmail.com> wrote:
Thanks for your help. I think the first method I tried (adding a temporary column) is similar to what you said, but it takes too much time for me.


Thanks again.

On Fri, Dec 4, 2020 at 1:11 AM, Michael Lewis <mlewis@entrata.com> wrote:
On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:
Hi all,

I have a table with 1.6 billion records. The data type of the primary key column was incorrectly defined as integer. I need to change the type of the column to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding a temporary column was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table's data to the temporary table.

Thanks in advance.

You can add a new column with NO default value and null as default and have it be very fast. Then you can gradually update rows in batches (if on PG11+, perhaps use do script with a loop to commit after X rows) to set the new column the same as the primary key. Lastly, in a transaction, update any new rows where the bigint column is null, and change which column is the primary key & drop the old one. This should keep each transaction reasonably sized to not hold up other processes.

Cheers
--
Olivier Gautherot
 


Re: Alter the column data type of the large data volume table.

From:
Olivier Gautherot
Date:

Hi Charles,

On 04-12-2020 9:44, Olivier Gautherot wrote:
Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com> wrote:
What I mean is that it will take a lot of time to complete all the data processing. I have to say, it is a good solution to adjust the column type without affecting online users.

I found a tool on GitHub (see the link below); unfortunately, it is for MySQL...


MySQL has its own strategy with regard to column handling so what works there does not necessarily fit here. 

There are some good ideas in this thread but I would start with a few checks:

1) What version of PG are you using?

2) If you can, try on your current setup (not necessarily in production) to add a column typed bigint - if the version is recent enough it will be a simple catalog update, and Michael's suggestion is viable.

3) Given the massive number of rows, I would update as suggested, progressively in batches of a few tens of thousands. Make sure you commit and vacuum after each batch to reclaim the space (or you may end up with a storage space issue in addition to all the rest). In the meantime, add a trigger to set the new column to the index value. Once the new column is complete, drop the old column and set the new one as primary key (it may take a while to recalculate the index).

4) If your table is still growing, I would definitely look into partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table into a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.


Cheers
--
Olivier Gautherot

Re: Alter the column data type of the large data volume table.

From:
charles meng
Date:
Hi Olivier,

My PG version is 10.
Anyway, thanks a lot for your help.

Best regards.

On Fri, Dec 4, 2020 at 6:14 PM, Olivier Gautherot <ogautherot@gautherot.net> wrote:

Hi Charles,

On 04-12-2020 9:44, Olivier Gautherot wrote:
Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com> wrote:
What I mean is that it will take a lot of time to complete all the data processing. I have to say, it is a good solution to adjust the column type without affecting online users.

I found a tool on GitHub (see the link below); unfortunately, it is for MySQL...


MySQL has its own strategy with regard to column handling so what works there does not necessarily fit here. 

There are some good ideas in this thread but I would start with a few checks:

1) What version of PG are you using?

2) If you can, try on your current setup (not necessarily in production) to add a column typed bigint - if the version is recent enough it will be a simple catalog update, and Michael's suggestion is viable.

3) Given the massive number of rows, I would update as suggested, progressively in batches of a few tens of thousands. Make sure you commit and vacuum after each batch to reclaim the space (or you may end up with a storage space issue in addition to all the rest). In the meantime, add a trigger to set the new column to the index value. Once the new column is complete, drop the old column and set the new one as primary key (it may take a while to recalculate the index).

4) If your table is still growing, I would definitely look into partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table into a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.


Cheers
--
Olivier Gautherot

RE: Alter the column data type of the large data volume table.

From:
Kevin Brannen
Date:

>From: Olivier Gautherot <ogautherot@gautherot.net>

>>5) If you're brave enough, convert your current table as a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

>Sorry, just tried this one and it failed: type mismatch.

 

Seems like a sound idea in general. I’d probably rename the tables; let’s call them “big_hist” for the old big table and “big_split” for the new partitioned table that will be used going forward – assuming the original table was called “big”. Then create a View that will look at both of those but call it the same as the old table, and let the view do a type cast on the old key, like big_hist.id::bigint, so it matches the new type, because the view will probably be a union and the types need to match. That way your application only has to pause long enough to do a few meta-commands, then it all can resume, and as Olivier pointed out, you can fix the data by moving it from big_hist to big_split in the background as you have time.

 

I’d probably put it all in a transaction too:

 

Create table … -- all the commands to create your partitioned table big_split here

Begin;

Alter table big rename to big_hist;

Create view big as select * from big_split union select id::bigint, /* other cols */ from big_hist;

Commit;

 

Try it on a dev system and if it works you’re off and running. I’d expect the view to slow things down a little, but probably not too much if you have good indexes. But at least you could transition without major downtime and then rename “big_split” back to “big” and drop “big_hist” when you’ve finished the transition. I might even be tempted to add a trigger so that all new inserts into “big” really go into “big_split” so “big_hist” doesn’t grow any more. Your imagination is probably the limit. 😊

 

HTH,

Kevin

.

 


Re: Alter the column data type of the large data volume table.

From:
Michael Lewis
Date:



On Fri, Dec 4, 2020 at 9:04 AM Kevin Brannen <KBrannen@efji.com> wrote:

>From: Olivier Gautherot <ogautherot@gautherot.net>

>>5) If you're brave enough, convert your current table as a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

>Sorry, just tried this one and it failed: type mismatch.

 

Seems like a sound idea in general. I’d probably rename the tables; let’s call them “big_hist” for the old big table and “big_split” for the new partitioned table that will be used going forward – assuming the original table was called “big”. Then create a View that will look at both of those but call it the same as the old table, and let the view do a type cast on the old key, like big_hist.id::bigint, so it matches the new type, because the view will probably be a union and the types need to match. That way your application only has to pause long enough to do a few meta-commands, then it all can resume, and as Olivier pointed out, you can fix the data by moving it from big_hist to big_split in the background as you have time.

 

I’d probably put it all in a transaction too:

 

Create table … -- all the commands to create your partitioned table big_split here

Begin;

Alter table big rename to big_hist;

Create view big as select * from big_split union select id::bigint, /* other cols */ from big_hist;

Commit;

 

Try it on a dev system and if it works you’re off and running. I’d expect the view to slow things down a little, but probably not too much if you have good indexes. But at least you could transition without major downtime and then rename “big_split” back to “big” and drop “big_hist” when you’ve finished the transition. I might even be tempted to add a trigger so that all new inserts into “big” really go into “big_split” so “big_hist” doesn’t grow any more. Your imagination is probably the limit. 😊


Yes, "instead of" triggers for insert/update/delete will make the change transparent to the application side, other than the potential for slowness while in the process of moving the data and still using the view.
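For the insert path such a trigger could look like this (reusing the names from Kevin's sketch; the function name is made up):

CREATE OR REPLACE FUNCTION big_view_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO big_split VALUES (NEW.*);   -- route new rows to the partitioned table
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_insert
    INSTEAD OF INSERT ON big
    FOR EACH ROW
    EXECUTE PROCEDURE big_view_insert();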

Also, I'd advocate for UNION ALL to avoid the "distinct" work that is required for plain UNION. I wish ALL were the default behavior and "UNION DISTINCT" were how to get that behavior.
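With that change, the view from Kevin's sketch would read:

CREATE VIEW big AS
    SELECT * FROM big_split
    UNION ALL
    SELECT id::bigint, /* other cols */ FROM big_hist;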

If partitioning is going to happen anyway in the near future, now is a great time to make that happen. Unfortunately, PG13 is required to use logical replication from a non-partitioned table to a partitioned table so moving the data still requires some creative work.

Re: Alter the column data type of the large data volume table.

From:
Olivier Gautherot
Date:
On Fri, Dec 4, 2020 at 5:22 PM Michael Lewis <mlewis@entrata.com> wrote:

On Fri, Dec 4, 2020 at 9:04 AM Kevin Brannen <KBrannen@efji.com> wrote:

>From: Olivier Gautherot <ogautherot@gautherot.net>

>>5) If you're brave enough, convert your current table as a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

>Sorry, just tried this one and it failed: type mismatch.

 

Seems like a sound idea in general. I’d probably rename the tables; let’s call them “big_hist” for the old big table and “big_split” for the new partitioned table that will be used going forward – assuming the original table was called “big”. Then create a View that will look at both of those but call it the same as the old table, and let the view do a type cast on the old key, like big_hist.id::bigint, so it matches the new type, because the view will probably be a union and the types need to match. That way your application only has to pause long enough to do a few meta-commands, then it all can resume, and as Olivier pointed out, you can fix the data by moving it from big_hist to big_split in the background as you have time.

 

I’d probably put it all in a transaction too:

 

Create table … -- all the commands to create your partitioned table big_split here

Begin;

Alter table big rename to big_hist;

Create view big as select * from big_split union select id::bigint, /* other cols */ from big_hist;

Commit;

 

Try it on a dev system and if it works you’re off and running. I’d expect the view to slow things down a little, but probably not too much if you have good indexes. But at least you could transition without major downtime and then rename “big_split” back to “big” and drop “big_hist” when you’ve finished the transition. I might even be tempted to add a trigger so that all new inserts into “big” really go into “big_split” so “big_hist” doesn’t grow any more. Your imagination is probably the limit. 😊


Yes, "instead of" triggers for insert/update/delete will make the change transparent to the application side, other than the potential for slowness while in the process of moving the data and still using the view.

Also, I'd advocate for UNION ALL to avoid the "distinct" work that is required for plain UNION. I wish ALL were the default behavior and "UNION DISTINCT" were how to get that behavior.

If partitioning is going to happen anyway in the near future, now is a great time to make that happen. Unfortunately, PG13 is required to use logical replication from a non-partitioned table to a partitioned table so moving the data still requires some creative work.

  Rather than union, use inheritance with a constraint on the primary key: it will take care of the union transparently for you.
 --
Olivier Gautherot


Re: Alter the column data type of the large data volume table.

From:
charles meng
Date:
Hi Kevin,

This sounds like a good idea; I will work on it and let you know the result.

Most appreciated.

On Sat, Dec 5, 2020 at 12:04 AM, Kevin Brannen <KBrannen@efji.com> wrote:

>From: Olivier Gautherot <ogautherot@gautherot.net>

>>5) If you're brave enough, convert your current table as a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

>Sorry, just tried this one and it failed: type mismatch.

 

Seems like a sound idea in general. I’d probably rename the tables; let’s call them “big_hist” for the old big table and “big_split” for the new partitioned table that will be used going forward – assuming the original table was called “big”. Then create a View that will look at both of those but call it the same as the old table, and let the view do a type cast on the old key, like big_hist.id::bigint, so it matches the new type, because the view will probably be a union and the types need to match. That way your application only has to pause long enough to do a few meta-commands, then it all can resume, and as Olivier pointed out, you can fix the data by moving it from big_hist to big_split in the background as you have time.

 

I’d probably put it all in a transaction too:

 

Create table … -- all the commands to create your partitioned table big_split here

Begin;

Alter table big rename to big_hist;

Create view big as select * from big_split union select id::bigint, /* other cols */ from big_hist;

Commit;

 

Try it on a dev system and if it works you’re off and running. I’d expect the view to slow things down a little, but probably not too much if you have good indexes. But at least you could transition without major downtime and then rename “big_split” back to “big” and drop “big_hist” when you’ve finished the transition. I might even be tempted to add a trigger so that all new inserts into “big” really go into “big_split” so “big_hist” doesn’t grow any more. Your imagination is probably the limit. 😊

 

HTH,

Kevin

.

 
