Re: UPDATE many records

From: Alban Hertroys
Subject: Re: UPDATE many records
Date:
Msg-id: 91870B50-CDFB-489C-BE01-59B281A9CDCD@gmail.com
In reply to: Re: UPDATE many records (Israel Brewster <ijbrewster@alaska.edu>)
Responses: Re: UPDATE many records
List: pgsql-general
> On 6 Jan 2020, at 21:15, Israel Brewster <ijbrewster@alaska.edu> wrote:
>
>> On Jan 6, 2020, at 10:08 AM, Christopher Browne <cbbrowne@gmail.com> wrote:
>>
>> On Mon, 6 Jan 2020 at 13:36, Israel Brewster <ijbrewster@alaska.edu> wrote:
>> Thanks to a change in historical data, I have a need to update a large number of records (around 50 million). The
>> update itself is straightforward, as I can just issue an "UPDATE table_name SET changed_field=new_value();" (yes,
>> new_value is the result of a stored procedure, if that makes a difference) command via psql, and it should work.
>> However, due to the large number of records this command will obviously take a while, and if anything goes wrong during
>> the update (one bad value in row 45 million, lost connection, etc.), all the work that has been done already will be lost
>> due to the transactional nature of such commands (unless I am missing something).
>>
>> Given that each row update is completely independent of any other row, I have the following questions:
>>
>> 1) Is there any way to set the command such that each row change is committed as it is calculated?
>> 2) Is there some way to run this command in parallel in order to better utilize multiple processor cores, other than
>> manually breaking the data into chunks and running a separate psql/update process for each chunk? Honestly, manual
>> parallelizing wouldn't be too bad (there are a number of logical segregations I can apply), I'm just wondering if there
>> is a more automatic option.
>>
>> Yeah, I'd be inclined to do this in batches.

I think you’re overcomplicating the matter.

I’d just do it as a single update in one transaction. It’s only 50M rows. It may take half an hour or so on decent
hardware, depending on how resource-intensive your function is.

If that fails[1], only then would I start looking into batching things. But then you still need to figure out why it
fails and what to do about that; if it fails it will probably fail fast, and if not, then you’re looking at a one-off
situation that won’t require more than a few workarounds - after which you can just run the update again.
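Should batching become necessary after all, a minimal sketch of the idea, splitting the key space into ranges and issuing one UPDATE per range so each chunk commits independently. The table and column names (my_table, id, changed_field, new_value()) are placeholders, not from the original post, and this assumes a reasonably dense integer primary key:

```python
# Sketch: batch a large UPDATE by integer primary-key ranges, so that each
# chunk runs in its own transaction and a failure only loses one batch.
# All identifiers below (my_table, id, changed_field, new_value) are
# hypothetical placeholders.

def batch_ranges(min_id, max_id, batch_size):
    """Yield half-open (lo, hi) id ranges covering [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size, max_id + 1)
        yield (lo, hi)
        lo = hi

def batch_statements(min_id, max_id, batch_size):
    """Generate one UPDATE statement per range."""
    for lo, hi in batch_ranges(min_id, max_id, batch_size):
        yield (
            "UPDATE my_table SET changed_field = new_value() "
            f"WHERE id >= {lo} AND id < {hi};"
        )

# Feed each statement to psql (or a driver) separately, committing per batch,
# e.g.: for stmt in batch_statements(1, 50_000_000, 1_000_000): run(stmt)
```

Running one process per subset of ranges would also give you the manual parallelism mentioned above, at the cost of more lock and I/O contention.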

Ad 1). No harm has been done, it’s a single transaction that rolled back.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



