RE: Updating large tables without dead tuples

Поиск
Список
Период
Сортировка
От ldh@laurent-hasson.com
Тема RE: Updating large tables without dead tuples
Дата
Msg-id BY2PR15MB0872ED7F4F8158F6943A7DED85DD0@BY2PR15MB0872.namprd15.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Updating large tables without dead tuples  (Vik Fearing <vik.fearing@2ndquadrant.com>)
Список pgsql-performance

> -----Original Message-----
> From: Vik Fearing [mailto:vik.fearing@2ndquadrant.com]
> Sent: Friday, March 02, 2018 20:56
> To: ldh@laurent-hasson.com; pgsql-performance@lists.postgresql.org
> Cc: Stephen Frost <sfrost@snowman.net>
> Subject: Re: Updating large tables without dead tuples
>
> On 02/24/2018 12:27 AM, ldh@laurent-hasson.com wrote:
> > Hello
> >
> >
> >
> > I work with a large and wide table (about 300 million rows, about 50
> > columns), and from time to time, we get business requirements to make
> > some modifications. But sometimes, it's just some plain mistake. This
> > has happened to us a few weeks ago where someone made a mistake and we
> > had to update a single column of a large and wide table. Literally,
> > the source data screwed up a zip code and we had to patch on our end.
> >
> >
> >
> > Anyways. Query ran was:
> >
> >     update T set source_id = substr(sourceId, 2, 10);
> >
> > Took about 10h and created 100's of millions of dead tuples, causing
> > another couple of hours of vacuum.
> >
> >
> >
> > This was done during a maintenance window, and that table is read-only
> > except when we ETL data to it on a weekly basis, and so I was just
> > wondering why I should pay the "bloat" penalty for this type of
> > transaction. Is there a trick that could be use here?
> Yes, there is a trick I like to use here, as long as you don't mind locking the
> table (even against reads).
>
> I'll assume T.source_id is of type text.  If it's not, use whatever the actual type
> is.
>
> ALTER TABLE T
>     ALTER COLUMN source_id TYPE text USING substr(sourceId, 2, 10);
>
> I copied what you had verbatim, I earnestly hope you don't have two columns
> source_id and sourceId in your table.
>
> This will rewrite the entire table just the same as a VACUUM FULL after your
> UPDATE would.
>
> Don't forget to VACUUM ANALYZE this table after the operation.  Even though
> there will be no dead rows, you still need to VACUUM it to generate the
> visibility map and you need to ANALYZE it for statistics on your "new" column.
>
> Foreign keys remain intact with this solution and you don't have double wal
> logging like for an UPDATE.
> --
> Vik Fearing                                          +33 6 46 75 15 36
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

[Laurent Hasson]
Yes, sorry... only a single column source_id. I understand your idea... Is that because a TEXT field (vs a varchar)
wouldbe considered TOAST and be treated differently? 

Thanks,
Laurent.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Rambabu V
Дата:
Сообщение: Re: by mistake dropped physical file dropped for one table.
Следующее
От: dangal
Дата:
Сообщение: Memory size