Re: Dropping column from big table

Поиск
Список
Период
Сортировка
От sud
Тема Re: Dropping column from big table
Дата
Msg-id CAD=mzVVTE1Jx0fpsQ=OVW0CL5LiGg_RHA4kq+cbPqTShA5EwVw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Dropping column from big table  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Ответы Re: Dropping column from big table
Список pgsql-general

On Tue, Jul 16, 2024 at 6:07 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

> But the only issue would be "VACUUM FULL" will take a table lock and also it
> may take longer to run this vacuum on the full table considering the size of
> the table in TB's. Thus, is it fine to just leave it post execution of the
> "update" statement , so that the normal vacuum operation (which will be online
> operation) and that will take care of the removal of old rows ?

This is unlikely to recover the space.

The UPDATE will duplicate all rows. Since - presumably - there isn't
much free space within each partition the new rows will go at the end of
each partition, effectively doubling its size.

A regular VACUUM (whether autovacuum or invoked manually) will then
remove the old rows. and make the space available for new data. But
since that newly free space is at the beginning of each partition it
can't be returned to the OS. It would be available for new data written
to those partitions I guess not much is written to old partitons.

You could, however, do this in small steps and vacuum after each.
Something like this (in Python)

    conn = ...
    csr = conn.cursor()
    for i in range(100):
        csr.execute("UPDATE tab set id = id WHERE id % 100 = %s", (i,))
        conn.commit()
        csr.execute("VACUUM tab")
        conn.commit()

That might just be able to squeeze the new rows in between the existing
rows and not grow the table.


>  And also, As you mentioned we may also leave it as is and wait for the
> partition to be dropped, so that the dropped column with "not null" values
> which are still lying under the hood and are occupying space will be removed
> automatically. But even then, is that dropped column still lying in the rows
> with null values in it throughout its lifecycle, till the table exists in the
> database?

Yes. But a nullable column with a null value takes only a single bit of
storage, so that's negligible.

    

Thank you so much.

Normal vacuum marks the space occupied by the dead tuples as free or reusable but vacuum full removes those completely. However even with "vacuum full", the old rows will be removed completely from the storage , but the new rows will always be there with the 'dropped' column still existing under the hood along with the table storage,  with just carrying "null" values in it. However,  as it's a single bit of storage so will be having negligible overhead. If we want to fully remove that column from the table , we may have to create a new table and dump the data into that from the existing table and then rename it back to old. Is this understanding correct?

В списке pgsql-general по дате отправления:

Предыдущее
От: Ron Johnson
Дата:
Сообщение: Re: PostgreSQL Active-Active Clustering
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Dropping column from big table