Discussion: full vacuum of a very large table


full vacuum of a very large table

From: "Nic Chidu"
Date:
Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
with the least amount of downtime.

Would a full vacuum be faster with:
 - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
 - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?

Any other suggestions?

Thanks,

Nic

Re: full vacuum of a very large table

From: "Plugge, Joe R."
Date:
Personally, provided you have the room, I would build a new table off to the side and migrate what you need to keep
into it. When you are done and satisfied that you have all of the candidate rows, rename the original table to
"x_tablename" and rename the newly created table into its place. If all is good, simply drop the x_tablename table.
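A minimal SQL sketch of that copy-and-swap approach; the table name, timestamp column and cutoff below are hypothetical placeholders:

    -- Copy only the rows worth keeping into a side table.
    CREATE TABLE big_table_new AS
        SELECT * FROM big_table
        WHERE created_at >= DATE '2011-01-01';

    -- Recreate whatever indexes the original table had, for example:
    CREATE INDEX big_table_new_created_at_idx ON big_table_new (created_at);

    -- Swap the tables; renames only touch the catalog, so they are quick.
    ALTER TABLE big_table RENAME TO x_big_table;
    ALTER TABLE big_table_new RENAME TO big_table;

    -- Once you are confident nothing is missing:
    DROP TABLE x_big_table;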

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Nic Chidu
Sent: Tuesday, March 29, 2011 10:56 AM
To: pgsql-admin@postgresql.org
Subject: [ADMIN] full vacuum of a very large table

Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
with the least amount of downtime.

Would a full vacuum be faster with:
 - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
 - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?

Any other suggestions?

Thanks,

Nic


Re: full vacuum of a very large table

From: raghu ram
Date:


On Tue, Mar 29, 2011 at 9:26 PM, Nic Chidu <nic@chidu.net> wrote:
> Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
> with the least amount of downtime.
>
> Would a full vacuum be faster with:
>  - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
>  - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?
>
> Any other suggestions?


The best approach is to take a dump of the table after deleting the unused rows, then restore it back into the database. A dump and reload will be faster than a VACUUM FULL.
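For illustration, a rough psql-only sketch of the same dump-and-reload idea, using COPY instead of pg_dump/pg_restore; the table name, column and cutoff are made-up placeholders:

    -- Export only the rows worth keeping to a flat file.
    \copy (SELECT * FROM big_table WHERE created_at >= DATE '2011-01-01') TO 'big_table_keep.csv' CSV

    -- Empty the table; TRUNCATE reclaims the space immediately
    -- (it will be refused if other tables reference this one via foreign keys).
    TRUNCATE big_table;

    -- Reload the kept rows and refresh planner statistics.
    \copy big_table FROM 'big_table_keep.csv' CSV
    ANALYZE big_table;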

--Raghu Ram 



Re: full vacuum of a very large table

From: Ashish Nauriyal
Date:


On Tue, Mar 29, 2011 at 9:26 PM, Nic Chidu <nic@chidu.net> wrote:
> Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
> with the least amount of downtime.


What version of PG are you on?
If it is newer than 8.2 and you have disk space for roughly twice the size of the target table and its indexes, you can reorganize the table with pg_reorg without locking it against concurrent SELECTs and UPDATEs.

 
> Would a full vacuum be faster with:
>  - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
>  - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?
>
> Any other suggestions?
>
> Thanks,
>
> Nic




--
Thanks,
Ashish Nauriyal
EnterpriseDB Corporation
The Enterprise Postgres Company


Re: full vacuum of a very large table

From: Steve Crawford
Date:
On 03/29/2011 09:04 AM, Plugge, Joe R. wrote:
> Personally, provided you have the room, I would build a new table off to the side and migrate what you need to keep
> into it. When you are done and satisfied that you have all of the candidate rows, rename the original table to
> "x_tablename" and rename the newly created table into its place. If all is good, simply drop the x_tablename table.
This looks attractive but can cause issues if there are views,
foreign-keys, etc. that depend on this table.

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Nic Chidu
> Sent: Tuesday, March 29, 2011 10:56 AM
> To: pgsql-admin@postgresql.org
> Subject: [ADMIN] full vacuum of a very large table
>
> Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
> with the least amount of downtime.
>
> Would a full vacuum be faster with:
>   - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
>   - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?
>
> Any other suggestions?
>
The phrase "most recent" leads me to believe this is time-based data.
This might be a good time to partition your table to avoid this issue in
the future. If you are clever about it (and try it out in a test
environment), you might be able to create inherited tables off your main
table and then move the "live" data from the parent to the child in
unobtrusively sized chunks. When no live data remains in the parent table,
truncate the parent table (only).

Moving forward, set up your application/processes to put the data into
the appropriately sized (day? week? month? year?) child table. When
the time comes, just archive and drop the child.
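A rough sketch of one such inherited child table and a chunked move; the table name, timestamp column (created_at) and dates are placeholders, and your real "live" predicate will differ:

    -- Child table holding one month of data; it inherits all columns from the parent.
    -- Indexes are not inherited, so create them on the child separately.
    CREATE TABLE big_table_2011_03 (
        CHECK (created_at >= DATE '2011-03-01' AND created_at < DATE '2011-04-01')
    ) INHERITS (big_table);

    -- Move live rows over in small, unobtrusive chunks (one day at a time here).
    BEGIN;
    INSERT INTO big_table_2011_03
        SELECT * FROM ONLY big_table
        WHERE created_at >= DATE '2011-03-01' AND created_at < DATE '2011-03-02';
    DELETE FROM ONLY big_table
        WHERE created_at >= DATE '2011-03-01' AND created_at < DATE '2011-03-02';
    COMMIT;

    -- Repeat for the remaining chunks; once the parent holds no live rows:
    TRUNCATE ONLY big_table;   -- ONLY leaves the children intact; check your version supports it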

Cheers,
Steve


Re: full vacuum of a very large table

From: Bob Lunney
Date:
--- On Tue, 3/29/11, Nic Chidu <nic@chidu.net> wrote:

> From: Nic Chidu <nic@chidu.net>
> Subject: [ADMIN] full vacuum of a very large table
> To: pgsql-admin@postgresql.org
> Date: Tuesday, March 29, 2011, 11:56 AM
> Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
> with the least amount of downtime.
>
> Would a full vacuum be faster with:
>  - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
>  - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?
>
> Any other suggestions?
>
> Thanks,
>
> Nic
>
> --

Nic,

Since you know the where clause that would delete the 120 million rows, why not use the converse of it to select the 10 million rows
to retain into another table, then drop the original table?  No vacuum required!

Be sure to use the "create table as select..." syntax to avoid WAL during creation of the new table, and use a
transaction to drop the original table and rename the new one.  That way users will be querying the original table right
up until the switchover, when they will start using the new table.

Foreign keys and other constraints may complicate things a bit, so check those out first.  Also, don't forget to index
and analyze the new table before the switchover.
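A minimal sketch of that switchover, with hypothetical table and column names (and keep in mind that CREATE TABLE AS only skips WAL when WAL archiving/streaming replication is not in use):

    -- Build the replacement table from the rows you want to retain
    -- (the converse of the delete predicate).
    CREATE TABLE big_table_new AS
        SELECT * FROM big_table
        WHERE created_at >= DATE '2011-01-01';

    -- Index and analyze it before the swap so queries are fast immediately.
    CREATE INDEX big_table_new_created_at_idx ON big_table_new (created_at);
    ANALYZE big_table_new;

    -- Atomic switchover: sessions keep using the old table until COMMIT.
    BEGIN;
    DROP TABLE big_table;
    ALTER TABLE big_table_new RENAME TO big_table;
    COMMIT;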

Good luck!

Bob Lunney




Re: full vacuum of a very large table

From: Shrinivas Devarkonda
Date:
Yeah, this is the best solution; it doesn't need many resources or make the DB do much extra work, so you will save a lot here.

Just do simple inserts from the base table into a new table with your condition,
verify,
rename the base table to something else and the new table to the base table name,
then analyze the new table.





On Tue, Mar 29, 2011 at 9:34 PM, Plugge, Joe R. <JRPlugge@west.com> wrote:
> Personally, provided you have the room, I would build a new table off to the side and migrate what you need to keep into it. When you are done and satisfied that you have all of the candidate rows, rename the original table to "x_tablename" and rename the newly created table into its place. If all is good, simply drop the x_tablename table.



Re: full vacuum of a very large table

From: Steve Crawford
Date:
On 03/29/2011 08:56 AM, Nic Chidu wrote:
> Got a situation where a 130 million-row (137 GB) table needs to be brought down to 10 million records (the most recent)
> with the least amount of downtime.
>
> Would a full vacuum be faster with:
>   - 120 million rows deleted and 10 million active (delete most of them, then one full vacuum), or
>   - 10 million deleted and 120 million active (delete small batches and run a full vacuum after each delete)?
>
> Any other suggestions?
I didn't really ask earlier, but what amount of downtime is acceptable
and how much space is available on your drive?

If a few minutes is acceptable and you aren't completely against the
wall in disk-space, the easiest is probably to delete all the old rows
and cluster the table. Cluster requires an exclusive lock, but so does
vacuum full. And in my experience with larger tables a cluster is an
order of magnitude or more faster (and you get shiny new indexes as
well). Analyze the table immediately afterward.
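Roughly, with placeholder names (this assumes the table already has an index worth clustering on, here an assumed index on created_at, and the 8.3+ CLUSTER ... USING syntax):

    -- Delete everything except the most recent rows.
    DELETE FROM big_table WHERE created_at < DATE '2011-01-01';

    -- Rewrite the table and rebuild all its indexes in index order;
    -- this takes an exclusive lock, like VACUUM FULL, but is usually much faster.
    CLUSTER big_table USING big_table_created_at_idx;

    -- Refresh planner statistics right away.
    ANALYZE big_table;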

The exact solution will depend on the nature of your operation (what
depends on this table, what is the nature of data inserts and queries,
etc.). If it is critical that you be able to continuously insert data,
you might copy the live rows up to some recent point in time to a
holding table, then lock the table, copy the recently added rows,
truncate and unlock the table and backfill it from the other table. This
will minimize the unavailability for inserts but may not be feasible if
you have constant query requirements, foreign-key constraints or the like.
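One way that last idea could look, sketched with made-up names; here the backfill happens inside the same transaction as the truncate, which keeps readers from ever seeing a half-empty table at the cost of holding the lock slightly longer:

    -- 1. With the table still fully available, stage the bulk of the live rows.
    CREATE TABLE big_table_hold AS
        SELECT * FROM big_table
        WHERE created_at >= DATE '2011-01-01' AND created_at < DATE '2011-03-29';

    -- 2. Brief outage: lock the table, top up the holding table, empty and refill.
    BEGIN;
    LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
    INSERT INTO big_table_hold
        SELECT * FROM big_table WHERE created_at >= DATE '2011-03-29';
    TRUNCATE big_table;
    INSERT INTO big_table SELECT * FROM big_table_hold;
    COMMIT;

    -- 3. Clean up and refresh statistics.
    DROP TABLE big_table_hold;
    ANALYZE big_table;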

If availability is truly critical, be sure to test whatever approach you
take in advance.

Cheers,
Steve