Обсуждение: Fast Deletion For Large Tables
I have some tables with huge data. The tables have column timestamp and float. I am try to keep up to 6 day of data values. What I do is execute SQL below from crontab (UNIX to schedule commands). BEGIN; DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval); ..... DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval); COMMIT; Everything is running fine, except take long time to finish. Because some tables stored values from 50,000 to 100,000 rows Some deletion need to deleted up to 45,000 rows. So I am thinking just delete the rows by their row number or row ID, like DELETE FROM a_table WHERE row_id < 45000; I know there is row_id in Oracle. Is there row_id for a table in Postgres? Thank Q!
Вложения
On Wed, 2 Oct 2002, Raymond Chui wrote: > I have some tables with huge data. > The tables have column timestamp and float. > I am try to keep up to 6 day of data values. > What I do is execute SQL below from crontab (UNIX to > schedule commands). > > BEGIN; > DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 > days'::interval); > ..... > DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 > days'::interval); > COMMIT; > > > Everything is running fine, except take long time to finish. > Because some tables stored values from 50,000 to 100,000 rows > Some deletion need to deleted up to 45,000 rows. > > So I am thinking just delete the rows by their row number or row ID, > like > > DELETE FROM a_table WHERE row_id < 45000; > > I know there is row_id in Oracle. > Is there row_id for a table in Postgres? Not really of that sort IIRC Oracle's row_id definition, although you could probably fake something with a sequence.
>>>>> "rc" == Raymond Chui <raymond.chui@noaa.gov> writes: roland -- PGP Key ID: 66 BC 3B CD Roland B. Roberts, PhD RL Enterprises roland@rlenter.com 76-15 113th Street, Apt 3B roland@astrofoto.org Forest Hills, NY 11375
>>>>> "rc" == Raymond Chui <raymond.chui@noaa.gov> writes: rc> Everything is running fine, except take long time to finish. rc> Because some tables stored values from 50,000 to 100,000 rows rc> Some deletion need to deleted up to 45,000 rows. Is column_time indexed? roland -- PGP Key ID: 66 BC 3B CD Roland B. Roberts, PhD RL Enterprises roland@rlenter.com 76-15 113th Street, Apt 3B roland@astrofoto.org Forest Hills, NY 11375
Raymond, Partitioned tables would solve your issue since you could just truncate a partiotion in order to delete the unneeded data. Unfortunately they are not available in PostgreSQL. But maybe in a future release. Unfortunately you also cannot use a UNION ALL in a view to fake partitions. So only several changes to your applicatin would allow you to partion your data. But depending on your situation it may be worthwhile. E.g. if you always want to keep one weeks worth of data and purge a day at a time you could have 7 tables instead of one. table1_mon, table1_tue, table1_wed, etc. And you could select from all of them with a UNION ALL. There is no rowid in PostgreSQL. So while adding an additional column with a sequence would help as long as you have an index. Just indexing your date column maybe easier. But then you pay with slower inserts or the time and effort to build the index. Regards, Nikolaus Dilger On Wed, 02 October 2002, Raymond Chui wrote: > > > I have some tables with huge data. > The tables have column timestamp and float. > I am try to keep up to 6 day of data values. > What I do is execute SQL below from crontab (UNIX to > schedule commands). > > BEGIN; > DELETE FROM table_1 WHERE column_time < > ('now'::timestamp - '6 > days'::interval); > ..... > DELETE FROM table_n WHERE column_time < > ('now'::timestamp - '6 > days'::interval); > COMMIT; > > > Everything is running fine, except take long time to > finish. > Because some tables stored values from 50,000 to > 100,000 rows > Some deletion need to deleted up to 45,000 rows. > > So I am thinking just delete the rows by their row > number or row ID, > like > > DELETE FROM a_table WHERE row_id < 45000; > > I know there is row_id in Oracle. > Is there row_id for a table in Postgres? > Thank Q!
On Wed, Oct 02, 2002 at 02:20:39PM -0400, Raymond Chui wrote: > Everything is running fine, except take long time to finish. > Because some tables stored values from 50,000 to 100,000 rows > Some deletion need to deleted up to 45,000 rows. how often do you make this "delete" process? do you vacuum before *and* after it? with deleteing 45000 out of 50000 (or 100000) records no index can speed up. anyway - deleting 45000 records shouldn't take very long. do you have any foreign keys from this table to others? any indices? try maybe this path: connect vacuum drop indices disconnect connect delete from ... create indices vacuum analyze disconnect could work better. depesz -- hubert depesz lubaczewski http://www.depesz.pl/ ------------------------------------------------------------------------ Mój Boże, spraw abym milczał, dopóki się nie upewnię, że naprawdę mam coś do powiedzenia. (c) 1998 depesz
Вложения
On 5 Oct 2002 at 20:17, Nikolaus Dilger wrote: > Partitioned tables would solve your issue since you > could just truncate a partiotion in order to delete the > unneeded data. Unfortunately they are not available in > PostgreSQL. But maybe in a future release. If you don't mind realigning your schema, inheriting tables can closely mimick partitions upto certain extent. Reading the original post below, I recommend you to consider inheritance approach. Create a base table which is empty and create inherited partitions. You can get all the data in child table by querying upon base table. So your application would not require any modification as long as selecting data is considered. You have to write a wrapper on insert/update/delete code to operate upon a particular partition. > There is no rowid in PostgreSQL. So while adding an > additional column with a sequence would help as long as > you have an index. Just indexing your date column > maybe easier. But then you pay with slower inserts or > the time and effort to build the index. There is oid and if you don't use it you can disable it saving some space if you have large data. HTH Bye Shridhar -- Bachelor: A man who chases women and never Mrs. one.
On Sat, Oct 05, 2002 at 08:17:38PM -0700, Nikolaus Dilger wrote: > Raymond, > > Partitioned tables would solve your issue since you > could just truncate a partiotion in order to delete the > unneeded data. Unfortunately they are not available in > PostgreSQL. But maybe in a future release. > > Unfortunately you also cannot use a UNION ALL in a view > to fake partitions. > Hmm, you haven't tried this recently, have you? With pgsql 7.2.1, It seems to work just fine: test=# create view transactions as select * from monday union all select * from tuesday union all select * from wednesday; CREATE test=# \d transactions View "transactions" Column | Type | Modifiers ----------+---------+----------- daynum | integer | transact | integer | View definition: (SELECT monday.daynum, monday.transact FROM monday UNION ALL SELECT tuesday.daynum, tuesday.transact FROMtuesday) UNION ALL SELECT wednesday.daynum, wednesday.transact FROM wednesday; test=# select * from transactions; daynum | transact --------+---------- 1 | 1 1 | 2 1 | 3 2 | 4 2 | 5 2 | 6 (6 rows) test=# Ross