Thread: Fast Deletion For Large Tables


Fast Deletion For Large Tables

From: Raymond Chui
I have some tables with huge amounts of data.
The tables have a timestamp column and a float column.
I am trying to keep up to 6 days of data values.
What I do is execute the SQL below from crontab (the UNIX
command scheduler).

BEGIN;
DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
.....
DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
COMMIT;
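
For reference, a hypothetical crontab entry for scheduling such a script; the schedule, path, and database name are illustrative, not from the original post:

# run nightly at 02:30, feeding the purge script to psql
30 2 * * * /usr/bin/psql -d mydb -f /home/user/purge_old_rows.sql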


Everything runs fine, except that it takes a long time to finish,
because some tables store from 50,000 to 100,000 rows and some
deletions need to remove up to 45,000 rows.

So I am thinking of just deleting the rows by their row number or row ID,
like

DELETE FROM a_table WHERE row_id < 45000;

I know there is a row_id in Oracle.
Is there a row_id for a table in Postgres?
Thank Q!


Re: Fast Deletion For Large Tables

From: Stephan Szabo
On Wed, 2 Oct 2002, Raymond Chui wrote:

> I have some tables with huge amounts of data.
> The tables have a timestamp column and a float column.
> I am trying to keep up to 6 days of data values.
> What I do is execute the SQL below from crontab (the UNIX
> command scheduler).
>
> BEGIN;
> DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
> .....
> DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
> COMMIT;
>
>
> Everything runs fine, except that it takes a long time to finish,
> because some tables store from 50,000 to 100,000 rows and some
> deletions need to remove up to 45,000 rows.
>
> So I am thinking of just deleting the rows by their row number or row ID,
> like
>
> DELETE FROM a_table WHERE row_id < 45000;
>
> I know there is a row_id in Oracle.
> Is there a row_id for a table in Postgres?

Not of that sort, if I recall Oracle's row_id definition correctly,
although you could probably fake something with a sequence.
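
For illustration, a minimal sketch of faking one with a sequence, reusing the hypothetical a_table / row_id names from the original post; the sequence and index names are made up:

-- Not a true Oracle-style rowid: just a monotonically increasing id.
-- (Existing rows keep a NULL row_id until backfilled.)
CREATE SEQUENCE a_table_row_id_seq;
ALTER TABLE a_table ADD COLUMN row_id integer;
ALTER TABLE a_table ALTER COLUMN row_id SET DEFAULT nextval('a_table_row_id_seq');
CREATE INDEX a_table_row_id_idx ON a_table (row_id);

-- Newer rows get larger ids, so old rows can be deleted by id:
DELETE FROM a_table WHERE row_id < 45000;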



Re: [GENERAL] Fast Deletion For Large Tables

From: Roland Roberts
>>>>> "rc" == Raymond Chui <raymond.chui@noaa.gov> writes:

    rc> Everything runs fine, except that it takes a long time to finish,
    rc> because some tables store from 50,000 to 100,000 rows and some
    rc> deletions need to remove up to 45,000 rows.

Is column_time indexed?
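
If not, a minimal sketch, reusing the table_1 / column_time names from the original post (the index name is illustrative):

-- Let the range DELETE use an index scan instead of a full table scan.
CREATE INDEX table_1_time_idx ON table_1 (column_time);

-- Verify the plan:
EXPLAIN DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);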

roland
--
               PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                     76-15 113th Street, Apt 3B
roland@astrofoto.org                       Forest Hills, NY 11375

Re: Fast Deletion For Large Tables

From: "Nikolaus Dilger"
Raymond,

Partitioned tables would solve your issue, since you
could just truncate a partition to delete the
unneeded data.  Unfortunately they are not available in
PostgreSQL, but maybe they will be in a future release.

Unfortunately you also cannot use a UNION ALL in a view
to fake partitions.

So partitioning your data would require several changes to
your application.  But depending on your
situation it may be worthwhile.  E.g. if you always
want to keep one week's worth of data and purge a day at
a time, you could have 7 tables instead of one:
table1_mon, table1_tue, table1_wed, etc.  And you could
select from all of them with a UNION ALL, as sketched below.
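
A minimal sketch of that layout, with illustrative table and column names (only two of the seven day tables shown):

-- One table per weekday.
CREATE TABLE table1_mon (column_time timestamp, value float);
CREATE TABLE table1_tue (column_time timestamp, value float);
-- ... and so on for the remaining weekdays ...

-- The application reads all days with a UNION ALL:
SELECT column_time, value FROM table1_mon
UNION ALL
SELECT column_time, value FROM table1_tue;  -- extend for the other days

-- Purging a day becomes a cheap TRUNCATE instead of a big DELETE:
TRUNCATE TABLE table1_mon;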

There is no rowid in PostgreSQL.  Adding an
additional column populated from a sequence would help, as long as
you have an index on it.  Just indexing your date column
may be easier.  Either way you pay with slower inserts, or
with the time and effort to build the index.

Regards,
Nikolaus Dilger

On Wed, 02 October 2002, Raymond Chui wrote:

>
>
> I have some tables with huge amounts of data.
> The tables have a timestamp column and a float column.
> I am trying to keep up to 6 days of data values.
> What I do is execute the SQL below from crontab (the UNIX
> command scheduler).
>
> BEGIN;
> DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
> .....
> DELETE FROM table_n WHERE column_time < ('now'::timestamp - '6 days'::interval);
> COMMIT;
>
>
> Everything runs fine, except that it takes a long time to finish,
> because some tables store from 50,000 to 100,000 rows and some
> deletions need to remove up to 45,000 rows.
>
> So I am thinking of just deleting the rows by their row number or row ID,
> like
>
> DELETE FROM a_table WHERE row_id < 45000;
>
> I know there is a row_id in Oracle.
> Is there a row_id for a table in Postgres?
> Thank Q!

Re: Fast Deletion For Large Tables

From: Hubert depesz Lubaczewski
On Wed, Oct 02, 2002 at 02:20:39PM -0400, Raymond Chui wrote:
> Everything runs fine, except that it takes a long time to finish,
> because some tables store from 50,000 to 100,000 rows and some
> deletions need to remove up to 45,000 rows.

how often do you run this "delete" process?
do you vacuum before *and* after it?
when deleting 45000 out of 50000 (or 100000) records, no index can speed
it up. anyway - deleting 45000 records shouldn't take very long.
do you have any foreign keys from this table to others? any indices?

try maybe this path:
connect
vacuum
drop indices
disconnect
connect
delete from ...
create indices
vacuum analyze
disconnect

could work better.
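
A minimal sketch of that path as SQL, assuming a single table table_1 with one index, table_1_time_idx (both names are illustrative); each connect/disconnect pair above would be a separate session:

VACUUM table_1;
DROP INDEX table_1_time_idx;

-- (disconnect, then reconnect)

DELETE FROM table_1 WHERE column_time < ('now'::timestamp - '6 days'::interval);
CREATE INDEX table_1_time_idx ON table_1 (column_time);
VACUUM ANALYZE table_1;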

depesz

--
hubert depesz lubaczewski                          http://www.depesz.pl/
------------------------------------------------------------------------
My God, let me keep silent until I make sure that I  really  do  have
something to say.                                      (c) 1998 depesz



Re: [GENERAL] Fast Deletion For Large Tables

From: "Shridhar Daithankar"
On 5 Oct 2002 at 20:17, Nikolaus Dilger wrote:
> Partitioned tables would solve your issue since you
> could just truncate a partiotion in order to delete the
> unneeded data.  Unfortunately they are not available in
> PostgreSQL.  But maybe in a future release.

If you don't mind realigning your schema, table inheritance can closely mimic
partitions, up to a certain extent.

Reading the original post below, I recommend you consider the inheritance
approach. Create a base table which is empty and create inherited partitions.
You can get all the data in the child tables by querying the base table, so your
application would not require any modification as far as selecting data is
concerned.

You do have to wrap your insert/update/delete code so that it operates on a
particular partition; a minimal sketch follows.
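
A sketch of the inheritance approach with illustrative names; selects on the base table automatically include all children:

-- Empty base table; one child per day.
CREATE TABLE readings (column_time timestamp, value float);
CREATE TABLE readings_mon () INHERITS (readings);
CREATE TABLE readings_tue () INHERITS (readings);

-- Writes go to a specific child:
INSERT INTO readings_mon VALUES ('2002-10-07 12:00', 1.5);

-- Reads on the base table see every child's rows:
SELECT * FROM readings;

-- Purging a day is a cheap TRUNCATE of one child:
TRUNCATE TABLE readings_mon;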

> There is no rowid in PostgreSQL.  Adding an
> additional column populated from a sequence would help, as long as
> you have an index on it.  Just indexing your date column
> may be easier.  Either way you pay with slower inserts, or
> with the time and effort to build the index.

There is an oid column, and if you don't use it you can disable it, saving some
space if you have a lot of data.
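
For example, using the WITHOUT OIDS option (available since PostgreSQL 7.2; table and column names are illustrative):

CREATE TABLE table_1 (column_time timestamp, value float) WITHOUT OIDS;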

HTH

Bye
 Shridhar

--
Bachelor:    A man who chases women and never Mrs. one.


Re: Fast Deletion For Large Tables

From: "Ross J. Reedstrom"
On Sat, Oct 05, 2002 at 08:17:38PM -0700, Nikolaus Dilger wrote:
> Raymond,
>
> Partitioned tables would solve your issue since you
> could just truncate a partiotion in order to delete the
> unneeded data.  Unfortunately they are not available in
> PostgreSQL.  But maybe in a future release.
>
> Unfortunately you also cannot use a UNION ALL in a view
> to fake partitions.
>

Hmm, you haven't tried this recently, have you? With pgsql 7.2.1,
it seems to work just fine:

test=# create view transactions as select * from monday
test-#   union all select * from tuesday
test-#   union all select * from wednesday;
CREATE
test=# \d transactions
      View "transactions"
  Column  |  Type   | Modifiers
----------+---------+-----------
 daynum   | integer |
 transact | integer |
View definition: (SELECT monday.daynum, monday.transact FROM monday
  UNION ALL SELECT tuesday.daynum, tuesday.transact FROM tuesday)
  UNION ALL SELECT wednesday.daynum, wednesday.transact FROM wednesday;

test=# select * from transactions;
 daynum | transact
--------+----------
      1 |        1
      1 |        2
      1 |        3
      2 |        4
      2 |        5
      2 |        6
(6 rows)

test=#

Ross