Re: Deleting certain duplicates

Поиск
Список
Период
Сортировка
От Greg Sabino Mullane
Тема Re: Deleting certain duplicates
Дата
Msg-id 82428c807d3c47fbcd0c013dbcab12b5@biglumber.com
обсуждение исходный текст
Ответ на Deleting certain duplicates  ("Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>)
Список pgsql-performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> So I was thinking maybe of doing the deletion in chunks, perhaps based on
> reception time.
> Are there any suggestions for a better way to do this, or using multiple
> queries to delete selectively a week at a time based on the reception_time.
> I would say there are a lot of duplicate entries between mid march to the
> first week of April.

You are on the right track, in that dividing up the table will help. However,
you cannot divide on the reception_time as that is the unique column. Analyze
your data and divide on a row with a fairly uniform distribution over the
time period in question. Then copy a segment out, clean it up, and put it
back in. Make sure there is an index on the column in question, of course.

For example, if 1/10 of the table has a "units" of 12, you could do something
like this:

CREATE INDEX units_dev ON forecastelement (units);

CREATE TEMPORARY TABLE units_temp AS SELECT * FROM forecastelement WHERE units='12';

CREATE INDEX units_oid_index ON units_temp(oid);

(Delete out duplicate rows from units_temp using your previous query or something else)

DELETE FROM forecastelement WHERE units='12';

INSERT INTO forecastelement SELECT * FROM units_temp;

DELETE FROM units_temp;

Repeat as needed until all rows are done. Subsequent runs can be done by doing a

INSERT INTO units_temp SELECT * FROM forecastelement WHERE units='...'

and skipping the CREATE INDEX steps.

On the other hand, your original deletion query may work as is, with the addition
of an oid index. Perhaps try an EXPLAIN on it.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200404200706

-----BEGIN PGP SIGNATURE-----

iD8DBQFAhQVWvJuQZxSWSsgRAvLEAKDCVcX3Llm8JgszI/BBC1SobtjVawCfVGKu
ERcV5J2JolwgZRhMbXnNM90=
=JqET
-----END PGP SIGNATURE-----



В списке pgsql-performance по дате отправления:

Предыдущее
От: ohp@pyrenet.fr
Дата:
Сообщение: Re: Wierd context-switching issue on Xeon
Следующее
От: Jeff
Дата:
Сообщение: Re: Wierd context-switching issue on Xeon