Re: Deleting certain duplicates
From | Shea,Dan [CIS] |
---|---|
Subject | Re: Deleting certain duplicates |
Date | |
Msg-id | F2D63B916C88C14D9B59F93C2A5DD33F0B921A@cisxa.cmc.int.ec.gc.ca Whole thread Raw |
In response to | Deleting certain duplicates ("Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>) |
Responses |
Re: Deleting certain duplicates
(Rajesh Kumar Mallah <mallah@trade-india.com>)
|
List | pgsql-performance |
The index is:

Indexes: "forecastelement_rwv_idx" btree (region_id, wx_element, valid_time)

-----Original Message-----
From: Shea,Dan [CIS] [mailto:Dan.Shea@ec.gc.ca]
Sent: Monday, April 12, 2004 10:39 AM
To: Postgres Performance
Subject: [PERFORM] Deleting certain duplicates

We have a large database which recently grew dramatically due to a change in our insert program that allowed all entries through.

PWFPM_DEV=# select relname,relfilenode,reltuples from pg_class where relname = 'forecastelement';
     relname     | relfilenode |  reltuples
-----------------+-------------+-------------
 forecastelement |   361747866 | 4.70567e+08

     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 version        | character varying(99)       |
 origin         | character varying(10)       |
 timezone       | character varying(99)       |
 region_id      | character varying(20)       |
 wx_element     | character varying(99)       |
 value          | character varying(99)       |
 flag           | character(3)                |
 units          | character varying(99)       |
 valid_time     | timestamp without time zone |
 issue_time     | timestamp without time zone |
 next_forecast  | timestamp without time zone |
 reception_time | timestamp without time zone |

The insert program is supposed to check, via a select statement, that all fields except reception_time are unique before inserting a row. Due to an error introduced in a change, reception_time was included in the duplicate check. Since reception_time is assigned by the program that creates the .dat file for insertion, every incoming row appeared unique, so all the duplicates were inserted.

I tried the delete query below:

PWFPM_DEV=# delete from forecastelement where oid not in (select min(oid) from forecastelement group by version, origin, timezone, region_id, wx_element, value, flag, units, valid_time, issue_time, next_forecast);

It ran for three days, creating what I assume were the temporary sort files for the GROUP BY under pgsql_tmp, and then failed with "ERROR: write failed". Well, the long weekend is over and we do not have the luxury of trying this again.
So I am thinking of doing the deletion in chunks, perhaps based on reception_time. Are there any suggestions for a better way to do this, or for using multiple queries to delete selectively, a week at a time, based on reception_time? I would say most of the duplicate entries date from mid-March to the first week of April.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
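To make the chunked approach concrete, here is a hypothetical sketch of deduplicating one weekly reception_time window at a time; the dates are illustrative and the statement has not been tested against this schema. Restricting both the outer DELETE and the inner subquery to the same window keeps the GROUP BY small enough that its temporary sort files should fit in pgsql_tmp:

```sql
-- Hypothetical sketch: deduplicate one week at a time by restricting
-- both the outer DELETE and the inner GROUP BY to the same
-- reception_time window (dates here are illustrative).
DELETE FROM forecastelement
WHERE reception_time >= '2004-03-15'
  AND reception_time <  '2004-03-22'
  AND oid NOT IN (
      SELECT min(oid)
      FROM forecastelement
      WHERE reception_time >= '2004-03-15'
        AND reception_time <  '2004-03-22'
      GROUP BY version, origin, timezone, region_id, wx_element,
               value, flag, units, valid_time, issue_time, next_forecast
  );
-- Caveat: because the duplicates differ only in reception_time, copies
-- of the same logical row can fall into different weekly windows; each
-- pass only removes duplicates within its own window, so a final
-- full-table pass (or overlapping windows) is still needed.
```

An alternative worth considering, which avoids the expensive NOT IN anti-join entirely, is rebuilding a deduplicated copy with CREATE TABLE ... AS SELECT ... GROUP BY (taking min(reception_time) per group) and then swapping the tables, at the cost of disk space for a second copy while it runs.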
In the pgsql-performance list, by date sent: