Re: How to do faster DML

Поиск
Список
Период
Сортировка
От Peter J. Holzer
Тема Re: How to do faster DML
Дата
Msg-id 20240203202947.u45hpyiwzuodbav2@hjp.at
обсуждение исходный текст
Ответ на Re: How to do faster DML  (Lok P <loknath.73@gmail.com>)
Список pgsql-general
On 2024-02-03 19:25:12 +0530, Lok P wrote:
> Apology. One correction, the query is like below. I. E filter will be on on
> ctid which I believe is equivalent of rowid in oracle and we will not need the
> index on Id column then. 
>
>  But, it still runs long, so thinking any other way to make the duplicate
> removal faster? 
>
> Also wondering , the index creation which took ~2.5hrs+ , would that have been
> made faster any possible way by allowing more db resource through some session
> level db parameter setting?
>
> create table TAB1_New
> as
> SELECT  * from TAB1 A
> where CTID in
>       (select min(CTID) from TAB1
>       group by ID having count(ID)>=1 );

That »having count(ID)>=1« seems redundant to me. Surely every id which
occurs in the table occurs at least once?

Since you want ID to be unique I assume that it is already almost
unique - so only a small fraction of the ids will be duplicates. So I
would start with creating a list of duplicates:

create table tab1_dups as
select id, count(*) from tab1 group by id having count(*) > 1;

This will still take some time because it needs to build a temporary
structure large enough to hold a count for each individual id. But at
least then you'll have a much smaller table to use for further cleanup.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: Francisco Olarte
Дата:
Сообщение: Re: How to do faster DML
Следующее
От: Lok P
Дата:
Сообщение: Re: How to do faster DML