Обсуждение: Multiple occurence

Поиск
Список
Период
Сортировка

Multiple occurence

От
Nipuna
Дата:
Hi,

I have large table named duplicate_files (47GB) shown below. I need to  find the multiple occurrence of the file_name with same size in each path.



FILE_NAME    
FILESIZE   
FULL_PATH
ABC.txt12I_12_122
ABC.txt14I_12_123
ABC.txt12I_12_125
ABC.txt12I_13_156
ABC.txt14I_14_123
ABC.txt12I_11_125
ABC.txt15I_12_123
ABC.txt16I_12_123
ABC.txt11I_12_123








The output is shown below.

FILE_NAME  FILESIZE
FULL_PATH
ABC.txt 12I_12_122
ABC.txt 
 12I_12_125
ABC.txt        
 12I_13_156
ABC.txt 12I_11_125
 
I used the query below to get the output.But it took me 6 hrs to get the output. Is there any other better way to increase the speed for faster results?


select file_name,filesize, full_path from duplicate_files f1 where 





(file_name,filesize) in (select file_name,filesize from duplicate_files group by file_name,filesize having count(file_name) >1);








Any help or advice appreciated. Thanks































































 
Regards,
             Nipuna
 

Re: Multiple occurence

От
David Johnston
Дата:
nipuna wrote
>     select file_name,filesize, full_path from duplicate_files f1 where
>
>   (file_name,filesize) in (select file_name,filesize from duplicate_files
> group by file_name,filesize having count(file_name) >1);

Maybe:

Select file_name, filesize, array_agg(full_path)
From duplicate_files
Group by file_name, filesize
Having count(*) > 1

You can always unnest the array later if needed.

David J.







--
View this message in context: http://postgresql.1045698.n5.nabble.com/Multiple-occurence-tp5796336p5796345.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.