Обсуждение: Multiple occurence

Поиск

Список

Период

Сортировка

Multiple occurence

От

Nipuna

Дата:

17 марта 2014 г., 17:29:44

Hi,

I have large table named duplicate_files (47GB) shown below. I need to find the multiple occurrence of the file_name with same size in each path.

FILE_NAME	FILESIZE	FULL_PATH
ABC.txt	12	I_12_122
ABC.txt	14	I_12_123
ABC.txt	12	I_12_125
ABC.txt	12	I_13_156
ABC.txt	14	I_14_123
ABC.txt	12	I_11_125
ABC.txt	15	I_12_123
ABC.txt	16	I_12_123
ABC.txt	11	I_12_123

The output is shown below.

FILE_NAME	FILESIZE	FULL_PATH
ABC.txt	12	I_12_122
ABC.txt	12	I_12_125
ABC.txt	12	I_13_156
ABC.txt	12	I_11_125

I used the query below to get the output.But it took me 6 hrs to get the output. Is there any other better way to increase the speed for faster results?

select file_name,filesize, full_path from duplicate_files f1 where
	(file_name,filesize) in (select file_name,filesize from duplicate_files group by file_name,filesize having count(file_name) >1);

Any help or advice appreciated. Thanks

Regards,

Nipuna

Re: Multiple occurence

От

David Johnston

Дата:

17 марта 2014 г., 18:36:50

nipuna wrote
>     select file_name,filesize, full_path from duplicate_files f1 where
>
>   (file_name,filesize) in (select file_name,filesize from duplicate_files
> group by file_name,filesize having count(file_name) >1);

Maybe:

Select file_name, filesize, array_agg(full_path)
From duplicate_files
Group by file_name, filesize
Having count(*) > 1

You can always unnest the array later if needed.

David J.







--
View this message in context: http://postgresql.1045698.n5.nabble.com/Multiple-occurence-tp5796336p5796345.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Multiple occurence

Multiple occurence

Re: Multiple occurence