I have large table named duplicate_files (47GB) shown below. I need to find the multiple occurrence of the file_name with same size in each path.
FILE_NAME
FILESIZE
FULL_PATH
ABC.txt
12
I_12_122
ABC.txt
14
I_12_123
ABC.txt
12
I_12_125
ABC.txt
12
I_13_156
ABC.txt
14
I_14_123
ABC.txt
12
I_11_125
ABC.txt
15
I_12_123
ABC.txt
16
I_12_123
ABC.txt
11
I_12_123
The output is shown below.
FILE_NAME
FILESIZE
FULL_PATH
ABC.txt
12
I_12_122
ABC.txt
12
I_12_125
ABC.txt
12
I_13_156
ABC.txt
12
I_11_125
I used the query below to get the output.But it took me 6 hrs to get the output. Is there any other better way to increase the speed for faster results?
select file_name,filesize, full_path from duplicate_files f1 where
(file_name,filesize) in (select file_name,filesize from duplicate_files group by file_name,filesize having count(file_name) >1);