On Thu, Feb 4, 2016 at 2:56 PM, Seth P <seth-p@outlook.com> wrote:
> Meanwhile, your results are curious not just because they differ from
> mine, but because you observe the DISTINCT removing *any* duplicates.
> There shouldn't be any, since rrr.idx is a unique serial (idx integer NOT
> NULL DEFAULT nextval('rrr_idx_seq'::regclass)). In particular, the
> following query returns no rows:
>
> SELECT rrr.idx FROM public.rrr GROUP BY rrr.idx HAVING count(*)>1
>
> Put another way, in your database, can you find the duplicates being
> removed by DISTINCT
>
B
=E2=80=8But the query is written as a series of INNER JOINs against "rrr" a=
nd so
its output is not constrained by UNIQUE(rrr.idx). So it very well may
return multiple rows for a given rrr.idx but when you restrict the output
to "rrr.*" and add DISTINCT the result is fewer records since now you only
have one of each rrr.idx.
=E2=80=8B
=E2=80=8B
=E2=80=8BI made one possibly meaningful change when I reported my discrepan=
cy
earlier. The query with the =E2=80=8B416,075 DISTINCT row count uses "SELE=
CT
DISTINCT rrr" while the one returning 415,874 uses "SELECT DISTINCT rrr.*".
=E2=80=8BThat said, running the following returns zero records:
=E2=80=8BSELECT
rrr.idx, count(*)
FROM public.rrr
INNER JOIN public.fff
ON fff.idx =3D rrr.fff_idx
INNER JOIN public.uuu
ON uuu.universe_hash =3D 5188205190738336870 AND
uuu.bababa =3D rrr.bababa
WHERE
fff.file_name_model_ver =3D '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
GROUP BY rrr.idx
HAVING count(*) > 1
=E2=80=8BSo indeed the query does not result in multiple rrr.idx values bei=
ng
output for the DISTINCT to remove.
=E2=80=8BDavid J.=E2=80=8B