Обсуждение: Select unique set of record, distinct on, please help!!!

Поиск
Список
Период
Сортировка

Select unique set of record, distinct on, please help!!!

От
worker
Дата:
Dear all,
     I am still new to the PostGres DB, I 've been using other DB for
quite a while, so SQL is OK with me. I am running into problem trying
to condense duplicate records in a table into one record. Naturally, I
think about 'DISTINCT ON' in select, but I just couldn't figure out
the proper syntax to use it correctly. So, please help.

    My table is as follow:
   Column    |         Type                |
Modifiers
-----------------+-------------------------------
+-------------------------------------------------------
   tag_id      | character varying(15) |
 enddate     | date                         |
 storetcodei | integer                     |
      gtlt        | character(1)             |
    value       | real                          |
    autoid      | integer                     | not null default
nextval('results_autoid_seq'::regclass)


    The main issue is that a lot of records have the same 'tag_id' and
'enddate', which is redundant. So I was trying to get rid of all the
duplicates, but just leave one copy in the database, simple you say,
right? NOT!

    I used many variants of the following query, but it seems psql is
not doing the right thing, it always gotten rid of all duplicate, not
leaving even one record from all the duplicates, in other db, all I
need is 'unique', so please could someone just give me a SELECT INOT
or CREATE TABLE AS statements that can rid of the duplicate but leave
one copy in the table.

create table results1 as select distinct on (tag_id,enddate) * from
resultsall;

    I searched quite a bit about distinct on, I can not understand the
logic behind the combination of distinct on and order by, so please
help with a working statement?
    Many thanks,
    T.


Re: Select unique set of record, distinct on, please help!!!

От
"Scott Marlowe"
Дата:
On Fri, May 16, 2008 at 10:13 AM, worker <tzhai2007@gmail.com> wrote:
> Dear all,
>     I am still new to the PostGres DB, I 've been using other DB for
> quite a while, so SQL is OK with me. I am running into problem trying
> to condense duplicate records in a table into one record. Naturally, I
> think about 'DISTINCT ON' in select, but I just couldn't figure out
> the proper syntax to use it correctly. So, please help.
>
>    My table is as follow:
>   Column    |         Type                |
> Modifiers
> -----------------+-------------------------------
> +-------------------------------------------------------
>   tag_id      | character varying(15) |
>  enddate     | date                         |
>  storetcodei | integer                     |
>      gtlt        | character(1)             |
>    value       | real                          |
>    autoid      | integer                     | not null default
> nextval('results_autoid_seq'::regclass)
>
>
>    The main issue is that a lot of records have the same 'tag_id' and
> 'enddate', which is redundant. So I was trying to get rid of all the
> duplicates, but just leave one copy in the database, simple you say,
> right? NOT!
>
>    I used many variants of the following query, but it seems psql is
> not doing the right thing, it always gotten rid of all duplicate, not
> leaving even one record from all the duplicates, in other db, all I
> need is 'unique', so please could someone just give me a SELECT INOT
> or CREATE TABLE AS statements that can rid of the duplicate but leave
> one copy in the table.
>
> create table results1 as select distinct on (tag_id,enddate) * from
> resultsall;

What exactly happens that's wrong.  I've used something like that
before and it should work.

Another possibility is to do it in place.  With transactions you can
always try something, check it, and roll back as needed.

let's say you have an id field that IS unique, we'll call it id (you
can create one if you don't have one)

alter table results1 add id int;
create temp sequence myseq;
update results1 set id=nextval('myseq');
select =id from results1 r1 join results r2 on (r1.tag_id=r2.tag and
r1.enddate=r2.enddate and r1.id > r2.id);

That query should return all the max ids of the matching rows.  The
id>id is the "order by" here.  You could replace it with some other
comparison for ordering.