Обсуждение: filter duplicates by priority

Поиск
Список
Период
Сортировка

filter duplicates by priority

От
"Clark Slater"
Дата:
Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries.  There are certain cases where there can be
repetitive part numbers that are priced differently.  I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

Any suggestions are appreciated.

--------------------------------------

A more detailed explanation of my problem follows:

I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices.  When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority.  Each user can have a different set of lists and
priorities.

Table: product_lists
  id |             name             | priority |  user_id
-----+------------------------------+----------+----------
  5  |  General List of Products    |     2    |     23
  3  |  Different List of Products  |     3    |     23
 150 |  Customer-Specific Products  |     1    |     23

Table: products
 product_list_id | part_number |  price
-----------------+-------------+--------
        3        |   92298A    | 123.38
        5        |   92298A    | 111.04
        3        |   C39207    |  78.38
       150       |   C39207    |  67.93

Below is a simplified example of the structure of the query I am working
with.  I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price.  The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution.  Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first.  Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority.  But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression.  However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.


Re: filter duplicates by priority

От
"Hartman, Matthew"
Дата:
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Clark Slater
> Sent: Tuesday, July 14, 2009 10:04 AM
> Subject: [GENERAL] filter duplicates by priority

Maybe I'm missing something, but why not something like (incoming
pseudo-SQL):

Select    part_number, max(priority)
From        (
        Select part_number, priority from TableAndCriteria
        Union all
        Select part_number, priority from TableAndCriteria
        Union all
        Select part_number, priority from TableAndCriteria
    ) as allTables
Group by part_number


Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
.now.


Re: filter duplicates by priority

От
Tom Lane
Дата:
"Clark Slater" <pg@slatech.com> writes:
> I am trying to use DISTINCT ON to filter out *potential* duplicate values
> from a set of sub queries.  There are certain cases where there can be
> repetitive part numbers that are priced differently.  I'm trying to start
> with the full list, ordered by priority, and then remove any repeats that
> have a lesser priority.

> SELECT DISTINCT ON (part_number) * FROM (
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> ) AS filter_duplicates ORDER BY priority,part_number

> The above statement does not work because if I ORDER BY
> priority,part_number then I have to DISTINCT ON (priority,part_number).
> But DISTINCT ON (priority, part_number) does not remove the repeated rows
> because the same part_number with a different priority becomes a distinct
> tuple.

AFAICS, changing it to ORDER BY part_number,priority would solve the
stated problem.  If you really need the final result in priority rather
than part number order, put the whole thing in a sub-select and re-sort
outside it.

            regards, tom lane

Re: filter duplicates by priority

От
Martijn van Oosterhout
Дата:
On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote:
> Hello-
>
> I am trying to use DISTINCT ON to filter out *potential* duplicate values
> from a set of sub queries.  There are certain cases where there can be
> repetitive part numbers that are priced differently.  I'm trying to start
> with the full list, ordered by priority, and then remove any repeats that
> have a lesser priority.

I think what you need to do is order by part_number first, do the
DISTINCT ON () and then do an ORDER BY priority around that.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения

Re: filter duplicates by priority

От
Sam Mason
Дата:
On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote:
> SELECT DISTINCT ON (part_number) * FROM (
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> ) AS filter_duplicates ORDER BY priority,part_number
>
> The above statement does not work because if I ORDER BY
> priority,part_number then I have to DISTINCT ON (priority,part_number).
> But DISTINCT ON (priority, part_number) does not remove the repeated rows
> because the same part_number with a different priority becomes a distinct
> tuple.

I think you just want to swap the ORDER BY columns around; i.e:

  ORDER BY part_number, priority

--
  Sam  http://samason.me.uk/

Re: filter duplicates by priority

От
"Clark Slater"
Дата:
This solved my problem.  Now why didn't I think of that!?  Thank you very
much everybody.  This list is an incredible resource.

-Clark


> AFAICS, changing it to ORDER BY part_number,priority would solve the
> stated problem.  If you really need the final result in priority rather
> than part number order, put the whole thing in a sub-select and re-sort
> outside it.
>
>             regards, tom lane
>
> "Clark Slater" <pg@slatech.com> writes:
>> I am trying to use DISTINCT ON to filter out *potential* duplicate
>> values
>> from a set of sub queries.  There are certain cases where there can be
>> repetitive part numbers that are priced differently.  I'm trying to
>> start
>> with the full list, ordered by priority, and then remove any repeats
>> that
>> have a lesser priority.
>
>> SELECT DISTINCT ON (part_number) * FROM (
>> SELECT part_number, priority FROM ...
>> UNION ALL
>> SELECT part_number, priority FROM ...
>> UNION ALL
>> SELECT part_number, priority FROM ...
>> ) AS filter_duplicates ORDER BY priority,part_number
>
>> The above statement does not work because if I ORDER BY
>> priority,part_number then I have to DISTINCT ON (priority,part_number).
>> But DISTINCT ON (priority, part_number) does not remove the repeated
>> rows
>> because the same part_number with a different priority becomes a
>> distinct
>> tuple.