Обсуждение: filter duplicates by priority
Hello-
I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.
SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number
The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.
Any suggestions are appreciated.
--------------------------------------
A more detailed explanation of my problem follows:
I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices. When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority. Each user can have a different set of lists and
priorities.
Table: product_lists
id | name | priority | user_id
-----+------------------------------+----------+----------
5 | General List of Products | 2 | 23
3 | Different List of Products | 3 | 23
150 | Customer-Specific Products | 1 | 23
Table: products
product_list_id | part_number | price
-----------------+-------------+--------
3 | 92298A | 123.38
5 | 92298A | 111.04
3 | C39207 | 78.38
150 | C39207 | 67.93
Below is a simplified example of the structure of the query I am working
with. I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price. The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution. Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.
This is what I would *like* to say:
SELECT DISTINCT ON (part_number) * FROM (
SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)
UNION ALL
SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)
UNION ALL
SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)
) AS filter_duplicates ORDER BY priority,part_number
I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first. Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority. But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression. However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Clark Slater
> Sent: Tuesday, July 14, 2009 10:04 AM
> Subject: [GENERAL] filter duplicates by priority
Maybe I'm missing something, but why not something like (incoming
pseudo-SQL):
Select part_number, max(priority)
From (
Select part_number, priority from TableAndCriteria
Union all
Select part_number, priority from TableAndCriteria
Union all
Select part_number, priority from TableAndCriteria
) as allTables
Group by part_number
Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
.now.
"Clark Slater" <pg@slatech.com> writes:
> I am trying to use DISTINCT ON to filter out *potential* duplicate values
> from a set of sub queries. There are certain cases where there can be
> repetitive part numbers that are priced differently. I'm trying to start
> with the full list, ordered by priority, and then remove any repeats that
> have a lesser priority.
> SELECT DISTINCT ON (part_number) * FROM (
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> UNION ALL
> SELECT part_number, priority FROM ...
> ) AS filter_duplicates ORDER BY priority,part_number
> The above statement does not work because if I ORDER BY
> priority,part_number then I have to DISTINCT ON (priority,part_number).
> But DISTINCT ON (priority, part_number) does not remove the repeated rows
> because the same part_number with a different priority becomes a distinct
> tuple.
AFAICS, changing it to ORDER BY part_number,priority would solve the
stated problem. If you really need the final result in priority rather
than part number order, put the whole thing in a sub-select and re-sort
outside it.
regards, tom lane
On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote: > Hello- > > I am trying to use DISTINCT ON to filter out *potential* duplicate values > from a set of sub queries. There are certain cases where there can be > repetitive part numbers that are priced differently. I'm trying to start > with the full list, ordered by priority, and then remove any repeats that > have a lesser priority. I think what you need to do is order by part_number first, do the DISTINCT ON () and then do an ORDER BY priority around that. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Вложения
On Tue, Jul 14, 2009 at 10:04:12AM -0400, Clark Slater wrote: > SELECT DISTINCT ON (part_number) * FROM ( > SELECT part_number, priority FROM ... > UNION ALL > SELECT part_number, priority FROM ... > UNION ALL > SELECT part_number, priority FROM ... > ) AS filter_duplicates ORDER BY priority,part_number > > The above statement does not work because if I ORDER BY > priority,part_number then I have to DISTINCT ON (priority,part_number). > But DISTINCT ON (priority, part_number) does not remove the repeated rows > because the same part_number with a different priority becomes a distinct > tuple. I think you just want to swap the ORDER BY columns around; i.e: ORDER BY part_number, priority -- Sam http://samason.me.uk/
This solved my problem. Now why didn't I think of that!? Thank you very much everybody. This list is an incredible resource. -Clark > AFAICS, changing it to ORDER BY part_number,priority would solve the > stated problem. If you really need the final result in priority rather > than part number order, put the whole thing in a sub-select and re-sort > outside it. > > regards, tom lane > > "Clark Slater" <pg@slatech.com> writes: >> I am trying to use DISTINCT ON to filter out *potential* duplicate >> values >> from a set of sub queries. There are certain cases where there can be >> repetitive part numbers that are priced differently. I'm trying to >> start >> with the full list, ordered by priority, and then remove any repeats >> that >> have a lesser priority. > >> SELECT DISTINCT ON (part_number) * FROM ( >> SELECT part_number, priority FROM ... >> UNION ALL >> SELECT part_number, priority FROM ... >> UNION ALL >> SELECT part_number, priority FROM ... >> ) AS filter_duplicates ORDER BY priority,part_number > >> The above statement does not work because if I ORDER BY >> priority,part_number then I have to DISTINCT ON (priority,part_number). >> But DISTINCT ON (priority, part_number) does not remove the repeated >> rows >> because the same part_number with a different priority becomes a >> distinct >> tuple.