Fixing DISTINCT ON for duplicate keys
От | Tom Lane |
---|---|
Тема | Fixing DISTINCT ON for duplicate keys |
Дата | |
Msg-id | 16673.1217525917@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Fixing DISTINCT ON for duplicate keys
(David Fetter <david@fetter.org>)
|
Список | pgsql-hackers |
I looked into this trouble report: http://archives.postgresql.org/pgsql-sql/2008-07/msg00123.php The problem is that by the time we get to transformDistinctClause(), any duplicate entries in the ORDER BY list have been eliminated (see addTargetToSortList). But transformDistinctClause expects a one-for-one match of the two lists, and so it complains. Clearly, duplicate DISTINCT ON items are just as redundant as duplicate ORDER BY items are, and so it seems that suppressing them is a reasonable thing to do. But I'm thinking that as long as we're touching this old code, there are some other things that should be fixed: * There's not really any semantic significance to the ordering of the DISTINCT ON list anyway, so it would be reasonable to rearrange the ordering of the list to match the ORDER BY list, rather than making the user do it. * It's really bletcherous that the code physically modifies the user-given ORDER BY. This damage is visible in stored rules --- they don't come out the same way you wrote them. While I don't mind the idea of dropping redundant entries, adding ORDER BY entries that the user never wrote seems bogus. It overconstrains the query, in a way that doesn't matter given our current implementation but could matter in the future. What I am thinking we could do about the latter is modify the querytree semantics a bit. Instead of insisting that the transformed distinctClause be equal to a prefix of the sortClause, allow either one to be a prefix of the other. Then the planner simply takes the longer one as its internal sort-order target. With this rule, the sortClause stays as what the user wrote (less any duplicate keys). The parser is required to remove any duplicate keys from the distinctClause and rearrange it if needed so that it has a common prefix with the sortClause (or throw error if this is not possible). This would be invisible to the user in plain SELECT DISTINCT, and in SELECT DISTINCT ON would mean that the list is dumped in a "canonical" order that matches ORDER BY, but isn't changed in any semantic way. Now this is probably too big a change to be prudent to back-patch. Is it worth coming up with a second patch that just tries to get transformDistinctClause to remove duplicates? Since the problem has existed for a very long time (at least back to 7.0 according to my testing) with no prior reports, it doesn't seem very important to fix. I'm a bit worried about putting a patch into only the back branches --- it would go out with little testing and so the odds of introducing a fresh problem seem uncomfortably high compared to the benefit. Comments? regards, tom lane
В списке pgsql-hackers по дате отправления: