Обсуждение: Maintain the pathkesy for subquery from outer side information
When I am working on the UnqiueKey stuff, I find the following cases. SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a; // root->query_keys = A. root->order_pathkeys = A // Current: subroot->query_pathkeys = NIL. // Expected: subroot->xxxx_pathkeys = [A]. SELECT * FROM (SELECT * FROM t offset 0) v, t2 WHERE t2.a = t.a; // root->query_keys = NIL // Current: subroot->query_keys = NIL // Expected: subroot->xxx_pathkeys = A To resolve this issue, I want to add a root->outer_pathkeys which means the interesting order from the outer side for a subquery. To cover the cases like below // root->query_keys = root->order_keys = b. // Expected: subroot->xxx_pathkeys = (a)? (b)? SELECT * FROM (SELECT * FROM t offset 0) v, t2 WHERE t2.a = t.a order by v.b; the root->outer_pathkeys should be a list of lists. in above case subroot->outer_pathkeys should be [ [a], [b] ], this value may be checked at many places, like pathkeys_useful_for_ordering, get_useful_pathkeys_for_relation, build_index_paths and more. My list might be incomplete, but once we have a new place to check and the data is maintained already, it would be easy to improve. My thinking is we maintain the root->outer_pathkeys first, and then improve the well known function as the first step. What do you think? -- Best Regards Andy Fan (https://www.aliyun.com/)
Andy Fan <zhihui.fan1213@gmail.com> writes: > When I am working on the UnqiueKey stuff, I find the following cases. > SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a; > // root->query_keys = A. root->order_pathkeys = A > // Current: subroot->query_pathkeys = NIL. > // Expected: subroot->xxxx_pathkeys = [A]. Why do you "expect" that? I think pushing the outer ORDER BY past a LIMIT is an unacceptable semantics change. regards, tom lane
On Sat, Jul 24, 2021 at 10:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andy Fan <zhihui.fan1213@gmail.com> writes: > > When I am working on the UnqiueKey stuff, I find the following cases. > > SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a; > > // root->query_keys = A. root->order_pathkeys = A > > // Current: subroot->query_pathkeys = NIL. > > // Expected: subroot->xxxx_pathkeys = [A]. > > Why do you "expect" that? I think pushing the outer ORDER BY past a > LIMIT is an unacceptable semantics change. > > regards, tom lane I don't mean push down a "ORDER BY" clause to subquery, I mean push down an "interesting order" to subquery. for example we have index t(a); then SELECT * FROM (SELECT a FROM t OFFSET 0) v ORDER BY a; In the current implementation, when we are planning the subuqery, planners think the "pathkey a" is not interesting, but it should be IIUC. -- Best Regards Andy Fan (https://www.aliyun.com/)
Andy Fan <zhihui.fan1213@gmail.com> writes: > On Sat, Jul 24, 2021 at 10:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Why do you "expect" that? I think pushing the outer ORDER BY past a >> LIMIT is an unacceptable semantics change. > I don't mean push down a "ORDER BY" clause to subquery, I mean push > down an "interesting order" to subquery. for example we have index t(a); > then SELECT * FROM (SELECT a FROM t OFFSET 0) v ORDER BY a; > In the current implementation, when we are planning the subuqery, planners > think the "pathkey a" is not interesting, but it should be IIUC. No, it should not be. (1) We have long treated "OFFSET 0" as an optimization fence. That means that the outer query shouldn't affect what plan you get for the subquery. (2) If you ignore point (1), you could argue that choosing a different result order doesn't matter for this subquery. However, it potentially *does* matter for a large fraction of the cases in which we'd not have flattened the subquery into the outer query. In subqueries involving things like volatile functions, aggregates, window functions, etc, encouraging the sub-planner to pick a different result ordering could lead to visibly different output. I think that in cases where there's not a semantic hazard involved, we'd usually have pulled up the subquery so that this is all moot anyway. regards, tom lane
> I think that in cases where there's not a semantic hazard involved, > we'd usually have pulled up the subquery so that this is all moot > anyway. > I get your point with this statement. Things driven by this idea look practical and lucky. But for the UniqueKey stuff, we are not that lucky. SELECT pk FROM t; -- Maintain the UniqueKey would be not necessary. However SELECT DISTINCT pk FROM (SELECT volatile_f(a), pk from t) WHERE ..; Maintaining the UniqueKey in subquery is necessary since it is useful outside. -- Best Regards Andy Fan (https://www.aliyun.com/)