Rethinking representation of sort/hash semantics in queries and plans
От | Tom Lane |
---|---|
Тема | Rethinking representation of sort/hash semantics in queries and plans |
Дата | |
Msg-id | 8075.1290913353@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Rethinking representation of sort/hash semantics in
queries and plans
(Martijn van Oosterhout <kleptog@svana.org>)
Re: Rethinking representation of sort/hash semantics in queries and plans (Dimitri Fontaine <dimitri@2ndQuadrant.fr>) Re: Rethinking representation of sort/hash semantics in queries and plans (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
In recent discussions of the plan-tree representation for KNNGIST index scans, there seemed to be agreement that it'd be a good idea to explicitly represent the expected sort ordering of the output. While thinking about how best to do that, it struck me that there's some pretty horrendous impedance mismatches in the way we do things now. Different parts of the system use two different representations of sort ordering: * A sort operator (which can have < or > semantics) plus nulls-first flag * A btree opfamily plus direction and nulls-first flags Getting from one of these representations to the other is not exactly cheap, as it requires one or more syscache lookups. But consider what happens when we process a simple SELECT ... ORDER BY query to produce a Sort plan: 1. The parser generates a SortGroupClause, which contains the sort-operator representation. This involves looking up the default btree opclass for the ORDER BY column's datatype, then finding the < or > member of the opclass. (These lookups are buffered by the typcache, but we'll still have to do them at least once per session.) If you use ORDER BY USING then you might think it's cheaper ... but we still do a lookup to verify that the operator is in some btree family. 2. The planner generates a PathKey, which is based on the opfamily representation, so we have to do get_ordering_op_properties to go back the other way. 3. If a sort plan is chosen, createplan.c uses get_opfamily_member to go from the PathKey representation back to sort-operator representation, because the Sort plan node type stores sort operators. 4. At runtime, tuplesort_begin_heap needs the comparison function for the sort operator, so it calls get_ordering_op_properties *again* to re-derive the opfamily, from which it can extract the right support function using get_opfamily_proc. Things are only marginally less comical if an indexscan plan is chosen, and that's only because the IndexScan plan node doesn't contain any explicit representation of the output sort order. If we were to solve the KNNGIST issue by installing sort operator lists in IndexScan, it'd be about as ugly as this. (For some extra amusement, trace through where build_index_pathkeys' data comes from...) I have not traced through the behavior for hash-based plans as carefully, but I believe that there are a similar number of conversions between operator OID and hash opfamily OID representations. We got into this mess by revising the planner to use opfamily OIDs to define sort/hash semantics without changing the structures that are its input and output APIs. I think it's probably time to fix this. I don't have any evidence at the moment about what fraction of SearchSysCache load is coming from these repeated conversions, but it can't be trivial. And quite aside from any performance benefits, it would be conceptually cleaner to have only one representation of sort semantics not two. If you look closely at what we're doing with sort operators (get_ordering_op_properties pretty much encapsulates this), it turns out that a sort operator is shorthand for three pieces of information: 1. btree opfamily OID 2. specific input datatype for the opfamily 3. ascending or descending direction So to fix these problems we'd need to replace sort operator OIDs in SortGroupClause and plan nodes with those three items. Obviously, this would be slightly bulkier, but the extra cost added to copying parse and plan trees should be tiny compared to the avoided syscache lookups. A possible compromise is to avoid storing the specific input datatype. In most cases it's the same as the column datatype or expression datatype that we're feeding to the sort operation, which I believe we can always get from somewhere else in the plan. However, it can be different in binary-compatibility cases (such as feeding a specific array type to anyarray_ops, or varchar to text_ops). If we don't store it then we'd need to add extra complexity in get_opfamily_proc and similar places to search for a binary-compatible member function or operator. This extra cost would be paid only when actually using binary-compatible cases, but it still seems like it'd be better to pay a little extra storage to avoid it. In a similar fashion, I think that the eqop member of SortGroupClause should be replaced by a hash opfamily OID and specific input datatype (no direction needed here, of course), and likewise for hash operator OIDs in hash-based plan node types. A possible objection to this idea is that instead of having stored rules and views depending on specific operators, they'd now be shown as depending on specific opfamilies. If you were to drop the relevant operators then you'd get failures at runtime because no suitable member operator or function could be found. However, with the current scheme it's possible to drop the opfamily that makes use of a particular operator valid in a given query context; so you can still get a failure, though it would happen a bit upstream in the planner when it fails to locate the opfamily. On balance I don't think this is any better or worse, just different. We don't support dynamic modification of opclasses terribly well anyhow, given all the caching that is done for them. Thoughts, objections? regards, tom lane
В списке pgsql-hackers по дате отправления: