Обсуждение: Optimize SELECT * in EXISTS
I read in the SQL standard that SELECT * in EXISTS is not supposed to be expanded to all columns, but only to an arbitrary literal. This corresponds to the recommendation in the PostgreSQL documentation to write EXISTS (SELECT 1 ...) instead. But not even our own tests and example code use that latter convention consistently, so I think many users don't know it or observe it either. So implementing that little optimization for SELECT * seems reasonable. The attached patch implements the transformation, meaning in EXISTS (SELECT * FROM ...), the star is replaced by an empty select list (taking advantage of the support for zero-column tables in PostgreSQL). There are plenty of tests involving this construct, so I didn't add any more explicit tests. (But it might be worth adding a test involving column privileges.) Thoughts?
Вложения
On Mon, 23 Feb 2026 at 21:21, Peter Eisentraut <peter@eisentraut.org> wrote: > I read in the SQL standard that SELECT * in EXISTS is not supposed to be > expanded to all columns, but only to an arbitrary literal. This > corresponds to the recommendation in the PostgreSQL documentation to > write EXISTS (SELECT 1 ...) instead. But not even our own tests and > example code use that latter convention consistently, so I think many > users don't know it or observe it either. So implementing that little > optimization for SELECT * seems reasonable. > Thoughts? Don't we already do this in simplify_EXISTS_query()? David
On 23.02.26 09:41, David Rowley wrote: > On Mon, 23 Feb 2026 at 21:21, Peter Eisentraut <peter@eisentraut.org> wrote: >> I read in the SQL standard that SELECT * in EXISTS is not supposed to be >> expanded to all columns, but only to an arbitrary literal. This >> corresponds to the recommendation in the PostgreSQL documentation to >> write EXISTS (SELECT 1 ...) instead. But not even our own tests and >> example code use that latter convention consistently, so I think many >> users don't know it or observe it either. So implementing that little >> optimization for SELECT * seems reasonable. > >> Thoughts? > > Don't we already do this in simplify_EXISTS_query()? It appears so, but then I think we should update the documentation along the lines I showed, because I found it misleading about this. New docs-only patch attached.
Вложения
Peter Eisentraut <peter@eisentraut.org> writes:
> It appears so, but then I think we should update the documentation along
> the lines I showed, because I found it misleading about this. New
> docs-only patch attached.
+1 for being more explicit here, but that specific wording reads a
little awkwardly to me. What do you think of the attached?
(The new code comments are fine, I didn't touch them.)
regards, tom lane
diff --git a/doc/src/sgml/func/func-subquery.sgml b/doc/src/sgml/func/func-subquery.sgml
index a9f2b12e48c..4453ab3a6ab 100644
--- a/doc/src/sgml/func/func-subquery.sgml
+++ b/doc/src/sgml/func/func-subquery.sgml
@@ -70,8 +70,14 @@ EXISTS (<replaceable>subquery</replaceable>)
and not on the contents of those rows, the output list of the
subquery is normally unimportant. A common coding convention is
to write all <literal>EXISTS</literal> tests in the form
- <literal>EXISTS(SELECT 1 WHERE ...)</literal>. There are exceptions to
- this rule however, such as subqueries that use <token>INTERSECT</token>.
+ <literal>EXISTS(SELECT * FROM ... WHERE ...)</literal>, another common
+ convention is to write <literal>EXISTS(SELECT 1 FROM ... WHERE
+ ...)</literal> or some other dummy constant. These conventions are
+ actually equivalent in <productname>PostgreSQL</productname>, which
+ will optimize away evaluation of the subquery's output list altogether
+ when it cannot affect the number of rows returned. (A counterexample
+ that cannot be optimized away is an output list containing a
+ set-returning function, since the function might return zero rows.)
</para>
<para>
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index e9dc9d31f05..1f18934b2a4 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1643,7 +1643,13 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
* Note: by suppressing the targetlist we could cause an observable behavioral
* change, namely that any errors that might occur in evaluating the tlist
* won't occur, nor will other side-effects of volatile functions. This seems
- * unlikely to bother anyone in practice.
+ * unlikely to bother anyone in practice. Note that any column privileges are
+ * still checked even if the reference is removed here.
+ *
+ * The SQL standard specifies that a SELECT * immediately inside EXISTS
+ * expands to not all columns but an arbitrary literal. That is kind of the
+ * same idea, but our optimization goes further in that it throws away the
+ * entire targetlist, and not only if it was written as *.
*
* Returns true if was able to discard the targetlist, else false.
*/
On 24.02.26 22:07, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> It appears so, but then I think we should update the documentation along >> the lines I showed, because I found it misleading about this. New >> docs-only patch attached. > > +1 for being more explicit here, but that specific wording reads a > little awkwardly to me. What do you think of the attached? Sounds good, thanks.
Peter Eisentraut <peter@eisentraut.org> writes:
> On 24.02.26 22:07, Tom Lane wrote:
>> +1 for being more explicit here, but that specific wording reads a
>> little awkwardly to me. What do you think of the attached?
> Sounds good, thanks.
Pushed.
regards, tom lane