Обсуждение: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

Поиск
Список
Период
Сортировка

Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
Since e2debb643, we've had the ability to determine if a column is
NULLable as early as during constant folding.  This seems like a good
time to consider converting COUNT(not_null_col) into COUNT(*), which
is faster and may result in far fewer columns being deformed from the
tuple.

To make this work, I invented "SupportRequestSimplifyAggref", which is
similar to the existing SupportRequestSimplify, which is for
FuncExprs.  Aggregates use Aggrefs, so we need something else.

It's easy to see that count(*) is faster. Here's a quick test in an
unpatched master:

create table t (a int, b int, c int, d int, e int, f int, g int, h int
not null);
insert into t (h) select 1 from generate_Series(1,1000000);
vacuum freeze t;

master:
select count(h) from t;
Time: 16.442 ms
Time: 16.255 ms
Time: 16.322 ms

master:
select count(*) from t;
Time: 12.203 ms
Time: 11.402 ms
Time: 12.054 ms (+37%)

With the patch applied, both queries will perform the same.

It may be possible to apply transformations to other aggregate
functions too, but I don't want to discuss that here. I mostly want to
determine if the infrastructure is ok and do the count(*) one because
it seems like the most likely one to be useful.

One thing I wasn't too sure about was if we should make it possible
for the support function to return something that's not an Aggref.  In
theory, something like COUNT(NULL) could just return '0'::bigint.
While that does seem an optimisation that wouldn't be applied very
often, I have opted to leave it so that such an optimisation *could*
be done by the support function. I also happen to test that that
doesn't entirely break the query, as ordinarily it would if we didn't
have Query.hasAggs (It's not too dissimilar to removing unused columns
from a subquery)

Should we do this?

David

Вложения

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
Corey Huinker
Дата:
It may be possible to apply transformations to other aggregate
functions too, but I don't want to discuss that here. I mostly want to
determine if the infrastructure is ok and do the count(*) one because
it seems like the most likely one to be useful.

One thing I wasn't too sure about was if we should make it possible
for the support function to return something that's not an Aggref.  In
theory, something like COUNT(NULL) could just return '0'::bigint.
While that does seem an optimisation that wouldn't be applied very
often, I have opted to leave it so that such an optimisation *could*
be done by the support function. I also happen to test that that
doesn't entirely break the query, as ordinarily it would if we didn't
have Query.hasAggs (It's not too dissimilar to removing unused columns
from a subquery)

Should we do this?

David

+1

Automatic query improvements are a Good Thing. We're never going to educate everyone that COUNT(1) is an anti-pattern, so it's easier to make it not an anti-pattern.

I'm in favor of the COUNT(NULL) optimization as well, as one of my favorite programming tropes is "There is nothing faster than nothing".

The check seems lightweight enough to me. Applies clean and tests pass. Test coverage seems to cover all the cases.

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Thu, 30 Oct 2025 at 16:40, Corey Huinker <corey.huinker@gmail.com> wrote:
> I'm in favor of the COUNT(NULL) optimization as well, as one of my favorite programming tropes is "There is nothing
fasterthan nothing".
 

I think it would be much more interesting to do that if it could be
detected in cases like:

SELECT COUNT(col) FROM table WHERE col IS NULL;

which might be a more realistic thing if the query without the WHERE
clause was part of a VIEW. However, we don't currently have any
infrastructure to detect when a column *is* NULL. There's only the
opposite with expr_is_nonnullable() or var_is_nonnullable().

This does make me wonder if constant-folding is too early to do this
transformation, as it currently happens before
add_base_clause_to_rel() therefore we can't really transform cases
such as:

SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL;

There might be a better spot in planning to do this at a point after
add_base_clause_to_rel() is called. It just has to happen before the
search for Aggrefs with the same aggtransfn in preprocess_aggref() as
it's too late to swap aggregate functions around when they've already
been grouped together with other aggs that can share the same
transition state. I'm just subtly aware about Tom's complaints with
the restriction_is_always_true() code as he thought it should go into
the constant folding code, where it mostly now is, per Richard's work
to put it there.

> The check seems lightweight enough to me. Applies clean and tests pass. Test coverage seems to cover all the cases.

Thanks for having a look and testing.

I've attached a very slightly revised version of the patch. I became
aware of a function named get_func_support(), which can be used rather
than fetching the pg_proc tuple from SysCache, which I was doing in
v1.  No other changes.

David

Вложения

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
Corey Huinker
Дата:
which might be a more realistic thing if the query without the WHERE
clause was part of a VIEW. However, we don't currently have any
infrastructure to detect when a column *is* NULL. There's only the
opposite with expr_is_nonnullable() or var_is_nonnullable().

But we'd still catch NULL constants, yes?
 
I've attached a very slightly revised version of the patch. I became
aware of a function named get_func_support(), which can be used rather
than fetching the pg_proc tuple from SysCache, which I was doing in
v1.  No other changes.

David

 The change is a big win for clarity. Applies clean. Passes.

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
"Matheus Alcantara"
Дата:
Hi, thanks for working on this!

On Thu Oct 30, 2025 at 1:20 AM -03, David Rowley wrote:
> I've attached a very slightly revised version of the patch. I became
> aware of a function named get_func_support(), which can be used rather
> than fetching the pg_proc tuple from SysCache, which I was doing in
> v1.  No other changes.
>
I looked the code and it seems to be in a good shape, but I tried to
apply the v2 on top of e7ccb247b38 in master to run some tests and a
rebase is necessary.

git am v2-0001-Have-the-planner-replace-COUNT-ANY-with-COUNT-whe.patch
Applying: Have the planner replace COUNT(ANY) with COUNT(*), when possible
error: patch failed: contrib/postgres_fdw/expected/postgres_fdw.out:2975
error: contrib/postgres_fdw/expected/postgres_fdw.out: patch does not apply
error: patch failed: src/test/regress/expected/aggregates.out:1219
error: src/test/regress/expected/aggregates.out: patch does not apply
Patch failed at 0001 Have the planner replace COUNT(ANY) with COUNT(*), when possible

--
Matheus Alcantara




Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Tue, 4 Nov 2025 at 09:38, Matheus Alcantara <matheusssilv97@gmail.com> wrote:
> I looked the code and it seems to be in a good shape, but I tried to
> apply the v2 on top of e7ccb247b38 in master to run some tests and a
> rebase is necessary.

Are you sure you've not got something else in your branch? It applies
ok here, and the CFbot isn't complaining either. CFBot's is based on
cf8be0225, which is 2 commits before the one you're trying, but
src/test/regress/expected/aggregates.out hasn't been changed since
2025-10-07.

David



Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Sat, 1 Nov 2025 at 14:19, Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>> which might be a more realistic thing if the query without the WHERE
>> clause was part of a VIEW. However, we don't currently have any
>> infrastructure to detect when a column *is* NULL. There's only the
>> opposite with expr_is_nonnullable() or var_is_nonnullable().
>
> But we'd still catch NULL constants, yes?

Yes. It could. I've left that part of the patch #ifdef'd out. I wasn't
planning on using it. I just left it there as an example for if
someone wanted to test it.

>  The change is a big win for clarity. Applies clean. Passes.

Thanks for checking.

David



Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
"Matheus Alcantara"
Дата:
On Mon Nov 3, 2025 at 7:47 PM -03, David Rowley wrote:
> On Tue, 4 Nov 2025 at 09:38, Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>> I looked the code and it seems to be in a good shape, but I tried to
>> apply the v2 on top of e7ccb247b38 in master to run some tests and a
>> rebase is necessary.
>
> Are you sure you've not got something else in your branch? It applies
> ok here, and the CFbot isn't complaining either. CFBot's is based on
> cf8be0225, which is 2 commits before the one you're trying, but
> src/test/regress/expected/aggregates.out hasn't been changed since
> 2025-10-07.
>
Yes, my branch is clean, I even tried to apply on a cleaned git clone
but it is still failling to apply, very strange. I've added the cfbot
remote and cherry picked your commit and this works. I'll investigate
later why I'm not able to apply your patch directly.

I've tested and benchmarked the patch using count(1) and
count(not_null_col) and I've got similar results, ~30% of improvements
compared with master.

The code seems good to me, I don't have too many comments, I'm just not
sure if we should keep the #ifdef NOT_USED block but I'm not totally
against it. I'm +1 for the idea.

Thanks

--
Matheus Alcantara
EDB: http://www.enterprisedb.com



Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Wed, 5 Nov 2025 at 08:51, Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>
> On Mon Nov 3, 2025 at 7:47 PM -03, David Rowley wrote:
> > Are you sure you've not got something else in your branch? It applies
> > ok here, and the CFbot isn't complaining either. CFBot's is based on
> > cf8be0225, which is 2 commits before the one you're trying, but
> > src/test/regress/expected/aggregates.out hasn't been changed since
> > 2025-10-07.
> >
> Yes, my branch is clean, I even tried to apply on a cleaned git clone
> but it is still failling to apply, very strange. I've added the cfbot
> remote and cherry picked your commit and this works. I'll investigate
> later why I'm not able to apply your patch directly.

Did you look at: git diff origin/master..master ?
I've certainly accidentally periodically committed to my local master
which I ended up doing: git reset --hard origin/master to fix

> The code seems good to me, I don't have too many comments, I'm just not
> sure if we should keep the #ifdef NOT_USED block but I'm not totally
> against it. I'm +1 for the idea.

Thanks for the review.  I might not have been clear that I had only
intended the NOT_USED part as an example for during the review period.
I'd never intended it going any further.

I've attached a version with the NOT_USED part removed (and a bunch of
#includes I forgot to remove). The only other change was a minor
revision to some comments.

The primary concern I have now is when in planning that we do this
Aggref simplification. Maybe I shouldn't be too concerned about that
as there doesn't seem to be a current reason not to put it where it
is. If someone comes up with a reason to do it later in planning at
some point in the future, we can consider moving it then. That sort of
excludes extensions with aggregates that want to have a
SupportRequestSimplifyAggref support function that might need the
processing done later in planning, but that just feels like a
situation that's unlikely to arise.

David

Вложения

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
"Matheus Alcantara"
Дата:
On Tue Nov 4, 2025 at 9:16 PM -03, David Rowley wrote:
> On Wed, 5 Nov 2025 at 08:51, Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>>
>> On Mon Nov 3, 2025 at 7:47 PM -03, David Rowley wrote:
>> > Are you sure you've not got something else in your branch? It applies
>> > ok here, and the CFbot isn't complaining either. CFBot's is based on
>> > cf8be0225, which is 2 commits before the one you're trying, but
>> > src/test/regress/expected/aggregates.out hasn't been changed since
>> > 2025-10-07.
>> >
>> Yes, my branch is clean, I even tried to apply on a cleaned git clone
>> but it is still failling to apply, very strange. I've added the cfbot
>> remote and cherry picked your commit and this works. I'll investigate
>> later why I'm not able to apply your patch directly.
>
> Did you look at: git diff origin/master..master ?
> I've certainly accidentally periodically committed to my local master
> which I ended up doing: git reset --hard origin/master to fix
>
Yes, I ran git reset before trying to apply the v2 and it still had
conflicts, very strange. Anyway the v3 applied clean on my environment
now.

>> The code seems good to me, I don't have too many comments, I'm just not
>> sure if we should keep the #ifdef NOT_USED block but I'm not totally
>> against it. I'm +1 for the idea.
>
> Thanks for the review.  I might not have been clear that I had only
> intended the NOT_USED part as an example for during the review period.
> I'd never intended it going any further.
>
Ok, it make sense now, thanks for making it clear.

> I've attached a version with the NOT_USED part removed (and a bunch of
> #includes I forgot to remove). The only other change was a minor
> revision to some comments.
>
Thanks, it looks cleaner.

> The primary concern I have now is when in planning that we do this
> Aggref simplification. Maybe I shouldn't be too concerned about that
> as there doesn't seem to be a current reason not to put it where it
> is. If someone comes up with a reason to do it later in planning at
> some point in the future, we can consider moving it then. That sort of
> excludes extensions with aggregates that want to have a
> SupportRequestSimplifyAggref support function that might need the
> processing done later in planning, but that just feels like a
> situation that's unlikely to arise.
>
I think it's ok to leave where it is implemented now and it make sense
to me. The SupportRequestSimplifyAggref is similar with
SupportRequestSimplify which is used by simplify_function() that is
called at eval_const_expressions_mutator(), the simplify_aggref() is
also called at the same function, so it seems to be consistent.

--
Matheus Alcantara
EDB: http://www.enterprisedb.com




Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Wed, 5 Nov 2025 at 13:16, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a version with the NOT_USED part removed (and a bunch of
> #includes I forgot to remove). The only other change was a minor
> revision to some comments.

This patch needed to be rebased due to the changes made in b140c8d7a.
I also adjusted a few comments and adjusted some code in
simplify_aggref() which mistakenly assumed the support function would
always return an Aggref. That conflicted with what I'd written in the
header comment for the SupportRequestSimplifyAggref struct; "(probably
another Aggref)". Which is leaving the door open for more aggressive
optimisations that someone might want to do, e.g. the mentioned
COUNT(NULL) replaced with '0'::bigint.

I'm not seeing any reason now not to go ahead with this now. Does
anyone else want to take a look at it before I start wielding the
sword of commitment on it?

David

Вложения

Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)

От
David Rowley
Дата:
On Wed, 26 Nov 2025 at 12:37, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm not seeing any reason now not to go ahead with this now. Does
> anyone else want to take a look at it before I start wielding the
> sword of commitment on it?

And pushed.  Many thanks to Corey and Matheus for having a look at this.

David