Обсуждение: Declaring a strict function returns not null / eval speed

Поиск
Список
Период
Сортировка

Declaring a strict function returns not null / eval speed

От
Andres Freund
Дата:
Hi,


We spend a surprising amount of time during expression evaluation to reevaluate whether input to a strict function (or similar) is not null, even though the value either comes from a strict function, or a column declared not null.

Now you can rightfully say that a strict function still can return NULL, even when called with non-NULL input. But practically that's quite rare. Most of the common byvalue type operators are strict, and approximately none of those return NULL when actually called.

That makes me wonder if it's worthwhile to invent a function property declaring strict strictness or such. It'd allow for some quite noticable improvements for e.g. queries aggregating a lot of rows, we spend a fair time checking whether the transition value has "turned" not null. I'm about to submit a patch making that less expensive, but it's still expensive.

I can also imagine that being able to propagate NOT NULL further up the parse-analysis tree could be beneficial for planning, but I've not looked at it in any detail.


A related issue is that we, during executor initialization, currently "loose" information about a column's NOT NULLness just above the lower scan nodes. Efficiency wise that's a substantial loss for many realistic queries: For JITed deforming that basically turns a bunch of mov instructions with constant offsets into much slower attribute by attribute trawling through the tuple. The latter can approximately not take advantage of the superscalar nature of just about any relevant processor. And for non JITed execution an expression step that used a cheaper deforming routine for the cases where only leading not null columns are accessed would also yield significant speedups. This is made worse by the fact that we often not actually deform at the scan nodes, due to the physical tlist optimization. This is especially bad for nodes storing tuples as minimal tuples (e.g. hashjoin, hashagg), where often a very significant fraction of time of spent re-deforming columns that already were deformed earlier.

It doesn't seem very hard to propagate attnotnull upwards in a good number of the cases. We don't need to do so everywhere for it to be beneficial.

Comments?

Andres



Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Declaring a strict function returns not null / eval speed

От
Andreas Karlsson
Дата:
On 10/1/19 9:38 AM, Andres Freund wrote:
> We spend a surprising amount of time during expression evaluation to 
> reevaluate whether input to a strict function (or similar) is not null, 
> even though the value either comes from a strict function, or a column 
> declared not null.
> 
> Now you can rightfully say that a strict function still can return NULL, 
> even when called with non-NULL input. But practically that's quite rare. 
> Most of the common byvalue type operators are strict, and approximately 
> none of those return NULL when actually called.
> 
> That makes me wonder if it's worthwhile to invent a function property 
> declaring strict strictness or such. It'd allow for some quite noticable 
> improvements for e.g. queries aggregating a lot of rows, we spend a fair 
> time checking whether the transition value has "turned" not null. I'm 
> about to submit a patch making that less expensive, but it's still 
> expensive.
> 
> I can also imagine that being able to propagate NOT NULL further up the 
> parse-analysis tree could be beneficial for planning, but I've not 
> looked at it in any detail.

Agreed, this sounds like something useful to do since virtually all 
strict functions cannot return NULL, especially the ones which are used 
in tight loops. The main design issue seems to be to think up a name for 
this new level of strictness which is not too confusing for end users.

We also have a handful of non-strict functions (e.g. concat() and range 
constructors like tstzrange()) which are guaranteed to never return 
NULL, but I do not think they are many enough or performance critical 
enough to be worth adding this optimization to.

Andreas



Re: Declaring a strict function returns not null / eval speed

От
Tels
Дата:
Moin,

On 2019-10-20 13:30, Andreas Karlsson wrote:
> On 10/1/19 9:38 AM, Andres Freund wrote:
>> We spend a surprising amount of time during expression evaluation to 
>> reevaluate whether input to a strict function (or similar) is not 
>> null, even though the value either comes from a strict function, or a 
>> column declared not null.
>> 
>> Now you can rightfully say that a strict function still can return 
>> NULL, even when called with non-NULL input. But practically that's 
>> quite rare. Most of the common byvalue type operators are strict, and 
>> approximately none of those return NULL when actually called.
>> 
>> That makes me wonder if it's worthwhile to invent a function property 
>> declaring strict strictness or such. It'd allow for some quite 
>> noticable improvements for e.g. queries aggregating a lot of rows, we 
>> spend a fair time checking whether the transition value has "turned" 
>> not null. I'm about to submit a patch making that less expensive, but 
>> it's still expensive.
>> 
>> I can also imagine that being able to propagate NOT NULL further up 
>> the parse-analysis tree could be beneficial for planning, but I've not 
>> looked at it in any detail.
> 
> Agreed, this sounds like something useful to do since virtually all
> strict functions cannot return NULL, especially the ones which are
> used in tight loops. The main design issue seems to be to think up a
> name for this new level of strictness which is not too confusing for
> end users.

STRICT NONULL? That way you could do

   CREATE FUNCTION f1 ... STRICT;
   CREATE FUNCTION f2 ... STRICT NONULL;
   CREATE FUNCTION f3 ... NONULL;

and the last wold throw "not implementet yet"? "NEVER RETURNS NULL" 
would also ryme with the existing "RETURNS NULL ON NULL INPUT", but I 
find the verbosity too high.

Best regards,

Tels

-- 
Best regards,

Tels



Re: Declaring a strict function returns not null / eval speed

От
Tom Lane
Дата:
Tels <nospam-pg-abuse@bloodgate.com> writes:
> On 2019-10-20 13:30, Andreas Karlsson wrote:
>> Agreed, this sounds like something useful to do since virtually all
>> strict functions cannot return NULL, especially the ones which are
>> used in tight loops. The main design issue seems to be to think up a
>> name for this new level of strictness which is not too confusing for
>> end users.

> STRICT NONULL? That way you could do

>    CREATE FUNCTION f1 ... STRICT;
>    CREATE FUNCTION f2 ... STRICT NONULL;
>    CREATE FUNCTION f3 ... NONULL;

> and the last wold throw "not implementet yet"? "NEVER RETURNS NULL" 
> would also ryme with the existing "RETURNS NULL ON NULL INPUT", but I 
> find the verbosity too high.

"RETURNS NOT NULL", perhaps?  That'd have the advantage of not requiring
any new keyword.

I'm a little bit skeptical of the actual value of adding this additional
level of complexity, but I suppose we can't determine that reliably
without doing most of the work :-(

            regards, tom lane



Re: Declaring a strict function returns not null / eval speed

От
Tels
Дата:
Moin,

On 2019-10-20 16:27, Tom Lane wrote:
> Tels <nospam-pg-abuse@bloodgate.com> writes:
>> On 2019-10-20 13:30, Andreas Karlsson wrote:
>>> Agreed, this sounds like something useful to do since virtually all
>>> strict functions cannot return NULL, especially the ones which are
>>> used in tight loops. The main design issue seems to be to think up a
>>> name for this new level of strictness which is not too confusing for
>>> end users.
> 
>> STRICT NONULL? That way you could do
> 
>>    CREATE FUNCTION f1 ... STRICT;
>>    CREATE FUNCTION f2 ... STRICT NONULL;
>>    CREATE FUNCTION f3 ... NONULL;
> 
>> and the last wold throw "not implementet yet"? "NEVER RETURNS NULL"
>> would also ryme with the existing "RETURNS NULL ON NULL INPUT", but I
>> find the verbosity too high.
> 
> "RETURNS NOT NULL", perhaps?  That'd have the advantage of not 
> requiring
> any new keyword.

Hm, yes, that would be a good compromise on verbosity and align even 
better the other "RETURNS ..." variants.

> I'm a little bit skeptical of the actual value of adding this 
> additional
> level of complexity, but I suppose we can't determine that reliably
> without doing most of the work :-(

Maybe it would be possible to simulate the effect somehow? Or at least 
we could try to find practical queries where the additional information 
results in a much better plan if RETRUNS NOT NULL was set.

Best regards,

Tels



Re: Declaring a strict function returns not null / eval speed

От
Andres Freund
Дата:
Hi,

On 2019-10-20 10:27:19 -0400, Tom Lane wrote:
> "RETURNS NOT NULL", perhaps?  That'd have the advantage of not requiring
> any new keyword.

That could work.


> I'm a little bit skeptical of the actual value of adding this additional
> level of complexity, but I suppose we can't determine that reliably
> without doing most of the work :-(

Depends a bit on what case we're concerned about improving. What brought
me onto this was the concern that actually a good bit of the overhead of
computing aggregate transition functions is often the checks whether the
transition value has become NULL. And that for a lot of the more common
aggregates that's unnecessary, as they'll never do so.  That case is
pretty easy to test, we can just stop generating the relevant expression
step and do a few micro benchmarks.

Obviously for the planner taking advantage of that fact, it's more work...

Greetings,

Andres Freund



Re: Declaring a strict function returns not null / eval speed

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-10-20 10:27:19 -0400, Tom Lane wrote:
>> "RETURNS NOT NULL", perhaps?  That'd have the advantage of not requiring
>> any new keyword.

> That could work.

Actually, I think we probably don't need any SQL representation of this
at all, because if what you're going to do with it is omit logically
necessary null-value checks, then a wrong setting would trivially crash
the server.  Therefore, we can never give the ability to set this flag
to users; we could only set it on built-in functions.

(But that saves a lot of work, eg dump/restore support isn't needed
either.)

This doesn't seem too awful to me, because non-builtin functions are
most likely slow enough that it doesn't matter.

            regards, tom lane



Re: Declaring a strict function returns not null / eval speed

От
Christoph Berg
Дата:
Re: Tom Lane 2019-10-22 <821.1571771210@sss.pgh.pa.us>
> Actually, I think we probably don't need any SQL representation of this
> at all, because if what you're going to do with it is omit logically
> necessary null-value checks, then a wrong setting would trivially crash
> the server.  Therefore, we can never give the ability to set this flag
> to users; we could only set it on built-in functions.

Or require superuser.

> This doesn't seem too awful to me, because non-builtin functions are
> most likely slow enough that it doesn't matter.

Some years ago, Kohsuke Kawaguchi, the Jenkins author, was giving a
keynote at FOSDEM about extensibility of software. The gist I took
away from it was the tagline "if core can do something that extensions
can't, that's a bug". I think that's something that PostgreSQL should
try to live up to as well.

Christoph



Re: Declaring a strict function returns not null / eval speed

От
Andres Freund
Дата:
Hi,

On 2019-10-22 15:06:50 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-10-20 10:27:19 -0400, Tom Lane wrote:
> >> "RETURNS NOT NULL", perhaps?  That'd have the advantage of not requiring
> >> any new keyword.
> 
> > That could work.
> 
> Actually, I think we probably don't need any SQL representation of this
> at all, because if what you're going to do with it is omit logically
> necessary null-value checks, then a wrong setting would trivially crash
> the server.  Therefore, we can never give the ability to set this flag
> to users; we could only set it on built-in functions.

I assumed we'd allow it plainly for C functions, as there's already
myriad ways to break the server. And for anything but C, we probably
should check it in the language handler (or some generic code invoking
that).

I think it's interesting to have this function property not just for
performance, but also semantic reasons. But it's fine to include the
check in the function handler (or some wrapper around those, if we think
that's worthwhile), rather than relying on the function to get this
right.


> This doesn't seem too awful to me, because non-builtin functions are
> most likely slow enough that it doesn't matter.

With builtin, do you mean just internal functions, or also "C"? I think
it's worthwhile to allow "C" directly if benchmarks proves this is
worthwhile.

Greetings,

Andres Freund