Обсуждение: [MASSMAIL]SupportRequestRows support function for generate_series_timestamptz

Поиск
Список
Период
Сортировка

[MASSMAIL]SupportRequestRows support function for generate_series_timestamptz

От
David Rowley
Дата:
When looking at [1], I noticed that we don't have a prosupport
function for the timestamp version of generate_series.

We have this for the integer versions of generate_series(), per:

postgres=# explain analyze select * from generate_series(1, 256, 2);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..1.28 rows=128 width=4)
(actual time=0.142..0.183 rows=128 loops=1)

The timestamp version just gives the default 1000 row estimate:

postgres=# explain analyze select * from generate_series('2024-01-01',
'2025-01-01', interval '1 day');
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..10.00 rows=1000
width=8) (actual time=0.604..0.718 rows=367 loops=1)

I had some spare time today, so wrote a patch, which gives you:

postgres=# explain analyze select * from generate_series('2024-01-01',
'2025-01-01', interval '1 day');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..3.67 rows=367 width=8)
(actual time=0.258..0.291 rows=367 loops=1)

This required a bit of hackery to not have timestamp_mi() error out in
the planner when the timestamp difference calculation overflows.  I
considered adding ereturn support to fix that, but that felt like
opening Pandora's box.  Instead, I added some pre-checks similar to
what's in timestamp_mi() to have the support function fall back on the
1000 row estimate when there will be an overflow.

Also, there's no interval_div, so the patch has a macro that converts
interval to microseconds and does floating point division. I think
that's good enough for row estimations.

I'll park this here until July CF.

(I understand this doesn't help the case in [1] as the generate_series
inputs are not const there)

David

[1] https://www.postgresql.org/message-id/CAMPYKo0FouB-HZ1k-_Ur2v%2BkK71q0T5icQGrp%2BSPbQJGq0H2Rw%40mail.gmail.com

Вложения

Re: SupportRequestRows support function for generate_series_timestamptz

От
David Rowley
Дата:
On Sun, 14 Apr 2024 at 15:14, David Rowley <dgrowleyml@gmail.com> wrote:
> I had some spare time today, so wrote a patch, which gives you:
>
> postgres=# explain analyze select * from generate_series('2024-01-01',
> '2025-01-01', interval '1 day');
>                                                     QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------
>  Function Scan on generate_series  (cost=0.00..3.67 rows=367 width=8)
> (actual time=0.258..0.291 rows=367 loops=1)

Here's v2 of the patch with some added regression tests.

I did this by writing a plpgsql function named explain_mask_costs()
which has various boolean parameters to mask out the various portions
of the costs. I wondered if this function should live somewhere else
as it seems applicable to more than just misc_functions.sql.  Maybe
test_setup.sql. I'll leave where it is for now unless anyone thinks
differently.

This is a fairly simple and seemingly non-controversial patch. I plan
to push it in the next few days unless there's some feedback before
then.

David

Вложения
looks good to me.

some minor questions:
/*
* Protect against overflows in timestamp_mi.  XXX convert to
* ereturn one day?
*/
if (!TIMESTAMP_NOT_FINITE(start) && !TIMESTAMP_NOT_FINITE(finish) &&
!pg_sub_s64_overflow(finish, start, &dummy))

i don't understand the comment "XXX convert to ereturn one day?".

do we need to add unlikely for "pg_sub_s64_overflow", i saw most of
pg_sub_s64_overflow have unlikely.



Re: SupportRequestRows support function for generate_series_timestamptz

От
David Rowley
Дата:
On Mon, 8 Jul 2024 at 14:50, jian he <jian.universality@gmail.com> wrote:
>
> looks good to me.

Thanks for looking.

> /*
> * Protect against overflows in timestamp_mi.  XXX convert to
> * ereturn one day?
> */
> if (!TIMESTAMP_NOT_FINITE(start) && !TIMESTAMP_NOT_FINITE(finish) &&
> !pg_sub_s64_overflow(finish, start, &dummy))
>
> i don't understand the comment "XXX convert to ereturn one day?".

The problem I'm trying to work around there is that timestamp_mi
raises an ERROR if there's an overflow.  I don't want the support
function to cause an ERROR so I'm trying to only call timestamp_mi in
cases where it won't error. The ereturn mention is a reference to
ERROR raising infrastructure added by d9f7f5d32 and so far only used
by input functions. It would be possible to use that to save from
having to do the pg_sub_s64_overflow().  Instead, we could check if
any errors were found and only proceed with the remaining part of the
calculation if none were found.

I've tried to improve the comment in the attached version. I removed
the reference to ereturn.

> do we need to add unlikely for "pg_sub_s64_overflow", i saw most of
> pg_sub_s64_overflow have unlikely.

I was hoping the condition would be likely() rather than unlikely().
However, I didn't consider that the code path was hot enough for it to
matter. It's just a function we call once during planning if we find a
call to generate_series_timestamp(). It's not like it's called a
million or a billion times during execution like a function such as
int4pl() could be.

David

Вложения
On Mon, Jul 8, 2024 at 12:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> > /*
> > * Protect against overflows in timestamp_mi.  XXX convert to
> > * ereturn one day?
> > */
> > if (!TIMESTAMP_NOT_FINITE(start) && !TIMESTAMP_NOT_FINITE(finish) &&
> > !pg_sub_s64_overflow(finish, start, &dummy))
> >
> > i don't understand the comment "XXX convert to ereturn one day?".
>
> The problem I'm trying to work around there is that timestamp_mi
> raises an ERROR if there's an overflow.  I don't want the support
> function to cause an ERROR so I'm trying to only call timestamp_mi in
> cases where it won't error. The ereturn mention is a reference to
> ERROR raising infrastructure added by d9f7f5d32 and so far only used
> by input functions. It would be possible to use that to save from
> having to do the pg_sub_s64_overflow().  Instead, we could check if
> any errors were found and only proceed with the remaining part of the
> calculation if none were found.
>
> I've tried to improve the comment in the attached version. I removed
> the reference to ereturn.
>
got it.

{ oid => '2031',
  proname => 'timestamp_mi', prorettype => 'interval',
  proargtypes => 'timestamp timestamp', prosrc => 'timestamp_mi' },
{ oid => '1188',
  proname => 'timestamptz_mi', prorettype => 'interval',
  proargtypes => 'timestamptz timestamptz', prosrc => 'timestamp_mi' },

so this also apply to

{ oid => '938', descr => 'non-persistent series generator',
  proname => 'generate_series', prorows => '1000', proretset => 't',
  prorettype => 'timestamp', proargtypes => 'timestamp timestamp interval',
  prosrc => 'generate_series_timestamp' },

If so, then we need to update src/include/catalog/pg_proc.dat also?



Re: SupportRequestRows support function for generate_series_timestamptz

От
David Rowley
Дата:
On Mon, 8 Jul 2024 at 16:43, jian he <jian.universality@gmail.com> wrote:
> { oid => '2031',
>   proname => 'timestamp_mi', prorettype => 'interval',
>   proargtypes => 'timestamp timestamp', prosrc => 'timestamp_mi' },
> { oid => '1188',
>   proname => 'timestamptz_mi', prorettype => 'interval',
>   proargtypes => 'timestamptz timestamptz', prosrc => 'timestamp_mi' },

I'm not quite sure what you mean that needs to be adjusted with this.

> so this also apply to
>
> { oid => '938', descr => 'non-persistent series generator',
>   proname => 'generate_series', prorows => '1000', proretset => 't',
>   prorettype => 'timestamp', proargtypes => 'timestamp timestamp interval',
>   prosrc => 'generate_series_timestamp' },
>
> If so, then we need to update src/include/catalog/pg_proc.dat also?

Oh, yeah. I missed setting the prosupport function for that one. Thanks.

I'm not sure when I realised there were 3 of these functions, but it
seems I didn't when I adjusted pg_proc.dat.

Updated patch attached.

David

Вложения
On Mon, Jul 8, 2024 at 1:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
>
> Updated patch attached.
>

looking good to me.



Re: SupportRequestRows support function for generate_series_timestamptz

От
David Rowley
Дата:
On Mon, 8 Jul 2024 at 17:52, jian he <jian.universality@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 1:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > Updated patch attached.
>
> looking good to me.

Thanks for reviewing. I've pushed the patch now.

David