Обсуждение: lag() default value ignored for some window partition depending on table records count?

Поиск
Список
Период
Сортировка

lag() default value ignored for some window partition depending on table records count?

От
Loïc Revest
Дата:
Hello,

Long story short:
- We track history data in a partitioned table which boils down to  "key fields + a JSONB data column", the latter being a list of date/numeric values pair ;
- For every of this date/numeric couple within their "window partitioning", we need to determine the preceding value, thus the use of lag(<value>, 1, 0::numeric), since it's relevant for us here to get the very first record of the partition having "0.00" as its preceding value;
- Except that the "first" row of some partitions gets NULL instead of 0::numeric as the result of lag(<value>, 1, 0::numeric), while other get 0::numeric as expected;
- Things get stranger when the table containing the data get "purged" from every record except those corresponding to the "window partition" whose first record gets NULL as lag() output: now it also gets 0::numeric...

More details:
- Behavior first observed on 12.10 (Debian, through pgdg repo), then reproduced on 14.2 (OpenBSD, through packages);
- "Odd" behavior may be seen for particular window partition when row count ∼ 600k, but not when only relevant rows (58 records) are kept;
- EXPLAIN ANALYZE when getting the "odd" behavior: https://explain.depesz.com/s/4kEt
- Output (first rows) to illustrate (\pset null 'NULL'):
lag_return=> SELECT * FROM points_timeserie_full WHERE peer_id = 10043;
peer_id |             date              | points_count | previous_points_count  
---------+-------------------------------+--------------+-----------------------
  10043 | 2019-11-13 13:59:50+01        |       770.00 |                  NULL
  10043 | 2019-11-14 15:17:15+01        |       480.00 |                770.00
  10043 | 2019-12-17 13:23:20+01        |       770.00 |                480.00

While for another "peer_id":
peer_id |          date                 | points_count | previous_points_count  
---------+-------------------------------+--------------+-----------------------
  10015 | 2019-11-15 12:39:34+01        |       840.00 |                  0.00
  10015 | 2019-11-19 14:12:26+01        |      1165.00 |                840.00
  10015 | 2019-11-21 15:51:52+01        |      1165.00 |               1165.00

- EXPLAIN ANALYZE when keeping relevant records only: https://explain.depesz.com/s/pZKFL
- Output (first rows) to illustrate (\pset null 'NULL'):
lag_return=> SELECT * FROM points_timeserie_10043 WHERE peer_id = 10043;     
peer_id |             date              | points_count | previous_points_count  
---------+-------------------------------+--------------+-----------------------
  10043 | 2019-11-13 13:59:50+01        |       770.00 |                     0
  10043 | 2019-11-14 15:17:15+01        |       480.00 |                770.00
  10043 | 2019-12-17 13:23:20+01        |       770.00 |                480.00

(Note: points_timeserie_full and points_timeserie_10043 are strictly identical views structure-wise, the only difference being in the dataset of their respective tables (themselves identical structure-wise), as explained above).

Attached is the test case database script we've been able to reproduce the "issue" one hundred percent of the time on, but without the data - even after compression this weights ∼8.5Mb. Data that I'd happily share would anyone points me towards the customary way to do it here with such "big" a file.

Regards,
L. Revest
Вложения

Re: lag() default value ignored for some window partition depending on table records count?

От
Tom Lane
Дата:
=?UTF-8?B?TG/Dr2MgUmV2ZXN0?= <l.revest@apc.fr> writes:
> Long story short:
> - We track history data in a partitioned table which boils down to  "key
> fields + a JSONB data column", the latter being a list of date/numeric
> values pair ;
> - For every of this date/numeric couple within their "window partitioning",
> we need to determine the preceding value, thus the use of lag(<value>, 1,
> 0::numeric), since it's relevant for us here to get the very first record
> of the partition having "0.00" as its preceding value;
> - Except that the "first" row of some partitions gets NULL instead of
> 0::numeric as the result of lag(<value>, 1, 0::numeric), while other get
> 0::numeric as expected;

Hm ...

> Attached is the test case database script we've been able to reproduce the
> "issue" one hundred percent of the time on, but without the data - even
> after compression this weights ∼8.5Mb. Data that I'd happily share would
> anyone points me towards the customary way to do it here with such "big" a
> file.

With no data, this is pretty pointless.  Maybe you could make a script
to fill the tables with some dummy data that reproduces the problem?

            regards, tom lane



Re: lag() default value ignored for some window partition depending on table records count?

От
"David G. Johnston"
Дата:
On Tue, Apr 26, 2022 at 3:22 PM Loïc Revest <l.revest@apc.fr> wrote:

- For every of this date/numeric couple within their "window partitioning", we need to determine the preceding value, thus the use of lag(<value>, 1, 0::numeric), since it's relevant for us here to get the very first record of the partition having "0.00" as its preceding value;

I'm a bit out of my league on the promises that window functions give with respect to qual pushdown and the like (and your nested view structure's impact on that): but the fact that your window doesn't do partitioning would seem to be a factor here.  It is perfectly fine for lag to return a null if, in this example, the previous peer_id's LAST graph.date has a null value for graph.agg_points and that is computed before removing all peer_id values except the one the in the query where clause.


- Except that the "first" row of some partitions gets NULL instead of 0::numeric as the result of lag(<value>, 1, 0::numeric), while other get 0::numeric as expected;
- Things get stranger when the table containing the data get "purged" from every record except those corresponding to the "window partition" whose first record gets NULL as lag() output: now it also gets 0::numeric...

- "Odd" behavior may be seen for particular window partition when row count ∼ 600k, but not when only relevant rows (58 records) are kept;

Exactly, your under-specified window clause in the view doesn't match up with the usage.  It seems like operator-error to me.  You want the first record to be relative to the final output, which is per-id, but the window doesn't actually compute that.

David J.

Re: lag() default value ignored for some window partition depending on table records count?

От
David Rowley
Дата:
On Wed, 27 Apr 2022 at 10:48, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 3:22 PM Loïc Revest <l.revest@apc.fr> wrote:
>>
>>
>> - For every of this date/numeric couple within their "window partitioning", we need to determine the preceding
value,thus the use of lag(<value>, 1, 0::numeric), since it's relevant for us here to get the very first record of the
partitionhaving "0.00" as its preceding value; 
>
>
> I'm a bit out of my league on the promises that window functions give with respect to qual pushdown and the like (and
yournested view structure's impact on that): but the fact that your window doesn't do partitioning would seem to be a
factorhere.  It is perfectly fine for lag to return a null if, in this example, the previous peer_id's LAST graph.date
hasa null value for graph.agg_points and that is computed before removing all peer_id values except the one the in the
querywhere clause. 

There can only be qual pushdown when the column in the qual is present
in the PARTITION BY clause. In this case, there's no PARTITION BY
clause, so can't be pushdowns.

And yeah, due to lack of a PARTITION BY, LAG's default value is only
applied to exactly 1 row, the one with the lowest peer_id and
graph.date.  The example one showing the NULL is peer_id 10043, and
one showing 0.0 is peer_id 10015. So 10043 is certainly not the lowest
peer_id in the recordset. That indicates the NULL is not from the LAG
but actually just a NULL returned by jsonb_to_recordset().

Loïc, If the required behaviour is to replace NULL with 0.0, then
COALESCE is what you need.

David



Re: lag() default value ignored for some window partition depending on table records count?

От
"David G. Johnston"
Дата:
On Tue, Apr 26, 2022 at 3:59 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 27 Apr 2022 at 10:48, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 3:22 PM Loïc Revest <l.revest@apc.fr> wrote:
>>
>>
>> - For every of this date/numeric couple within their "window partitioning", we need to determine the preceding value, thus the use of lag(<value>, 1, 0::numeric), since it's relevant for us here to get the very first record of the partition having "0.00" as its preceding value;
>
>
> I'm a bit out of my league on the promises that window functions give with respect to qual pushdown and the like (and your nested view structure's impact on that): but the fact that your window doesn't do partitioning would seem to be a factor here.  It is perfectly fine for lag to return a null if, in this example, the previous peer_id's LAST graph.date has a null value for graph.agg_points and that is computed before removing all peer_id values except the one the in the query where clause.

There can only be qual pushdown when the column in the qual is present
in the PARTITION BY clause. In this case, there's no PARTITION BY
clause, so can't be pushdowns.


Thanks.  I got a bit confused by the working plan having a different dataset than the not working one: until now when I realized that basically what that plan shows is that if one does partition by peer_id (in this case physically, by removing all other records except the peer_id you want) one gets the correct result.  The "Rows removed by filter" part of the explain output is suppressed if the row count is zero - but it is still above the window aggregate.

David J.

Re: lag() default value ignored for some window partition depending on table records count?

От
Loïc Revest
Дата:
>> On Wed, 27 Apr 2022 at 10:48, David G. Johnston
>> <david.g.johnston@gmail.com> wrote:
>>
>> There can only be qual pushdown when the column in the qual is present
>> in the PARTITION BY clause. In this case, there's no PARTITION BY
>> clause, so can't be pushdowns.
>>
peer_id being only used as a sort criterion instead of the
partitioning factor is indeed the culprit, and a true shame I haven't
figured it out on my own.

lag(<value>, 1, 0::numeric) OVER (PARTITION BY peer_id ORDER BY
calculation_time) returns 0::numeric for first row in partition, as
expected.

Thanks a lot to all three of you for pointing it out to me, and my
apologies for the unrelevant noise on that list.

Regards,
L. Revest