Обсуждение: FETCH FIRST clause WITH TIES option
hello ,
The WITH TIES keyword is sql standard that specifies any peers of retained rows
to retained in the result set too .which means according to ordering key the result set can includes additional rows which have ties on the last position, if there are any and It work with ORDER BY query
The attach patch is a finished implementation of it except it did not have a regression test because am not sure of changing the test data set for it and I can’t find fetch first clause regression test too.
Regards
Surafel
Вложения
Hello Surafel,
On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
> hello ,
> 
> The WITH TIES keyword is sql standard that specifies any peers of 
> retained rows to retained in the result set too .which means 
> according to ordering key the result set can includes additional rows
> which have ties on the last position, if there are any and It work
> with ORDER BY query.
>
Thanks for the patch. I've looked at it today, and it seems mostly OK,
with a couple of minor issues. Most of it is code formatting and comment
wording issues, so I'm not going to go through them here - see the
attached 0002 patch (0001 is your patch, rebased to current master).
There's one place that I think is incorrect, and that's this bit from
ExecLimit:
    }else
    /*
     * Get next tuple from subplan, if any.
     */
    slot = ExecProcNode(outerPlan);
    if (TupIsNull(slot))
    {
        node->lstate = LIMIT_SUBPLANEOF;
        return NULL;
    }
    if (node->limitOption == WITH_TIES)
    {
        ExecCopySlot(node->last_slot, slot);
    }
    node->subSlot = slot;
    node->position++;
Note that the "else" guards only the very first line, not the whole
block. This seems wrong, i.e. there should be {} around the whole block.
I'm also suggesting to slightly change the create_limit_plan(), to keep
a single make_limit call. IMHO that makes it easier to understand,
although I admit it's fairly subjective.
One question is whether to make this work for LIMIT too, not just for
FETCH FIRST. I'm not sure how difficult would that be (it should be a
grammar-only tweak I guess), or if it's a good idea in general. But I
find it quite confusing that various comments say things like this:
  LimitOption  limitOption; /* LIMIT with ties or  exact number */
while in fact it does not work with LIMIT.
> 
> The attach patch is a finished implementation of it except it did not
> have a regression test because am not sure of changing the test data set
> for it and I can’t find fetch first clause regression test too.
> 
Well, that's not a very good reason not to have tests for this new
improvement. FWIW there are a couple of places in regression tests where
FETCH FIRST is used, see this:
  $ grep -i 'fetch first' src/test/regress/sql/*
  src/test/regress/sql/hs_standby_allowed.sql:fetch first from hsc;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tidscan.sql:FETCH FIRST FROM c;
But those places don't seem like very good match for testing the new
stuff, because those are primarily testing something else. I suggest we
add the new tests into limit.sql.
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Вложения
>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 10/26/2018 12:28 PM, Surafel Temesgen wrote: >> hello , >> >> The WITH TIES keyword is sql standard that specifies any peers of >> retained rows to retained in the result set too .which means >> according to ordering key the result set can includes additional rows >> which have ties on the last position, if there are any and It work >> with ORDER BY query. Tomas> Thanks for the patch. I've looked at it today, and it seems Tomas> mostly OK, with a couple of minor issues. Most of it is code Tomas> formatting and comment wording issues, so I'm not going to go Tomas> through them here - see the attached 0002 patch (0001 is your Tomas> patch, rebased to current master). I still think that this is the wrong approach. Implementing WITH TIES and PERCENT together using an implicit window function call kills two birds with one very small stone (the only executor change needed would be teaching LIMIT to be able to stop on a boolean condition), with maximum reuse of existing facilities. -- Andrew (irc:RhodiumToad)
On 10/29/2018 04:17 PM, Andrew Gierth wrote: >>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > > On 10/26/2018 12:28 PM, Surafel Temesgen wrote: > >> hello , > >> > >> The WITH TIES keyword is sql standard that specifies any peers of > >> retained rows to retained in the result set too .which means > >> according to ordering key the result set can includes additional rows > >> which have ties on the last position, if there are any and It work > >> with ORDER BY query. > > Tomas> Thanks for the patch. I've looked at it today, and it seems > Tomas> mostly OK, with a couple of minor issues. Most of it is code > Tomas> formatting and comment wording issues, so I'm not going to go > Tomas> through them here - see the attached 0002 patch (0001 is your > Tomas> patch, rebased to current master). > > I still think that this is the wrong approach. Implementing WITH TIES > and PERCENT together using an implicit window function call kills two > birds with one very small stone (the only executor change needed would > be teaching LIMIT to be able to stop on a boolean condition), with > maximum reuse of existing facilities. > Hmmm, maybe. How would that work, exactly? Wouldn't that mean extra overhead (the window functions are hardly free) and limitations? Perhaps that was discussed in some other thread in the past? FWIW, I doubt the patch can be much smaller/simpler - a significant part of the new stuff is in gram.y and node read/out infrastructure, the changes to LIMIT node are fairly minimal. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> I still think that this is the wrong approach. Implementing WITH >> TIES and PERCENT together using an implicit window function call >> kills two birds with one very small stone (the only executor change >> needed would be teaching LIMIT to be able to stop on a boolean >> condition), with maximum reuse of existing facilities. Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean Tomas> extra overhead (the window functions are hardly free) and Tomas> limitations? They're not free, but then this case probably shouldn't be treated as a particularly hot code path. The basic idea is this: extend the Limit node to be able to stop on a boolean expression, such that the first false value ends the result. Then FETCH FIRST N WITH TIES becomes "stop when the expression rank() over (order by ...) <= N is no longer true" (where the ... is the existing top level order by) and FETCH FIRST N PERCENT is the same but with percent_rank (or cume_dist, I'd have to check the exact semantics) rank() doesn't need to read ahead in the input. percent_rank of course does, but it's clearly impossible to implement PERCENT without doing that. Also, one could consider extending LIMIT to support arbitrary expressions, with some syntax like LIMIT WHEN (condition) which would be the general form. Tomas> FWIW, I doubt the patch can be much smaller/simpler - a Tomas> significant part of the new stuff is in gram.y and node read/out Tomas> infrastructure, the changes to LIMIT node are fairly minimal. It's not a matter of patch size as such, but reuse of mechanisms rather than adding new ones. Also doing WITH TIES as a special case in the Limit node doesn't make adding PERCENT any easier at all, whereas the above gets it basically for free. -- Andrew (irc:RhodiumToad)
On 10/29/2018 05:47 PM, Andrew Gierth wrote: >>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > >> I still think that this is the wrong approach. Implementing WITH > >> TIES and PERCENT together using an implicit window function call > >> kills two birds with one very small stone (the only executor change > >> needed would be teaching LIMIT to be able to stop on a boolean > >> condition), with maximum reuse of existing facilities. > > Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean > Tomas> extra overhead (the window functions are hardly free) and > Tomas> limitations? > > They're not free, but then this case probably shouldn't be treated as a > particularly hot code path. > > The basic idea is this: extend the Limit node to be able to stop on a > boolean expression, such that the first false value ends the result. > > Then FETCH FIRST N WITH TIES becomes "stop when the expression > rank() over (order by ...) <= N > is no longer true" (where the ... is the existing top level order by) > > and FETCH FIRST N PERCENT is the same but with percent_rank (or > cume_dist, I'd have to check the exact semantics) > > rank() doesn't need to read ahead in the input. percent_rank of course > does, but it's clearly impossible to implement PERCENT without doing > that. > OK. I was under the impression the window function would need to see all the input rows, but that seems not to be the case in general. > Also, one could consider extending LIMIT to support arbitrary > expressions, with some syntax like LIMIT WHEN (condition) which would be > the general form. > Hmmm. I can't really recall needing such thing, but of course - that does not prove it'd not be a good thing. > Tomas> FWIW, I doubt the patch can be much smaller/simpler - a > Tomas> significant part of the new stuff is in gram.y and node read/out > Tomas> infrastructure, the changes to LIMIT node are fairly minimal. > > It's not a matter of patch size as such, but reuse of mechanisms rather > than adding new ones. Also doing WITH TIES as a special case in the > Limit node doesn't make adding PERCENT any easier at all, whereas the > above gets it basically for free. > Makes sense, I guess. At first I was thinking that this certainly does not add more new mechanisms than allowing LIMIT to terminate on boolean expression. But you're right about the WITH PERCENT part - using window functions would make adding this much simpler. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > Then FETCH FIRST N WITH TIES becomes "stop when the expression > rank() over (order by ...) <= N > is no longer true" (where the ... is the existing top level order by) Wouldn't that be wicked expensive compared to something hard-coded that does the same thing? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10/31/2018 06:17 PM, Robert Haas wrote: > On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth > <andrew@tao11.riddles.org.uk> wrote: >> Then FETCH FIRST N WITH TIES becomes "stop when the expression >> rank() over (order by ...) <= N >> is no longer true" (where the ... is the existing top level order by) > > Wouldn't that be wicked expensive compared to something hard-coded > that does the same thing? > Not sure, but that was one of my concerns too, particularly for the simple FETCH FIRST N WITH TIES case. But I think Andrew has a point it would make it much easier to implement the PERCENT case. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The attached patch include all the comment given by Tomas and i check sql standard about LIMIT and this feature
it did not say anything about it but I think its good idea to include it to LIMIT too and I will add it if we have consensus on it.
regards 
surafel 
Вложения
hi,The attached patch include all the comment given by Tomas and i check sql standard about LIMIT and this feature
it did not say anything about it but I think its good idea to include it to LIMIT too and I will add it if we have consensus on it.
regards
surafel
Вложения
Hi, On 11/24/18 10:28 AM, Surafel Temesgen wrote: > Attach is rebased patch against the current master > regards > Surafel > > On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen <surafel3000@gmail.com > <mailto:surafel3000@gmail.com>> wrote: > > hi, > > The attached patch include all the comment given by Tomas and i > check sql standard about LIMIT and this feature > Unfortunately, it seems the "limit" regression tests fail for some reason - the output mismatches the expected results for some reason. It seems as if the WITH TIES code affects ordering of the results within the group. See the attached file. > it did not say anything about it but I think its good idea to > include it to LIMIT too and I will add it if we have consensus on it. > Hmm, I'm not sure that's needed. I don't see an urgent need to do that in v1 of the patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
>
> The attached patch include all the comment given by Tomas and i
> check sql standard about LIMIT and this feature
>
Unfortunately, it seems the "limit" regression tests fail for some
reason - the output mismatches the expected results for some reason. It
seems as if the WITH TIES code affects ordering of the results within
the group. See the attached file.
Yes the reason is the order of returned row is not always the same. I remove other columns from the result set to get constant result 
Regards
Surafel
Вложения
On 1/2/19 11:51 AM, Surafel Temesgen wrote: > > > On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > > > > The attached patch include all the comment given by Tomas and i > > check sql standard about LIMIT and this feature > > > > Unfortunately, it seems the "limit" regression tests fail for some > reason - the output mismatches the expected results for some reason. It > seems as if the WITH TIES code affects ordering of the results within > the group. See the attached file. > > > Yes the reason is the order of returned row is not always the same. I > remove other columns from the result set to get constant result > Thanks, that seems reasonable. After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this patch should tweak estimates in some way. Currently, the cardinality estimate is the same as for plain LIMIT, using the requested number of rows. But let's say there are very few large groups - that will naturally increase the number of rows produced. As an example, let's say the subplan produces 1M rows, and there are 1000 groups (when split according to the ORDER BY clause). Consider a query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate will be 10, but in fact we know that it's likely to produce ~1000 rows (because that's the average group size). So I think the patch should tweak the estimate to be limitCount + (avgGroupSize/2); or perhaps Max(avgGroupSize, limitCount + (avgGroupSize/2)) The 1/2 is there because we don't know where the group starts. Of course, using average group size like this is rather crude, but it's about the best thing we can do. In principle, increasing the cardinality estimate is the right thing to do. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.
As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause).
can we use ORDER BY column raw statistic in limit node reliably? because it seems to me it can be affected by other operation in a subplan like filter condition
regards
Surafel
On 1/15/19 11:07 AM, Surafel Temesgen wrote: > > > On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this > patch should tweak estimates in some way. Currently, the cardinality > estimate is the same as for plain LIMIT, using the requested number of > rows. But let's say there are very few large groups - that will > naturally increase the number of rows produced. > > As an example, let's say the subplan produces 1M rows, and there are > 1000 groups (when split according to the ORDER BY clause). > > > > > can we use ORDER BY column raw statistic in limit node reliably? > because it seems to me it can be affected by other operation in a > subplan like filter condition > What do you mean by "raw statistic"? Using stats from the underlying table is not quite possible, because you might be operating on top of join or something like that. IMHO the best thing you can do is call estimate_num_groups() and combine that with the number of input rows. That shall benefit from ndistinct coefficients when available, etc. I've been thinking that considering the unreliability of grouping estimates we should use a multiple of the average size (because there may be much larger groups), but I think that's quite unprecipled and I'd much rather try without it first. But maybe we can do better when there really is a single table to consider, in which case we might look at MCV lists and estimate the largest group. That would give us a much better idea of the worst case. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
What do you mean by "raw statistic"?
I mean without further calculation to consider other operation  
IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That shall benefit from ndistinct
coefficients when available, etc. I've been thinking that considering
the unreliability of grouping estimates we should use a multiple of the
average size (because there may be much larger groups), but I think
that's quite unprecipled and I'd much rather try without it first.
thank you for clarifying.
The attache patch use your suggestion uptread
regards 
Surafel
Вложения
On Wed, Jan 16, 2019 at 11:45:46AM +0300, Surafel Temesgen wrote: > The attached patch use your suggestion uptread This patch needs a rebase because of conflicts done recently for pluggable storage, so moved to next CF, waiting on author. -- Michael
Вложения
This patch needs a rebase because of conflicts done recently for
pluggable storage, so moved to next CF, waiting on author.
--
Вложения
On 2/4/19 2:37 PM, Surafel Temesgen wrote: > > > On Mon, Feb 4, 2019 at 8:29 AM Michael Paquier <michael@paquier.xyz > <mailto:michael@paquier.xyz>> wrote: > > > This patch needs a rebase because of conflicts done recently for > pluggable storage, so moved to next CF, waiting on author. > -- > > > Thank you . rebased against current master This patch no longer passes testing so marked Waiting on Author. Regards, -- -David david@pgmasters.net
This patch no longer passes testing so marked Waiting on Author.
Вложения
On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote: >On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote: > >> This patch no longer passes testing so marked Waiting on Author. >> >> >Thank you for informing. Fixed Thanks for the updated patch. I do have this on my list of patches that I'd like to commit in this CF - likely tomorrow after one more round of review, or so. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:
>On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:
>>On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
>>
>>>This patch no longer passes testing so marked Waiting on Author.
>>>
>>>
>>Thank  you for informing. Fixed
>
>Thanks for the updated patch.  I do have this on my list of patches that
>I'd like to commit in this CF - likely tomorrow after one more round of
>review, or so.
>
Hi,
I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).
1) the docs documented this as
   ... [ ONLY | WITH TIES ]
but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been
   ... { ONLY | WITH TIES }
2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.
3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).
4) The ExecLimit() does this
    if (node->limitOption == WITH_TIES)
        ExecCopySlot(node->last_slot, slot);
but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?
5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Вложения
On Sun, Mar 31, 2019 at 01:14:46AM +0100, Tomas Vondra wrote:
>On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:
>>On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:
>>>On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
>>>
>>>>This patch no longer passes testing so marked Waiting on Author.
>>>>
>>>>
>>>Thank  you for informing. Fixed
>>
>>Thanks for the updated patch.  I do have this on my list of patches that
>>I'd like to commit in this CF - likely tomorrow after one more round of
>>review, or so.
>>
>
>Hi,
>
>I got to look at the patch today, with the intent to commit, but sadly I
>ran into a couple of minor issues that I don't feel comfortable fixing
>on my own. Attached is a patch highlighling some of the places (0001 is
>your v7 patch, to keep the cfbot happy).
>
>
>1) the docs documented this as
>
>  ... [ ONLY | WITH TIES ]
>
>but that's wrong, because it implies those options are optional (i.e.
>the user may not specify anything). That's not the case, exactly one
>of those options needs to be specified, so it should have been
>
>  ... { ONLY | WITH TIES }
>
>
>2) The comment in ExecLimit() needs to be updated to explain that WITH
>TIES changes the behavior.
>
>
>3) Minor code style issues (no space before * on comment lines, {}
>around single-line if statements, ...).
>
>
>4) The ExecLimit() does this
>
>   if (node->limitOption == WITH_TIES)
>       ExecCopySlot(node->last_slot, slot);
>
>but I think we only really need to do that for the last tuple in the
>window, no? Would it be a useful optimization?
>
>
>5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
>code actually prints uniqOperators. Secondly, why does the code use
>these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
>WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
>arrays? I haven't tested this, but looking at the READ_ counterparts, I
>don't see why that would be the case.
>
Actually, two more minor comments:
6) There's some confusing naming - in plannodes.h the fields added to
the Limit node are called uniqSomething, but in other places the patch
uses sortSomething, ordSomething. I suggest more consistent naming.
7) The LimitOption enum has two items - WITH_ONLY and WITH_TIES. That's
a bit strange, because there's nothing like "WITH ONLY".
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
			
		
Hi,
I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).
1) the docs documented this as
... [ ONLY | WITH TIES ]
but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been
... { ONLY | WITH TIES }
2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.
3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).
4) The ExecLimit() does this
if (node->limitOption == WITH_TIES)
ExecCopySlot(node->last_slot, slot);
but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?
5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.
Вложения
Hi,
Unfortunately this got broken again, this time by aef65db676 :-(
I've tried to fix the merge conflict (essentially by moving some of the
code to adjust_limit_rows_costs(), but I'm wondering if the code added to
create_limit_path is actually correct
    if (count_est != 0)
    {
        double        count_rows;
        if (count_est > 0)
            count_rows = (double) count_est;
        else
            count_rows = clamp_row_est(subpath->rows * 0.10);
        if (limitOption == WITH_TIES)
        {
            ...
            count_rows = Max(avgGroupSize, count_est + (...));
        }
        ...
    }
Firstly, this seriously needs some comment explaining why we do this. But
more importantly - shouldn't it really be
    count_rows = Max(avgGroupSize, count_rows + (...));
instead of using count_est again (which might easily be -1 anyway)?
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I've tried to fix the merge conflict (essentially by moving some of the
> code to adjust_limit_rows_costs(), but I'm wondering if the code added to
> create_limit_path is actually correct
> ...
> Firstly, this seriously needs some comment explaining why we do this.
I've not looked at this patch, but TBH I wonder why it is touching
planner rowcount estimation at all.  I find it doubtful either that
a correction for WITH TIES would be significant in most use-cases,
or that we could estimate it accurately if it was significant.
It certainly doesn't seem like something that needs to be messed
with in v1 of the feature.
            regards, tom lane
			
		On Wed, Apr 03, 2019 at 03:08:05PM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> I've tried to fix the merge conflict (essentially by moving some of the >> code to adjust_limit_rows_costs(), but I'm wondering if the code added to >> create_limit_path is actually correct >> ... >> Firstly, this seriously needs some comment explaining why we do this. > >I've not looked at this patch, but TBH I wonder why it is touching >planner rowcount estimation at all. I find it doubtful either that >a correction for WITH TIES would be significant in most use-cases, >or that we could estimate it accurately if it was significant. >It certainly doesn't seem like something that needs to be messed >with in v1 of the feature. > FWIW it was me who suggested to tweak the cardinality estimation this way, but if we want to leave it out from v1, I'm OK with that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Surafel, I've been looking at the patch with the intent to commit it, but once again I ran into stuff that seems suspicious to me but am not familiar with enough to say if it's OK. I'm sorry about that :-( First, a couple of tweaks in the attached v9 of the patch: 1) I've removed the costing changes. As Tom mentions elsewhere in this thread, that's probably not needed for v1 and it's true those estimates are probably somewhat sketchy anyway. 2) v8 did this (per my comment): - ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node)); + if(node->limitOption == EXACT_NUMBER) + ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node)); but I noticed the comment immediately above that says * Notify child node about limit. Note: think not to "optimize" by * skipping ExecSetTupleBound if compute_tuples_needed returns < 0. We * must update the child node anyway, in case this is a rescan and the * previous time we got a different result. which applies to WITH_TIES too, no? So I've modified compute_tuples_needed to return -1, and reverted this change. Not sure, though. 3) I'm a bit confused by the initialization added to ExecInitLimit. It first gets the tuple descriptor from the limitstate (it should not do so directly but use ExecGetResultType). But when it creates the extra slot, it uses ops extracted from the outer plan. That's strange, I guess ... And then it extracts the descriptor from the outer plan and uses it when calling execTuplesMatchPrepare. But AFAIK it's going to be compared to the last_slot, which is using a descriptor from the limitstate. IMHO all of this should use descriptor/ops from the outer plan, no? It probably does not change anything because limit does not project, but it seems confusing. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
1) I've removed the costing changes. As Tom mentions elsewhere in this
thread, that's probably not needed for v1 and it's true those estimates
are probably somewhat sketchy anyway.
2) v8 did this (per my comment):
- ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+ if(node->limitOption == EXACT_NUMBER)
+ ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
but I noticed the comment immediately above that says
* Notify child node about limit. Note: think not to "optimize" by
* skipping ExecSetTupleBound if compute_tuples_needed returns < 0. We
* must update the child node anyway, in case this is a rescan and the
* previous time we got a different result.
which applies to WITH_TIES too, no? So I've modified compute_tuples_needed
to return -1, and reverted this change. Not sure, though.
3) I'm a bit confused by the initialization added to ExecInitLimit. It
first gets the tuple descriptor from the limitstate (it should not do so
directly but use ExecGetResultType). But when it creates the extra slot,
it uses ops extracted from the outer plan. That's strange, I guess ...
And then it extracts the descriptor from the outer plan and uses it when
calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
the last_slot, which is using a descriptor from the limitstate.
IMHO all of this should use descriptor/ops from the outer plan, no? It
probably does not change anything because limit does not project, but it
seems confusing.
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > agree Hi Surafel, A new Commitfest is starting. Can you please post a fresh rebase of this patch? Thanks, -- Thomas Munro https://enterprisedb.com
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
> agree
Hi Surafel,
A new Commitfest is starting. Can you please post a fresh rebase of this patch?
Вложения
On 2019-07-01 19:38, Surafel Temesgen wrote:
> Thank you for informing. attach is a rebased patch against current 
> master
> [...]
> [fetch_first_with_ties_v10.patch]
Hi Surafel,
The patch applies OK, make check is OK, compiles OK.
But I get:
TRAP: FailedAssertion("!(!(((slot)->tts_flags & (1 << 1)) != 0))", File: 
"execTuples.c", Line: 491)
when running a variant ('fetch 1' instead of 'fetch 2') of the test SQL 
against src/test/regress/data/onek.data:
(in the script below the location of the file 'onek.data' will have to 
be changed)
--------------------- 8< ---------------------
#!/bin/bash
echo "
drop   table if exists onek ;
create table onek (
unique1      int4,
unique2      int4,
two          int4,
four         int4,
ten          int4,
twenty       int4,
hundred      int4,
thousand     int4,
twothousand  int4,
fivethous    int4,
tenthous     int4,
odd          int4,
even         int4,
stringu1     name,
stringu2     name,
string4      name
);
copy onek from 
'/home/aardvark/pg_stuff/pg_sandbox/pgsql.fetch_first_with_ties/src/test/regress/data/onek.data';
create index               onek_unique1 on onek using btree(unique1 
int4_ops);
create index onek_unique2  on onek using btree(unique2 int4_ops);
create index onek_hundred  on onek using btree(hundred int4_ops);
create index onek_stringu1 on onek using btree(stringu1 name_ops);
-- OK:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows only
;
-- crashes:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows with ties
;
" | psql -qXa
--------------------- 8< ---------------------
Can you have a look?
thanks,
Erik Rijkers
			
		Can you have a look?
Вложения
As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation.  Also, rename EXACT_NUMBER, per that thread.
As far as I can tell, this concerns feature F867.  I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.
Earlier in the thread, Tomas Vondra said:
> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so
                                                                     
 
> directly but use ExecGetResultType). But when it creates the extra slot,
                                                                     
 
> it uses ops extracted from the outer plan. That's strange, I guess ...
                                                                     
 
>
                                                                     
 
> And then it extracts the descriptor from the outer plan and uses it when
                                                                     
 
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
                                                                     
 
> the last_slot, which is using a descriptor from the limitstate.
                                                                     
 
>
                                                                     
 
> IMHO all of this should use descriptor/ops from the outer plan, no? It
                                                                     
 
> probably does not change anything because limit does not project, but it
                                                                     
 
> seems confusing.
                                                                     
 
and you replied:
> agree
... yet this doesn't appear to have resulted in any change in the code,
or I just missed it.  Are you going to update the patch per that?
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation. Also, rename EXACT_NUMBER, per that thread.
As far as I can tell, this concerns feature F867. I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.
Earlier in the thread, Tomas Vondra said:
> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so
> directly but use ExecGetResultType). But when it creates the extra slot,
> it uses ops extracted from the outer plan. That's strange, I guess ...
>
> And then it extracts the descriptor from the outer plan and uses it when
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
> the last_slot, which is using a descriptor from the limitstate.
>
> IMHO all of this should use descriptor/ops from the outer plan, no? It
> probably does not change anything because limit does not project, but it
> seems confusing.
and you replied:
> agree
... yet this doesn't appear to have resulted in any change in the code,
or I just missed it. Are you going to update the patch per that?
On 2019-Sep-06, Surafel Temesgen wrote: > > ... yet this doesn't appear to have resulted in any change in the code, > > or I just missed it. Are you going to update the patch per that? > > Its already done in v9 of the patch attached by Tomas Oh, I missed that. Great, so we're expecting some small updates from you then and I think you can mark ready for committer when you post it. Thanks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Sep-06, Surafel Temesgen wrote:
> > ... yet this doesn't appear to have resulted in any change in the code,
> > or I just missed it. Are you going to update the patch per that?
>
> Its already done in v9 of the patch attached by Tomas
Oh, I missed that. Great, so we're expecting some small updates from
you then and I think you can mark ready for committer when you post it.
Thanks.
Вложения
Hmm,
create table w (a point);
# select * from w order by a fetch first 2 rows with ties;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a fetch first 2 rows with ties;
                                  ^
HINT:  Use an explicit ordering operator or modify the query.
I'm not sure that the HINT is useful here.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> create table w (a point);
> # select * from w order by a fetch first 2 rows with ties;
> ERROR:  could not identify an ordering operator for type point
> LINE 1: select * from w order by a fetch first 2 rows with ties;
>                                   ^
> HINT:  Use an explicit ordering operator or modify the query.
> I'm not sure that the HINT is useful here.
That's not new to this patch, HEAD does the same:
regression=# create table w (a point);
CREATE TABLE
regression=# select * from w order by a ;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a ;
                                 ^
HINT:  Use an explicit ordering operator or modify the query.
It is a meaningful hint IMO, since (in theory) you could add
something like "USING <<" to the ORDER BY to specify a
particular ordering operator.  The fact that no suitable
operator is actually available in core doesn't seem like
a reason not to give the hint.
            regards, tom lane
			
		Looking at this patch again.  (Attached v13 fixes a typo in ruleutils,
fixes a bogus edit in the SGML docs plus minor rewording, and is rebased
to current master).
First, I noticed that there's a significant unanswered issue from Andrew
Gierth about this using a completely different mechanism, namely an
implicit window function.  I see that Robert was concerned about the
performance of Andrew's proposed approach, but I don't see any reply
from Surafel on the whole issue.
Andrew: Would you rather not have this feature in this form at all?
Tomas, are you still intending to be committer for this?
There's also this point that was not discussed very much:
On 2018-Oct-28, Tomas Vondra wrote:
> One question is whether to make this work for LIMIT too, not just for
> FETCH FIRST. I'm not sure how difficult would that be (it should be a
> grammar-only tweak I guess), or if it's a good idea in general.
I was also thinking that it would be good to support the new clause
under the LIMIT spelling, but it turns out that there are shift/reduce
conflicts if done the naive way(*); fixing those looks to be rather major
surgery to the way we represent the LIMIT/OFFSET clauses.
I'm not sure the proposed changes to gram.y are all that great, though.
Right now we have separate productions offset_clause and limit_clause,
and a combination mechanism called select_limit; after the patch the
whole thing is a little bit too messy.  Would it be sensible to unify/
rationalize these?
(*) this:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 72ad633102..01d337c30e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11672,6 +11672,13 @@ limit_clause:
                     n->limitOption = LIMIT_OPTION_COUNT;
                     $$ = n;
                 }
+            | LIMIT select_limit_value WITH TIES
+                {
+                    LimitClause *n = (LimitClause *) palloc(sizeof(LimitClause));
+                    n->limitCount = $2;
+                    n->limitOption = LIMIT_OPTION_WITH_TIES;
+                    $$ = n;
+                }
             | LIMIT select_limit_value ',' select_offset_value
                 {
                     /* Disabled because it was too confusing, bjm 2002-02-18 */
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Вложения
(BTW another small issue in the patch is the comments in gram.y that talk about ONLY being a reserved word about the SQL:2008 syntax; ISTM that that comment should now mention both ONLY as well as WITH being fully reserved words.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
First, I noticed that there's a significant unanswered issue from Andrew
Gierth about this using a completely different mechanism, namely an
implicit window function. I see that Robert was concerned about the
performance of Andrew's proposed approach, but I don't see any reply
from Surafel on the whole issue.
On 2019-Nov-11, Alvaro Herrera wrote: > I'm not sure the proposed changes to gram.y are all that great, though. Here's a proposed simplification of the gram.y changes. There are two things here: 1. cosmetic: we don't need the LimitClause struct; we can use just SelectLimit, and return that from limit_clause; that can be complemented using the offset_clause if there's any at select_limit level. 2. there's a gratuituous palloc() in opt_select_limit when there's no clause, that seems to be there just so that NULLs can be returned. That's one extra palloc for SELECTs parsed using one the affected productions ... it's not every single select, but it seems bad enough it's worth fixing. I fixed #2 by just checking whether the return from opt_select_limit is NULL. ISTM it'd be better to pass the SelectLimit pointer to insertSelectOptions instead (which is a static function in gram.y anyway so there's no difficulty there). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
(Prior to posting this delta patch, the CF bot appeared happy with this patch.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I rebased this patch, and my proposed changes are in 0002. Looking at the changes in ExecLimit, I'm wondering if it would be better to add a new state to the state machine there -- instead of doing all the work in duplicative code in the LIMIT_INWINDOW case, have that one only save the current end-of-window tuple and jump to LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that match the current one. I think that would end up being cleaner. Please research. Marking Waiting on Author. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
I rebased this patch, and my proposed changes are in 0002.
Looking at the changes in ExecLimit, I'm wondering if it would be better
to add a new state to the state machine there -- instead of doing all
the work in duplicative code in the LIMIT_INWINDOW case, have that one
only save the current end-of-window tuple and jump to
LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that
match the current one. I think that would end up being cleaner. Please
research.
Вложения
Thanks.
(I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
seems to convey what it does a little better.)
I think you should add a /* fall-though */ comment after changing state.
Like this (this flow seems clearer; also DRY):
                if (!node->noCount &&
                    node->position - node->offset >= node->count)
                {
                    if (node->limitOption == LIMIT_OPTION_COUNT)
                    {
                        node->lstate = LIMIT_WINDOWEND;
                        return NULL;
                    }
                    else
                    {
                        node->lstate = LIMIT_WINDOWEND_TIES;
                        /* fall-through */
                    }
                }
                else
                    ...
I've been playing with this a little more, and I think you've overlooked
a few places in ExecLimit that need to be changed.  In the regression
database, I tried this:
55432 13devel 17282=# begin;
BEGIN
55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order by q1 fetch first 2 rows with ties;
DECLARE CURSOR
55432 13devel 17282=# fetch all in c1;
 q1  │        q2        
─────┼──────────────────
 123 │              456
 123 │ 4567890123456789
(2 filas)
55432 13devel 17282=# fetch all in c1;
 q1 │ q2 
────┼────
(0 filas)
55432 13devel 17282=# fetch forward all in c1;
 q1 │ q2 
────┼────
(0 filas)
So far so good .. things look normal.  But here's where things get
crazy:
55432 13devel 17282=# fetch backward all in c1;
        q1        │        q2        
──────────────────┼──────────────────
 4567890123456789 │              123
              123 │ 4567890123456789
(2 filas)
(huh???)
55432 13devel 17282=# fetch backward all in c1;
 q1 │ q2 
────┼────
(0 filas)
Okay -- ignoring the fact that we got a row that should not be in the
result, we've exhausted the cursor going back.  Let's scan it again from
the start:
55432 13devel 17282=# fetch forward all in c1;
        q1        │        q2         
──────────────────┼───────────────────
              123 │  4567890123456789
 4567890123456789 │               123
 4567890123456789 │  4567890123456789
 4567890123456789 │ -4567890123456789
(4 filas)
This is just crazy.
I think you need to stare a that thing a little harder.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Thanks.
(I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
seems to convey what it does a little better.)
I think you should add a /* fall-though */ comment after changing state.
Like this (this flow seems clearer; also DRY):
if (!node->noCount &&
node->position - node->offset >= node->count)
{
if (node->limitOption == LIMIT_OPTION_COUNT)
{
node->lstate = LIMIT_WINDOWEND;
return NULL;
}
else
{
node->lstate = LIMIT_WINDOWEND_TIES;
/* fall-through */
}
}
else
...
I've been playing with this a little more, and I think you've overlooked
a few places in ExecLimit that need to be changed. In the regression
database, I tried this:
55432 13devel 17282=# begin;
BEGIN
55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order by q1 fetch first 2 rows with ties;
DECLARE CURSOR
55432 13devel 17282=# fetch all in c1;
q1 │ q2
─────┼──────────────────
123 │ 456
123 │ 4567890123456789
(2 filas)
55432 13devel 17282=# fetch all in c1;
q1 │ q2
────┼────
(0 filas)
55432 13devel 17282=# fetch forward all in c1;
q1 │ q2
────┼────
(0 filas)
So far so good .. things look normal. But here's where things get
crazy:
55432 13devel 17282=# fetch backward all in c1;
q1 │ q2
──────────────────┼──────────────────
4567890123456789 │ 123
123 │ 4567890123456789
(2 filas)
(huh???)
55432 13devel 17282=# fetch backward all in c1;
q1 │ q2
────┼────
(0 filas)
Okay -- ignoring the fact that we got a row that should not be in the
result, we've exhausted the cursor going back. Let's scan it again from
the start:
55432 13devel 17282=# fetch forward all in c1;
q1 │ q2
──────────────────┼───────────────────
123 │ 4567890123456789
4567890123456789 │ 123
4567890123456789 │ 4567890123456789
4567890123456789 │ -4567890123456789
(4 filas)
This is just crazy.
I think you need to stare a that thing a little harder.
Вложения
On 2019-Nov-28, Surafel Temesgen wrote:
> On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > I think you should add a /* fall-though */ comment after changing state.
> > Like this (this flow seems clearer; also DRY):
> >
> >                 if (!node->noCount &&
> >                     node->position - node->offset >= node->count)
> >                 {
> >                     if (node->limitOption == LIMIT_OPTION_COUNT)
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND;
> >                         return NULL;
> >                     }
> >                     else
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND_TIES;
> >                         /* fall-through */
> >                     }
> >                 }
> >                 else
> >                     ...
>
> changed
But you did not read my code snippet, did you ...?
> > I think you need to stare a that thing a little harder.
>
> This is because the new state didn't know about backward scan
> and there was off by one error it scan one position past to detect
> ties. The attached patch fix both
Oh, thanks, glad it was that easy.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: Alvaro> First, I noticed that there's a significant unanswered issue Alvaro> from Andrew Gierth about this using a completely different Alvaro> mechanism, namely an implicit window function. Robert was Alvaro> concerned about the performance of Andrew's proposed approach, Alvaro> but I don't see any reply from Surafel on the whole issue. Alvaro> Andrew: Would you rather not have this feature in this form at Alvaro> all? I have done a proof-of-concept hack for my alternative approach, which I will post in another thread so as not to confuse the cfbot. The main advantage, as I see it, of a window function approach is that it can also work for PERCENT with only a change to the generated expression; the executor part of the code can handle any stopping condition. It can also be generalized (though the POC does not do this) to allow an SQL-level extension, something like "LIMIT WHEN condition" to indicate that it should stop just before the first row for which the condition is true. Whether this is a desirable feature or not is another question. -- Andrew.
On Fri, Nov 29, 2019 at 05:19:00AM +0000, Andrew Gierth wrote: >>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Alvaro> First, I noticed that there's a significant unanswered issue > Alvaro> from Andrew Gierth about this using a completely different > Alvaro> mechanism, namely an implicit window function. Robert was > Alvaro> concerned about the performance of Andrew's proposed approach, > Alvaro> but I don't see any reply from Surafel on the whole issue. > > Alvaro> Andrew: Would you rather not have this feature in this form at > Alvaro> all? > >I have done a proof-of-concept hack for my alternative approach, which I >will post in another thread so as not to confuse the cfbot. > >The main advantage, as I see it, of a window function approach is that >it can also work for PERCENT with only a change to the generated >expression; the executor part of the code can handle any stopping >condition. It can also be generalized (though the POC does not do this) >to allow an SQL-level extension, something like "LIMIT WHEN condition" >to indicate that it should stop just before the first row for which the >condition is true. Whether this is a desirable feature or not is another >question. > For the record, the alternative approach was posted here: https://www.postgresql.org/message-id/flat/87o8wvz253.fsf@news-spur.riddles.org.uk I think we need to make a decision about which road to take - shall we continue with what Surafel originally proposed, or should we instead adopt Andrew's patch? Andrew's patch looks significantly simpler, although it probably will need more work to get it committable. My main concern about using window functions was performance, so I think we should benchmark. For example, considering window functions are not parallel safe, would this have negative impact on parallel queries? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Nov-28, Surafel Temesgen wrote:
> On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > I think you should add a /* fall-though */ comment after changing state.
> > Like this (this flow seems clearer; also DRY):
> >
> > if (!node->noCount &&
> > node->position - node->offset >= node->count)
> > {
> > if (node->limitOption == LIMIT_OPTION_COUNT)
> > {
> > node->lstate = LIMIT_WINDOWEND;
> > return NULL;
> > }
> > else
> > {
> > node->lstate = LIMIT_WINDOWEND_TIES;
> > /* fall-through */
> > }
> > }
> > else
> > ...
>
> changed
But you did not read my code snippet, did you ...?
Вложения
There was a minor conflict in planmain.h. Here's a refreshed version. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
Some ruleutils.c code added by this patch is not covered by tests:
    5246             :     /* Add the LIMIT clause if given */
    5247        1115 :     if (query->limitOffset != NULL)
    5248             :     {
    5249           0 :         appendContextKeyword(context, " OFFSET ",
    5250             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5251           0 :         get_rule_expr(query->limitOffset, context, false);
    5252             :     }
    5253        1115 :     if (query->limitOption == LIMIT_OPTION_WITH_TIES)
    5254             :     {
    5255           0 :         appendContextKeyword(context, " FETCH FIRST ",
    5256             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5257           0 :         get_rule_expr(query->limitCount, context, false);
    5258           0 :         appendContextKeyword(context, " ROWS WITH TIES ",
    5259             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5260             :     }
    5261        1115 :     if (query->limitCount != NULL && query->limitOption != LIMIT_OPTION_WITH_TIES)
    5262             :     {
    5263           2 :         appendContextKeyword(context, " LIMIT ",
    5264             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5265           2 :         if (IsA(query->limitCount, Const) &&
    5266           0 :             ((Const *) query->limitCount)->constisnull)
    5267           0 :             appendStringInfoString(buf, "ALL");
    5268             :         else
    5269           2 :             get_rule_expr(query->limitCount, context, false);
    5270             :     }
Other than that, the patch seems good to go to me, so unless there are
objections, I intend to get this pushed tomorrow.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Pushed, with some additional changes.
So *of course* when I add tests to verify that ruleutils, I find a case
that does not work properly -- meaning, you get a view that pg_dump
emits in a way that won't be accepted:
CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
        ORDER BY thousand FETCH FIRST NULL ROWS WITH TIES;
note the "NULL" there.  ruleutils would gladly print this out as:
View definition:
 SELECT onek.thousand
   FROM onek
  WHERE onek.thousand < 995
  ORDER BY onek.thousand
 FETCH FIRST NULL::integer ROWS WITH TIES;
which is then not accepted.
The best fix I could come up for this was to reject a bare NULL in the
limit clause.  It's a very stupid fix, because you can still give it a
NULL, using
CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
        ORDER BY thousand FETCH FIRST (NULL+1) ROWS WITH TIES;
and the like.  But when ruleutils get this, it will add the parens,
which will magically make it work.
It turns out that the SQL standard is much more limited in what it will
accept there.  But our grammar (what we'll accept for the ancient LIMIT
clause) is very lenient -- it'll take just any expression.  I thought
about reducing that to NumericOnly for FETCH FIRST .. WITH TIES, but
then I have to pick: 1) gram.y fails to compile because of a
reduce/reduce conflict, or 2) also restricting FETCH FIRST .. ONLY to
NumericOnly.  Neither of those seemed very palatable.
-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Hi,
On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote:
> Pushed, with some additional changes.
This triggers a new warning for me (gcc-10):
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function ‘ExecLimit’:
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: this statement may fall through
[-Wimplicit-fallthrough=]
  136 |    if (ScanDirectionIsForward(direction))
      |       ^
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here
  216 |   case LIMIT_WINDOWEND_TIES:
      |   ^~~~
I've not looked at it in any sort of detail, but it looks like it might
be a false positive, with the "fall-through" comment not being
sufficient to quiesce the compiler?
Cosmetically I would agree that falling through to the next case" a few
blocks deep inside a case: isn't the prettiest...
- Andres
			
		Hello On 2020-Apr-07, Andres Freund wrote: > On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote: > > Pushed, with some additional changes. > > This triggers a new warning for me (gcc-10): > /home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function ‘ExecLimit’: > /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: this statement may fall through [-Wimplicit-fallthrough=] > 136 | if (ScanDirectionIsForward(direction)) > | ^ > /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here > 216 | case LIMIT_WINDOWEND_TIES: > | ^~~~ > > I've not looked at it in any sort of detail, but it looks like it might > be a false positive, with the "fall-through" comment not being > sufficient to quiesce the compiler? It's on purpose, yeah, but I can understand the compiler not getting it. > Cosmetically I would agree that falling through to the next case" a few > blocks deep inside a case: isn't the prettiest... That's true ... maybe a fix would be to split that stuff to a subroutine? Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: Alvaro> It turns out that the SQL standard is much more limited in what Alvaro> it will accept there. But our grammar (what we'll accept for Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just Alvaro> any expression. I thought about reducing that to NumericOnly Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1) Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither Alvaro> of those seemed very palatable. FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it didn't allow parameters (which are allowed by the spec); see 1da162e1f. (That change didn't present a problem for ruleutils, because FETCH FIRST ... ONLY is output as a LIMIT clause instead.) This needs to be fixed in ruleutils, IMO, not by changing what the grammar accepts. -- Andrew.
On 2020-Apr-08, Andrew Gierth wrote: > >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Alvaro> It turns out that the SQL standard is much more limited in what > Alvaro> it will accept there. But our grammar (what we'll accept for > Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just > Alvaro> any expression. I thought about reducing that to NumericOnly > Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1) > Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or > Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither > Alvaro> of those seemed very palatable. > > FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it > didn't allow parameters (which are allowed by the spec); see 1da162e1f. Hmm, oh, I see. > (That change didn't present a problem for ruleutils, because FETCH FIRST > ... ONLY is output as a LIMIT clause instead.) Right, I noticed that and kept it unchanged. > This needs to be fixed in ruleutils, IMO, not by changing what the > grammar accepts. Fair. I didn't change what the grammar accepts. I ended up only throwing an error in parse analysis when a bare NULL const is seen. I guess we could fix ruleutils to add parens when NULL is seen, but I'm not sure it's necessary or useful. (LIMIT uses a null to represent the LIMIT ALL case.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> This needs to be fixed in ruleutils, IMO, not by changing what the >> grammar accepts. Alvaro> Fair. I didn't change what the grammar accepts. I ended up only Alvaro> throwing an error in parse analysis when a bare NULL const is Alvaro> seen. This seems far too arbitrary to me. -- Andrew (irc:RhodiumToad)
On 2020-Apr-08, Andrew Gierth wrote: > >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > >> This needs to be fixed in ruleutils, IMO, not by changing what the > >> grammar accepts. > > Alvaro> Fair. I didn't change what the grammar accepts. I ended up only > Alvaro> throwing an error in parse analysis when a bare NULL const is > Alvaro> seen. > > This seems far too arbitrary to me. Oops, I saw this comment only now. We're still on time to fix this, if there's something that needs fixing. Let's discuss what changes are needed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services