Thread: Draft LIMIT pushdown to Append and MergeAppend patch


Draft LIMIT pushdown to Append and MergeAppend patch

From: Michał Kłeczek
Date:
Hi All,

Attached is a draft patch implementing LIMIT pushdown to Append and MergeAppend nodes.

This patch eliminates the need to resort to subqueries to optimise UNIONs.
It also enables more aggressive partition pruning.
Not sure if it causes LIMIT pushdown to foreign partitions though.
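To illustrate the subquery workaround mentioned above, here is a sketch of the kind of manual rewrite users do today (table names `t1`/`t2` are illustrative, not from the patch):

```sql
-- Illustrative only. Without LIMIT pushdown, this form sorts and merges
-- everything before the LIMIT applies:
SELECT * FROM t1 UNION ALL SELECT * FROM t2 ORDER BY a LIMIT 10;

-- so users rewrite it by hand, letting each branch stop early:
(SELECT * FROM t1 ORDER BY a LIMIT 10)
UNION ALL
(SELECT * FROM t2 ORDER BY a LIMIT 10)
ORDER BY a LIMIT 10;
-- The patch aims to let the planner perform this transformation itself.
```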

Applying this patch causes regressions in:
- postgres_fdw tests
- partitions tests

This is due to subsequent partition pruning applied when LIMIT is pushed down - I guess that’s a (big) win.

I would be happy to hear if the approach is sound.

Thanks,
Michal
Attachments

Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Michał Kłeczek
Date:
Hi All,

Attached is a second version of the patch.

The goal is to:
1. Apply LIMIT as early as possible - especially to apply LIMIT in partition scans
2. Enable LIMIT pushdown for FDW partitions.

The main idea of the patch is:

1. Wrap children of Append and MergeAppend paths in LimitPaths.
2. Let the FDW extension handle LIMIT pushdown.

The changes are mainly in pathnode.c:
- Introduced a new function, pushdown_limit(), used by the planner instead of create_limit_node
- pushdown_limit handles MergeAppend, Append and ForeignScan nodes specially
- It falls back to create_limit_node for other path types

Changes in the FDW API:
- Added a new FDW operation: PushdownLimitNode
- This operation is called by pushdown_limit in pathnode.c

Changes in postgres_fdw.c:
- Added a stub implementation of the PushdownLimitNode operation that delegates to create_limit_node, wrapping the original ForeignPath node

I am going to work on tests right now as (obviously) they are failing due to different plans.

As this is my first time digging into the internals of Postgres, I would be really grateful for a friendly review and some
directions - I am not sure if the approach is the right one.

The need for this is real: we are struggling with slow queries on partitioned tables - the business requirements are
such that the only way to avoid index scans yielding many records is to apply LIMIT early and not execute partition
scans at all if enough rows are produced.
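A sketch of that scenario (the schema here is hypothetical, not from our actual system):

```sql
-- Hypothetical: events is range-partitioned by ts, with a btree index on ts.
-- Today, an ordered scan must set up an index scan on every partition:
SELECT * FROM events WHERE account = 42 ORDER BY ts LIMIT 50;
-- If LIMIT were pushed into each partition scan, every child could stop
-- after 50 rows; and when partitions are ordered by the partition key,
-- an ordered Append could avoid scanning later partitions altogether
-- once 50 rows have been produced.
```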

Kind regards,
Michal





> On 7 Oct 2023, at 12:01, Michał Kłeczek <michal@kleczek.org> wrote:
>
> Hi All,
>
> Attached is a draft patch implementing LIMIT pushdown to Append and MergeAppend nodes.
>
> This patch eliminates the need to resort to subqueries to optimise UNIONs.
> It also enables more aggressive partition pruning.
> Not sure if it causes LIMIT pushdown to foreign partitions though.
>
> Applying this patch causes regressions in:
> - postgres_fdw tests
> - partitions tests
>
> This is due to subsequent partition pruning applied when LIMIT is pushed down - I guess that’s a (big) win.
>
> I would be happy to hear if the approach is sound.
>
> Thanks,
> Michal<limit-pushdown.patch>


Attachments

Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Andy Fan
Date:


On Sun, Oct 8, 2023 at 5:04 AM Michał Kłeczek <michal@kleczek.org> wrote:
> Hi All,
>
> Attached is a second version of the patch.
>
> The goal is to:
> 1. Apply LIMIT as early as possible - especially to apply LIMIT in partition scans

For patches aimed at performance improvement, it is better to provide
an example to show how much benefit we can get.  As for this case,
I'm doubtful it can work as an improvement.

> 2. Enable LIMIT pushdown for FDW partitions.

The same as above - some testing is helpful.
 
--
Best Regards
Andy Fan

Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Michał Kłeczek
Date:
Thanks for the feedback.

On 8 Oct 2023, at 03:33, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> On Sun, Oct 8, 2023 at 5:04 AM Michał Kłeczek <michal@kleczek.org> wrote:
>> Hi All,
>>
>> Attached is a second version of the patch.
>>
>> The goal is to:
>> 1. Apply LIMIT as early as possible - especially to apply LIMIT in partition scans
>
> For patches aimed at performance improvement, it is better to provide
> an example to show how much benefit we can get.  As for this case,
> I'm doubtful it can work as an improvement.
>
>> 2. Enable LIMIT pushdown for FDW partitions.
>
> The same as above - some testing is helpful.

The idea came up from this e-mail thread from 2019:


While obviously performance testing is needed to confirm any real improvements,
I now (after your feedback) have second thoughts about whether it is worth pursuing at all.

Could you elaborate a little why you think it won’t work as an improvement?
Is it because in practice LIMIT _is_ pushed down already during execution?
From what I understand postgres_fdw does indeed fetch on demand.
OTOH pushing down LIMIT is considered an improvement (as witnessed in the postgres_fdw code itself after d50d172e51)

Care to provide some more information?

Thanks,

--
Michal

Attachments

Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: David Rowley
Date:
On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek <michal@kleczek.org> wrote:
> On 8 Oct 2023, at 03:33, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> >> For patches aimed at performance improvement, it is better to provide
> >> an example to show how much benefit we can get.  As for this case,
> >> I'm doubtful it can work as an improvement.

> Could you elaborate a little why you think it won’t work as an improvement?
> Is it because in practice LIMIT _is_ pushed down already during execution?
> From what I understand postgres_fdw does indeed fetch on demand.
> OTOH pushing down LIMIT is considered an improvement (as witnessed in the postgres_fdw code itself after d50d172e51)

In my opinion, analysis of where we can push a LIMIT node deeper into
the plan is an interesting area for research and work.

The Append / MergeAppend case is certainly one place where pushing
LIMIT nodes down could be beneficial. Of course, if there was some
filter or join or aggregation/distinct, etc that occurred after the
Append/MergeAppend then you'd not be able to do this as the pushed
limit might be met before we've got enough rows at the top level of
the plan, and that could result in fewer rows being output than
what was requested in the query (aka wrong results).  Andy was working
around this area recently (see [1] and corresponding thread).  He had
to add a bunch of code that checked for operations that might mean
we'd need to read more than the tuple_fraction rows from the Append
node.  If we had a nice way to know when building base rel paths if
there were or were not upper-level operations that affect if LIMIT
pushing can be done, then that might make such a patch cleaner.  Andy
in one of his proposed patches [2] added a field to PlannerInfo to
mark this.  That field wasn't well named or documented, so anything
you did would have to be an improvement on that.

Looking at your patch, I see you've solved this by delaying the
pushing down until the upper planner and just checking if the lower
planner (join search) produced an Append or MergeAppend path. I've not
really developed an opinion yet what's the best method. I feel
creating the correct paths up-front is likely more flexible and more
true to the path costing code.

It might also be worth taking a step backwards and seeing if there are
any other cases where we could push down LIMITs and try to see if
there's something more generic that could be built to do this in a way
that's beneficial to more cases. I can't think of any off the top of
my head, but I've not thought very hard about it.

FWIW, it's trivial to mock up the possible benefits of pushing LIMIT
nodes down with something like the following:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (0) to (1000000);
create table rp2 partition of rp for values from (1000000) to (2000000);
insert into rp select x from generate_series(0, 1999999)x;

-- limit not pushed:
explain analyze select * from rp order by a limit 10;
Execution Time: 148.041 ms

-- limit pushed (mocked):
explain analyze (select * from rp1 order by a limit 10) union all
(select * from rp2 order by a limit 10) limit 10;
Execution Time: 49.263 ms

about 3x faster for this case.

However, it may also be worth you reading over [3] and the ultimate
reason I changed my mind on that being a good idea. Pushing LIMITs
below an Append seems quite incomplete when we don't yet push sorts
below Appends, which is what that patch did.  I just was not
comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
until executor shutdown.  That'll be done for rescan reasons, but it
does mean if you pushed Sort below Append that we could have a very
large number of sort nodes holding onto work_mem all at once.   I find
that a bit scary, especially so given the excessive partitioning cases
I've seen and worked on recently.  I did consider if we maybe could
adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
to only do that if we could be somehow certain there were going to be
no rescans.  I don't have a plan on how that would be detected.

Anyway, I don't think anything above is that useful to push you
forward with that. I just didn't want you running off thinking we
don't want to see improvements in this area.  I certainly do.

David

[1] https://postgr.es/m/CAApHDvoMGyc+1eb8g5rEMUUMeGWhe2c_f8yvJjUO1kUHZj0h7w@mail.gmail.com
[2] https://postgr.es/m/CAKU4AWqaTNPrYcb_cMEDDYWZVGfFxuUjr75F5LBZwnUQ0JvVPw@mail.gmail.com
[3] https://postgr.es/m/CAApHDvojKdBR3MR59JXmaCYbyHB6Q_5qPRU+dy93En8wm+XiDA@mail.gmail.com



Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Andy Fan
Date:


On Mon, Oct 9, 2023 at 8:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek <michal@kleczek.org> wrote:
>> On 8 Oct 2023, at 03:33, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>> For patches aimed at performance improvement, it is better to provide
>>> an example to show how much benefit we can get.  As for this case,
>>> I'm doubtful it can work as an improvement.
>
>> Could you elaborate a little why you think it won’t work as an improvement?
>> Is it because in practice LIMIT _is_ pushed down already during execution?
>> From what I understand postgres_fdw does indeed fetch on demand.
>> OTOH pushing down LIMIT is considered an improvement (as witnessed in the postgres_fdw code itself after d50d172e51)
>
> In my opinion, analysis of where we can push a LIMIT node deeper into
> the plan is an interesting area for research and work.

Well, I'm really impressed by your feedback - always professional and
PATIENT, just like the help you have given me in pretty many areas over the
last N years.

Yes, I think "analysis of where we can ..." is the key point.  SQL is
a complex area because it is ever-changing, so providing an
example is pretty helpful for communication.  However,
"doubtful" might not have been a good word :( :(
 

> The Append / MergeAppend case is certainly one place where pushing
> LIMIT nodes down could be beneficial. Of course, if there was some
> filter or join or aggregation/distinct, etc that occurred after the
> Append/MergeAppend then you'd not be able to do this as the pushed
> limit might be met before we've got enough rows at the top level of
> the plan, and that could result in fewer rows being output than
> what was requested in the query (aka wrong results).

I don't totally agree with this; my main idea came from Tom's reply
at [1].  The best situation would be that we know we should "plan for"
top-N rows but don't need to actually add the execution overhead,
and to my current knowledge we have achieved this already in a pretty
large number of cases.  If there is an area that is missed and can be
shown with an example, I would be pretty happy to change my mind.

> Andy was working
> around this area recently (see [1] and corresponding thread).  He had
> to add a bunch of code that checked for operations that might mean
> we'd need to read more than the tuple_fraction rows from the Append
> node.  If we had a nice way to know when building base rel paths if
> there were or were not upper-level operations that affect if LIMIT
> pushing can be done, then that might make such a patch cleaner.  Andy
> in one of his proposed patches [2] added a field to PlannerInfo to
> mark this.  That field wasn't well named or documented, so anything
> you did would have to be an improvement on that.
>
> Looking at your patch, I see you've solved this by delaying the
> pushing down until the upper planner and just checking if the lower
> planner (join search) produced an Append or MergeAppend path. I've not
> really developed an opinion yet what's the best method. I feel
> creating the correct paths up-front is likely more flexible and more
> true to the path costing code.
>
> It might also be worth taking a step backwards and seeing if there are
> any other cases where we could push down LIMITs and try to see if
> there's something more generic that could be built to do this in a way
> that's beneficial to more cases. I can't think of any off the top of
> my head, but I've not thought very hard about it.

> FWIW, it's trivial to mock up the possible benefits of pushing LIMIT
> nodes down with something like the following:
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (0) to (1000000);
> create table rp2 partition of rp for values from (1000000) to (2000000);
> insert into rp select x from generate_series(0, 1999999)x;
>
> -- limit not pushed:
> explain analyze select * from rp order by a limit 10;
> Execution Time: 148.041 ms
>
> -- limit pushed (mocked):
> explain analyze (select * from rp1 order by a limit 10) union all
> (select * from rp2 order by a limit 10) limit 10;
> Execution Time: 49.263 ms
>
> about 3x faster for this case.
>
> However, it may also be worth you reading over [3] and the ultimate
> reason I changed my mind on that being a good idea. Pushing LIMITs
> below an Append seems quite incomplete when we don't yet push sorts
> below Appends, which is what that patch did.  I just was not
> comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
> until executor shutdown.  That'll be done for rescan reasons, but it
> does mean if you pushed Sort below Append that we could have a very
> large number of sort nodes holding onto work_mem all at once.   I find
> that a bit scary, especially so given the excessive partitioning cases
> I've seen and worked on recently.  I did consider if we maybe could
> adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
> to only do that if we could be somehow certain there were going to be
> no rescans.  I don't have a plan on how that would be detected.

That patch looks interesting, and the example there should not be
uncommon in real user cases.  I'll see if I can do anything useful.


    
--
Best Regards
Andy Fan

Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Ashutosh Bapat
Date:
On Mon, Oct 9, 2023 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> However, it may also be worth you reading over [3] and the ultimate
> reason I changed my mind on that being a good idea. Pushing LIMITs
> below an Append seems quite incomplete when we don't yet push sorts
> below Appends, which is what that patch did.

When the paths are already ordered according to the ORDER BY
specification, pushing down LIMIT will give them the extra benefit of
being cost effective. Do you think we can proceed along those lines?
Later, when we implement Sort pushdown, we will adjust the LIMIT
pushdown code accordingly.

> I just was not
> comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
> until executor shutdown.  That'll be done for rescan reasons, but it
> does mean if you pushed Sort below Append that we could have a very
> large number of sort nodes holding onto work_mem all at once.   I find
> that a bit scary, especially so given the excessive partitioning cases
> I've seen and worked on recently.  I did consider if we maybe could
> adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
> to only do that if we could be somehow certain there were going to be
> no rescans.  I don't have a plan on how that would be detected.

We have that problem with partitionwise join. Have you seen it in the
field? I have not seen such reports, but that could be because not many
know that partitionwise join needs to be explicitly turned ON. The
solution we develop here would solve the problem for partitionwise
join as well. It's hard to solve this problem. If there's a real case
where LIMIT pushdown helps without fixing the Sort pushdown case, that
might help in proceeding with it.

--
Best Wishes,
Ashutosh Bapat



Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: David Rowley
Date:
On Mon, 9 Oct 2023 at 23:35, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Oct 9, 2023 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > However, it may also be worth you reading over [3] and the ultimate
> > reason I changed my mind on that being a good idea. Pushing LIMITs
> > below an Append seems quite incomplete when we don't yet push sorts
> > below Appends, which is what that patch did.
>
> > When the paths are already ordered according to the ORDER BY
> > specification, pushing down LIMIT will give them the extra benefit of
> > being cost effective. Do you think we can proceed along those lines?
> > Later, when we implement Sort pushdown, we will adjust the LIMIT
> > pushdown code accordingly.

What are the benefits if the paths are already ordered?  e.g. if it's
an index scan then we'll only pull the tuples we need from it.

I think if we did manage to get something working to push Sorts below
Appends then ExecSetTupleBound() would take care of most of the limit
problem (with the exception of FDWs).  If you look at
ExecSetTupleBound() you'll see that it does recursively descend into
AppendStates and set the bound on the append children.  That's not
going to work when the Sort is above the Append. So isn't it mostly
the work_mem * n_append_children concern that is holding us up here?

> We have that problem with partitionwise join. Have you seen it in the
> field? I have not seen such reports, but that could be because not many
> know that partitionwise join needs to be explicitly turned ON. The
> solution we develop here would solve the problem for partitionwise
> join as well. It's hard to solve this problem. If there's a real case
> where LIMIT pushdown helps without fixing the Sort pushdown case, that
> might help in proceeding with it.

I've not heard anything about that.  What I saw were just complaints
about the planner being too slow to produce a plan which accessed well
in excess of the number of partitions that we recommend in the
partitioning best practices documents.

David



Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Ashutosh Bapat
Date:
On Mon, Oct 9, 2023 at 4:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 9 Oct 2023 at 23:35, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Mon, Oct 9, 2023 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > >
> > > However, it may also be worth you reading over [3] and the ultimate
> > > reason I changed my mind on that being a good idea. Pushing LIMITs
> > > below an Append seems quite incomplete when we don't yet push sorts
> > > below Appends, which is what that patch did.
> >
> > When the paths are already ordered according to the ORDER BY
> > specification, pushing down LIMIT will give them the extra benefit of
> > being cost effective. Do you think we can proceed along those lines?
> > Later, when we implement Sort pushdown, we will adjust the LIMIT
> > pushdown code accordingly.
>
> What are the benefits if the paths are already ordered?  e.g. if it's
> an index scan then we'll only pull the tuples we need from it.
>

postgres_fdw creates a path with pathkeys based on the clauses on that
relation. There is no index involved there. Pushing down LIMIT will
limit the number of rows fetched from the foreign server, and the
foreign server may have an opportunity to optimize the query based on
the LIMIT.
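To make that concrete, here is a hypothetical sketch (the foreign table `ft1`, its remote table `public.t1`, and the server are made up) of the remote SQL shape LIMIT pushdown enables, of the kind `EXPLAIN VERBOSE` reports for postgres_fdw:

```sql
-- Hypothetical foreign table ft1 pointing at public.t1 on a remote server.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM ft1 ORDER BY a LIMIT 10;
-- With LIMIT pushed down, the "Remote SQL:" line can carry the limit:
--   SELECT a FROM public.t1 ORDER BY a ASC NULLS LAST LIMIT 10
-- Without it, the remote query has no LIMIT, so the foreign server may
-- do far more work before the extra rows are discarded locally.
```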

> > We have that problem with partitionwise join. Have you seen it in the
> > field? I have not seen such reports, but that could be because not many
> > know that partitionwise join needs to be explicitly turned ON. The
> > solution we develop here would solve the problem for partitionwise
> > join as well. It's hard to solve this problem. If there's a real case
> > where LIMIT pushdown helps without fixing the Sort pushdown case, that
> > might help in proceeding with it.
>
> I've not heard anything about that.  What I saw were just complaints
> about the planner being too slow to produce a plan which accessed well
> in excess of the number of partitions that we recommend in the
> partitioning best practices documents.

I am not able to relate the planner slowness to the work_mem * number of
partitions problem. I agree that both of them are problems, but they are
different ones.

--
Best Wishes,
Ashutosh Bapat



Re: Draft LIMIT pushdown to Append and MergeAppend patch

From: Michał Kłeczek
Date:


On 9 Oct 2023, at 15:04, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> On Mon, Oct 9, 2023 at 4:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> What are the benefits if the paths are already ordered?  e.g. if it's
>> an index scan then we'll only pull the tuples we need from it.
>
> postgres_fdw creates a path with pathkeys based on the clauses on that
> relation. There is no index involved there. Pushing down LIMIT will
> limit the number of rows fetched from the foreign server, and the
> foreign server may have an opportunity to optimize the query based on
> the LIMIT.

I would add another benefit:
the opportunity to fetch everything early, buffer it, and release the session.

Without LIMIT information the FDW has to keep cursors open.


Michal