Обсуждение: Reject Foreign Tables from MIN/MAX indexscan Optimization?
Hackers,
In working on a foreign data wrapper, I noticed that a simple `MIN()` or `MAX()` query doesn’t push down, even though
theremote server supports it and it will push down for a more complicated query.
To demonstrate, I added this query to the postgres_fdw tests:
```
explain (verbose, costs off)
select min(c2) from ft4;
```
And the output:
```
QUERY PLAN
-----------------------------------------------------------------------------------
Result
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
InitPlan minmax_1
-> Limit
Output: ft4.c2
-> Foreign Scan on public.ft4
Output: ft4.c2
Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
(9 rows)
```
I don’t think this is what we want, but should let the remote server handle the optimization; besides the LIMIT isn’t
pusheddown!
The attached patch fixes the issue by disabling the MIN/MAX indexscan optimization for foreign tables. It simply does
```
/*
* Reject foreign tables. They have their own optimizations, so just let
* them have it.
*/
if (rte->relkind == RELKIND_FOREIGN_TABLE)
return;
```
With that change, the test out emits:
```
QUERY PLAN
-----------------------------------------------
Foreign Scan
Output: (min(c2))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
(4 rows)
```
Leaving the optimization up to the remote server.
What do you think?
Best,
David
Вложения
"David E. Wheeler" <david@justatheory.com> writes:
> In working on a foreign data wrapper, I noticed that a simple `MIN()` or `MAX()` query doesn’t push down, even though
theremote server supports it and it will push down for a more complicated query.
Are you sure this is bad? I would hope that it'd only choose such
a plan if the remote server has a supporting index, else cost
considerations should cause us to reject it. Yeah, it looks a little
odd to ship the query in this form rather than as a MIN()/MAX(),
but I'm unconvinced that it's harmful per se. Maybe there is a
costing problem that needs to be fixed, but if so we should fix it
not lobotomize the planner.
> The attached patch fixes the issue by disabling the MIN/MAX
> indexscan optimization for foreign tables.
I don't like that too much. The original intention of
preprocess_minmax_aggregates was that it would only create indexscan
plans, which would automatically eliminate FDW relations. Some of
the comments still say that :-(, but in reality we got rid of that
restriction back in 9.1 (see 034967bdc and 8df08c848) because we
realized that the optimization could be applicable even when the
underlying plan isn't a simple indexscan. I think that the same
applies here, and so we might be throwing away useful cases.
In particular, your "let the remote side do the optimization"
argument is only valid for postgres_fdw; other FDWs might not
have equivalent smarts underneath them.
It would be good to look under the hood a little more and understand
why the MinMaxAgg path is winning, if it's not actually fast.
regards, tom lane
On Jan 5, 2026, at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Are you sure this is bad? I would hope that it'd only choose such > a plan if the remote server has a supporting index, else cost > considerations should cause us to reject it. How does one tell it that there is a supporting index? I’m working with ClickHouse, where there are no indexes to speak of;this query relies on the inherent ordering of the columns. > Yeah, it looks a little > odd to ship the query in this form rather than as a MIN()/MAX(), > but I'm unconvinced that it's harmful per se. It would be less harmful if the LIMIT pushed down; otherwise it’s pulling the entire table back into Postgres, no? > Maybe there is a > costing problem that needs to be fixed, but if so we should fix it > not lobotomize the planner. I would love to better understand how that all works, esp. WRT foreign tables. With an analytics database like ClickHouse,though, we’re trying to push down as much as possible. > I don't like that too much. The original intention of > preprocess_minmax_aggregates was that it would only create indexscan > plans, which would automatically eliminate FDW relations. Some of > the comments still say that :-(, but in reality we got rid of that > restriction back in 9.1 (see 034967bdc and 8df08c848) because we > realized that the optimization could be applicable even when the > underlying plan isn't a simple indexscan. I think that the same > applies here, and so we might be throwing away useful cases. But it needs the LIMIT, no? > In particular, your "let the remote side do the optimization" > argument is only valid for postgres_fdw; other FDWs might not > have equivalent smarts underneath them. Okay, but it’d be great to have a way to tell it “don’t do that” for those that do. > It would be good to look under the hood a little more and understand > why the MinMaxAgg path is winning, if it's not actually fast. Where does one start? D
Вложения
"David E. Wheeler" <david@justatheory.com> writes:
> On Jan 5, 2026, at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, it looks a little
>> odd to ship the query in this form rather than as a MIN()/MAX(),
>> but I'm unconvinced that it's harmful per se.
> It would be less harmful if the LIMIT pushed down; otherwise it’s pulling the entire table back into Postgres, no?
Yeah, there might be some gold to be mined there. IIRC we already
have some ability for LIMIT to tell an immediately-below Sort that
only a limited number of rows will be retrieved. Could be useful
if ForeignScan could be in on that.
>> It would be good to look under the hood a little more and understand
>> why the MinMaxAgg path is winning, if it's not actually fast.
> Where does one start?
I made the attached test case (based on postgres_fdw.sql, but with
a few more rows in the table). The idea is to check the behavior
for both an indexed column (c1) and an unindexed one (c2). To
eliminate cacheing/startup effects, I ran each test case three times
but considered only the lowest execution time of the three (often
but not always the last run).
The first attached results are from current HEAD; the second are
after applying your patch. What I find interesting is that
with use_remote_estimate enabled, it always goes for shipping the
MIN as-is. Your patch changes the behavior with use_remote_estimate
disabled, and we can see that the reason is that that mode estimates
the query cost at 100.24 with the minmax optimization and 146.78 when
shipping the aggregate. I've not dug into where we are getting
those numbers without use_remote_estimate, but perhaps that could
use refinement. On the other hand, there is not that much difference
between the actual runtimes, so perhaps it doesn't matter for
postgres_fdw.
I can't speak to your results with ClickHouse, since I have no idea
how the cost estimates are derived for that.
regards, tom lane
CREATE EXTENSION postgres_fdw;
SELECT current_database() AS current_database,
current_setting('port') AS current_port
\gset
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname :'current_database', port :'current_port');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE SCHEMA "S 1";
CREATE TABLE "S 1"."T 3" (
c1 int NOT NULL,
c2 int NOT NULL,
c3 text,
CONSTRAINT t3_pkey PRIMARY KEY (c1)
);
INSERT INTO "S 1"."T 3"
SELECT id,
id + 1,
'AAA' || to_char(id, 'FM000')
FROM generate_series(1, 10000) id;
ANALYZE "S 1"."T 3";
CREATE FOREIGN TABLE ft4 (
c1 int NOT NULL,
c2 int NOT NULL,
c3 text
) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
explain (analyze, verbose) select min(c1) from "S 1"."T 3";
explain (analyze, verbose) select min(c1) from "S 1"."T 3";
explain (analyze, verbose) select min(c1) from "S 1"."T 3";
explain (analyze, verbose) select min(c2) from "S 1"."T 3";
explain (analyze, verbose) select min(c2) from "S 1"."T 3";
explain (analyze, verbose) select min(c2) from "S 1"."T 3";
ALTER FOREIGN TABLE ft4 OPTIONS (use_remote_estimate 'false');
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
ALTER FOREIGN TABLE ft4 OPTIONS (SET use_remote_estimate 'true');
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.32..0.33 rows=1 width=4) (actual time=0.012..0.012 rows=1.00 loops=1)
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
Buffers: shared hit=3
InitPlan minmax_1
-> Limit (cost=0.29..0.32 rows=1 width=4) (actual time=0.011..0.011 rows=1.00 loops=1)
Output: "T 3".c1
Buffers: shared hit=3
-> Index Only Scan using t3_pkey on "S 1"."T 3" (cost=0.29..328.29 rows=10000 width=4) (actual
time=0.011..0.011rows=1.00 loops=1)
Output: "T 3".c1
Heap Fetches: 1
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.027 ms
Execution Time: 0.019 ms
(15 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180.00..180.01 rows=1 width=4) (actual time=1.057..1.058 rows=1.00 loops=1)
Output: min(c2)
Buffers: shared hit=55
-> Seq Scan on "S 1"."T 3" (cost=0.00..155.00 rows=10000 width=4) (actual time=0.004..0.414 rows=10000.00 loops=1)
Output: c1, c2, c3
Buffers: shared hit=55
Planning Time: 0.026 ms
Execution Time: 1.065 ms
(8 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Result (cost=100.23..100.24 rows=1 width=4) (actual time=0.145..0.145 rows=1.00 loops=1)
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
InitPlan minmax_1
-> Limit (cost=100.00..100.23 rows=1 width=4) (actual time=0.144..0.144 rows=1.00 loops=1)
Output: ft4.c1
-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.144..0.144 rows=1.00
loops=1)
Output: ft4.c1
Remote SQL: SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST
Planning Time: 0.030 ms
Execution Time: 0.207 ms
(11 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Result (cost=100.23..100.24 rows=1 width=4) (actual time=0.832..0.832 rows=1.00 loops=1)
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
InitPlan minmax_1
-> Limit (cost=100.00..100.23 rows=1 width=4) (actual time=0.831..0.831 rows=1.00 loops=1)
Output: ft4.c2
-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.831..0.831 rows=1.00
loops=1)
Output: ft4.c2
Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
Planning Time: 0.029 ms
Execution Time: 0.900 ms
(11 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.32..100.54 rows=1 width=4) (actual time=0.130..0.130 rows=1.00 loops=1)
Output: (min(c1))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c1) FROM "S 1"."T 3"
Planning Time: 0.315 ms
Execution Time: 0.163 ms
(6 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=280.00..280.22 rows=1 width=4) (actual time=0.803..0.803 rows=1.00 loops=1)
Output: (min(c2))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
Planning Time: 0.316 ms
Execution Time: 0.837 ms
(6 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.32..0.33 rows=1 width=4) (actual time=0.012..0.012 rows=1.00 loops=1)
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
Buffers: shared hit=3
InitPlan minmax_1
-> Limit (cost=0.29..0.32 rows=1 width=4) (actual time=0.011..0.011 rows=1.00 loops=1)
Output: "T 3".c1
Buffers: shared hit=3
-> Index Only Scan using t3_pkey on "S 1"."T 3" (cost=0.29..328.29 rows=10000 width=4) (actual
time=0.011..0.011rows=1.00 loops=1)
Output: "T 3".c1
Heap Fetches: 1
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.026 ms
Execution Time: 0.019 ms
(15 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180.00..180.01 rows=1 width=4) (actual time=1.054..1.054 rows=1.00 loops=1)
Output: min(c2)
Buffers: shared hit=55
-> Seq Scan on "S 1"."T 3" (cost=0.00..155.00 rows=10000 width=4) (actual time=0.004..0.412 rows=10000.00 loops=1)
Output: c1, c2, c3
Buffers: shared hit=55
Planning Time: 0.026 ms
Execution Time: 1.061 ms
(8 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=107.31..146.78 rows=1 width=4) (actual time=0.127..0.127 rows=1.00 loops=1)
Output: (min(c1))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c1) FROM "S 1"."T 3"
Planning Time: 0.017 ms
Execution Time: 0.187 ms
(6 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=107.31..146.78 rows=1 width=4) (actual time=0.795..0.796 rows=1.00 loops=1)
Output: (min(c2))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
Planning Time: 0.017 ms
Execution Time: 0.859 ms
(6 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.32..100.54 rows=1 width=4) (actual time=0.124..0.124 rows=1.00 loops=1)
Output: (min(c1))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c1) FROM "S 1"."T 3"
Planning Time: 0.186 ms
Execution Time: 0.157 ms
(6 rows)
QUERY PLAN
-------------------------------------------------------------------------------------------------
Foreign Scan (cost=280.00..280.22 rows=1 width=4) (actual time=0.801..0.802 rows=1.00 loops=1)
Output: (min(c2))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
Planning Time: 0.209 ms
Execution Time: 0.836 ms
(6 rows)
On Jan 5, 2026, at 23:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, there might be some gold to be mined there. IIRC we already
> have some ability for LIMIT to tell an immediately-below Sort that
> only a limited number of rows will be retrieved. Could be useful
> if ForeignScan could be in on that.
I’m surprised the limit part of the plan isn’t there already. Is that something that needs to be tweaked in the paths
passedto the FDW?
> The first attached results are from current HEAD; the second are
> after applying your patch. What I find interesting is that
> with use_remote_estimate enabled, it always goes for shipping the
> MIN as-is. Your patch changes the behavior with use_remote_estimate
> disabled, and we can see that the reason is that that mode estimates
> the query cost at 100.24 with the minmax optimization and 146.78 when
> shipping the aggregate. I've not dug into where we are getting
> those numbers without use_remote_estimate, but perhaps that could
> use refinement.
Odd. They should be the same, no?
> On the other hand, there is not that much difference
> between the actual runtimes, so perhaps it doesn't matter for
> postgres_fdw.
Maybe not with a loopback connection, but some of these queries send all 10000 values over the network, no? That could
bea lot of additional latency in some situations. ISTM that the code should either:
1. Push down the `LIMIT 1`; or
2. Push down the min/max and let the optimization happen remotely
> I can't speak to your results with ClickHouse, since I have no idea
> how the cost estimates are derived for that.
ClickHouse is column-oriented, and all the columns are sorted, so min/max optimizations are completely different and
neverrequire an index scan or table scan: they just read the first or last value in the relevant column. This is why I
startedlooking into this in the first place.
But your finding of the difference between use_remote_estimate true and false helped me to figure out the right lever
topull. pg_clickhouse has some remnants of use_remote_estimate from forking postgres_fdw back in 2019; they don’t do
anything,but it led me to a function that sets more-or-less hard-coded estimates. In `estimate_path_cost_size()` I
changed:
p_total_cost = 5.0 + coef
To
p_total_cost = 0.1 + coef
And that makes the min and max push down. Of course this impacts *all* pg_clickhouse queries, but that’s mainly what we
wantfor analytics-oriented workloads.
Anyway, some comments on plans generated by your testing:
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
Index-only scan, 1 row (though estimates 10000 for some reason).
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
Table scan, 10000 rows, as expected.
> ALTER FOREIGN TABLE ft4 OPTIONS (use_remote_estimate 'false');
>
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
The Foreign Scan here confuses me:
-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.144..0.144 rows=1.00 loops=1)
Output: ft4.c1
Remote SQL: SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST
How does it result in only one row? Does that mean it gets all of them back from the seq scan but only reads the first?
Arethey still delivered over the network and buffered? Seems like it would be good to have `LIMIT 1` pushed down.
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
Same confusion about the foreign scan plan:
-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.831..0.831 rows=1.00 loops=1)
Output: ft4.c2
Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
But also its execution time is 0.900 ms, which implies a remote full table scan. This makes sense, and `LIMIT 1` would
reducethe network overhead, but not the remote execution time.
> ALTER FOREIGN TABLE ft4 OPTIONS (SET use_remote_estimate 'true');
>
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
Interesting that this configuration pushes down min/max, but of course then the same optimizations happen remotely as
locally,but now we know only one row is returned, right?
My patch doesn’t change this much, because the remote optimization still takes place, it’s just that in that case it
neverreturns more than a single row. Setting `LIMIT` pushdown aside, I guess the question is, how ought an FDW author
toproperly tweak cost estimates to encourage or discourage the min/max optimization?
Best,
David