estimation problems for DISTINCT ON with FDW

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	estimation problems for DISTINCT ON with FDW
Дата	29 июня 2020 г. 06:23:04
Msg-id	CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com обсуждение исходный текст
Ответы	Re: estimation problems for DISTINCT ON with FDW (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) Re: estimation problems for DISTINCT ON with FDW (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

If I use the attached sql file to set up the database with loop-back postgres_fdw, and then turn on use_remote_estimate for this query:

distinct on (id) id, z from fgn.priority order by id, priority desc,z

It issues two queries for the foreign estimate, one with a sort and one without:

EXPLAIN SELECT id, priority, z FROM public.priority

EXPLAIN SELECT id, priority, z FROM public.priority ORDER BY id ASC NULLS LAST, priority DESC NULLS FIRST, z ASC NULLS LAST

It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign side, which is probably the best way to run the query. I guess it makes sense that FDW machinery in general doesn't want to try to push a PostgreSQL specific construct.

But much worse than that, it horribly misestmates the number of unique rows it will get back, having never asked the remote side for an estimate of that.

Result (cost=100.51..88635.90 rows=1 width=16)
-> Unique (cost=100.51..88635.90 rows=1 width=16)
-> Foreign Scan on priority (cost=100.51..86135.90 rows=1000000 width=16)

Where does it come up with the idea that these 1,000,000 rows will DISTINCT/Unique down to just 1 row? I can't find the place in the code where that happens. I suspect it is happening somewhere in the core code based on data fed into it by postgres_fdw, not in postgres_fdw itself.

This leads to horrible plans if the DISTINCT ON is actually in a subquery which is joined to other tables, for example.

If you don't use the remote estimates, it at least comes up with a roughly sane estimate of 200 distinct rows, which is enough to inhibit selection of the worst plans. Why does an uninformative remote estimate do so much worse than no remote estimate at all?

Of course I could just disable remote estimates for this table, but then other queries that use the table without DISTINCT ON suffer. Another solution is to ANALYZE the foreign table, but that opens up a can of worms of its own.

I see this behavior in all supported or in-development versions.

Cheers,

Jeff

Вложения

distinct_on_foreign_estimate.sql

В списке pgsql-hackers по дате отправления:

Предыдущее

От: torikoshia
Дата: 29 июня 2020 г., 06:01:55
Сообщение: Re: Creating a function for exposing memory usage of backend process

Следующее

От: Amit Kapila
Дата: 29 июня 2020 г., 06:34:25
Сообщение: Re: pgsql: Enable Unix-domain sockets support on Windows

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

estimation problems for DISTINCT ON with FDW

Вложения

Предыдущее

Следующее