Обсуждение: PostgreSQL optimizer use seq scan instead of pkey index only scan (inqueries with postgres_fdw)

Поиск
Список
Период
Сортировка

PostgreSQL optimizer use seq scan instead of pkey index only scan (inqueries with postgres_fdw)

От
Vitaly Baranovsky
Дата:
Hello all,

I faced strange behavior of PostgreSQL during the query execution.

So, I have to databases: local and foreign. There are foreign server definitions in the local database (via postgres_fdw). The local database has table 'local_table'. The foreign database has table 'foreign_table'. Both of them have only 1 column: 'primary_uuid'. This column in both databases is a primary key column. Schema on a local server that stores remote server definitions is 'foreign_server'. Each table has 100K rows. Vacuum analyze has been run for both servers.

When I run a query:
SELECT *
FROM 
(
SELECT foreign_table.primary_uuid
           FROM foreign_server.foreign_table
        UNION ALL
         SELECT local_table.primary_uuid
           FROM local_table
)
join_view
WHERE 
 join_view.primary_uuid in (select '19b2db7e-db89-48eb-90b1-0bd468a2346b'::uuid)

I expect that the server will use the pkey index for the local table. But it uses seq scan instead!

"Hash Semi Join  (cost=100.03..3346.23 rows=51024 width=16) (actual time=482.235..482.235 rows=0 loops=1)"
"  Output: foreign_table.primary_uuid"
"  Hash Cond: (foreign_table.primary_uuid = ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid))"
"  ->  Append  (cost=100.00..2510.68 rows=102048 width=16) (actual time=0.529..463.563 rows=200000 loops=1)"
"        ->  Foreign Scan on foreign_server.foreign_table  (cost=100.00..171.44 rows=2048 width=16) (actual time=0.528..446.715 rows=100000 loops=1)"
"              Output: foreign_table.primary_uuid"
"              Remote SQL: SELECT primary_uuid FROM public.foreign_table"
"        ->  Seq Scan on public.local_table  (cost=0.00..1829.00 rows=100000 width=16) (actual time=0.021..6.358 rows=100000 loops=1)"
"              Output: local_table.primary_uuid"
"  ->  Hash  (cost=0.02..0.02 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=1)"
"        Output: ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid)"
"        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"        ->  Result  (cost=0.00..0.01 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=1)"
"              Output: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"Planning Time: 0.126 ms"
"Execution Time: 482.572 ms""Execution Time: 509.315 ms"

So, as you can see, the execution time is 509 ms! It could be very fast if PostgreSQL used primary key index!

Also, please, note, that SQL without WHERE clause has been set to the foreign server:
"              Remote SQL: SELECT primary_uuid FROM public.foreign_table"

So, the optimizer doesn't select optimal plans for such executions :(

Looks like it's an optimizer inadequacy.

Does someone know, how to optimize this query without query rewriting (queries like this are generated from the Data Access layer and it's hard to rebuild that layer)? 

Thank you

P.S.: Answers to standard questions:
> PostgreSQL version number you are running:
PostgreSQL 11.2, compiled by Visual C++ build 1914, 64-bit

> How you installed PostgreSQL:
By downloaded standard Windows 64 installer

> Changes made to the settings in the postgresql.conf file: shared_preload_libraries = '$libdir/pg_stat_statements'

> Operating system and version:
Windows 10 Enterprise 64-bit

> What program you're using to connect to PostgreSQL:
pgAdmin III
 
> Is there anything relevant or unusual in the PostgreSQL server logs?:
Nope


P.P.S.: DDL scripts:
for the foreign database:
CREATE TABLE public.foreign_table
(
  primary_uuid uuid NOT NULL,
  CONSTRAINT "PKEY" PRIMARY KEY (primary_uuid)
)

for local database:
CREATE TABLE public.local_table
(
  primary_uuid uuid NOT NULL,
  CONSTRAINT local_table_pkey PRIMARY KEY (primary_uuid)
)

On Mon, May 6, 2019 at 10:44 AM Vitaly Baranovsky <barvetalforums@gmail.com> wrote:
Hello all,

I faced strange behavior of PostgreSQL during the query execution.

 ...
 
Also, please, note, that SQL without WHERE clause has been set to the foreign server:
"              Remote SQL: SELECT primary_uuid FROM public.foreign_table"

So, the optimizer doesn't select optimal plans for such executions :(

It works the way you want in version 12, which is currently under development and should be released in 5 months or so.

Cheers,

Jeff

Re: PostgreSQL optimizer use seq scan instead of pkey index only scan(in queries with postgres_fdw)

От
Vitaly Baranovsky
Дата:
Thank you, Jeff!

We'll be looking forward to the next version of Postgres in this case.

As far as I understand, you've answered about sending filtering condition to a foreign server... Could you, please, clarify about another (the first) part of my question? Why the server choose seq scan instead of pk key index only scan for the local table? 

Thank you

On Mon, May 6, 2019 at 6:32 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, May 6, 2019 at 10:44 AM Vitaly Baranovsky <barvetalforums@gmail.com> wrote:
Hello all,

I faced strange behavior of PostgreSQL during the query execution.

 ...
 
Also, please, note, that SQL without WHERE clause has been set to the foreign server:
"              Remote SQL: SELECT primary_uuid FROM public.foreign_table"

So, the optimizer doesn't select optimal plans for such executions :(

It works the way you want in version 12, which is currently under development and should be released in 5 months or so.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> It works the way you want in version 12, which is currently under
> development and should be released in 5 months or so.

Even in older versions, the OP would get a significantly smarter
plan after setting use_remote_estimate = on.  I think the core
issue here is that we won't generate remote parameterized paths
without that:

    /*
     * If we're not using remote estimates, stop here.  We have no way to
     * estimate whether any join clauses would be worth sending across, so
     * don't bother building parameterized paths.
     */
    if (!fpinfo->use_remote_estimate)
        return;

            regards, tom lane



On Mon, May 6, 2019 at 11:38 AM Vitaly Baranovsky <barvetalforums@gmail.com> wrote:
Thank you, Jeff!

We'll be looking forward to the next version of Postgres in this case.

As far as I understand, you've answered about sending filtering condition to a foreign server... Could you, please, clarify about another (the first) part of my question? Why the server choose seq scan instead of pk key index only scan for the local table? 

Thank you


Aren't those the same thing?  The foreign server can't use the where clause, if it doesn't get sent. 

Cheers,

Jeff

Re: PostgreSQL optimizer use seq scan instead of pkey index only scan(in queries with postgres_fdw)

От
Vitaly Baranovsky
Дата:
Ough, I believed I had use_remote_estimate = true in my database, but it was false :(

With use_remote_estimate = true everything works well!

Here is explain analyze with use_remote_estimate = true:
"Nested Loop  (cost=100.45..108.97 rows=100000 width=16) (actual time=1.037..1.037 rows=0 loops=1)"
"  Output: foreign_table.primary_uuid"
"  ->  HashAggregate  (cost=0.02..0.03 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=1)"
"        Output: ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid)"
"        Group Key: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"        ->  Result  (cost=0.00..0.01 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=1)"
"              Output: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"  ->  Append  (cost=100.43..108.92 rows=2 width=16) (actual time=1.032..1.032 rows=0 loops=1)"
"        ->  Foreign Scan on foreign_server.foreign_table  (cost=100.43..104.47 rows=1 width=16) (actual time=0.994..0.994 rows=0 loops=1)"
"              Output: foreign_table.primary_uuid"
"              Remote SQL: SELECT primary_uuid FROM public.foreign_table WHERE (($1::uuid = primary_uuid))"
"        ->  Index Only Scan using local_table_pkey on public.local_table  (cost=0.42..4.44 rows=1 width=16) (actual time=0.035..0.035 rows=0 loops=1)"
"              Output: local_table.primary_uuid"
"              Index Cond: (local_table.primary_uuid = ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid))"
"              Heap Fetches: 0"
"Planning Time: 100.619 ms"
"Execution Time: 1.243 ms"

I tried this with use_remote_estimate = true for different real queries with a lot of joins and everything works well!

On Mon, May 6, 2019 at 6:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> It works the way you want in version 12, which is currently under
> development and should be released in 5 months or so.

Even in older versions, the OP would get a significantly smarter
plan after setting use_remote_estimate = on.  I think the core
issue here is that we won't generate remote parameterized paths
without that:

        /*
         * If we're not using remote estimates, stop here.  We have no way to
         * estimate whether any join clauses would be worth sending across, so
         * don't bother building parameterized paths.
         */
        if (!fpinfo->use_remote_estimate)
                return;

                        regards, tom lane
On Mon, May 6, 2019 at 11:53 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, May 6, 2019 at 11:38 AM Vitaly Baranovsky <barvetalforums@gmail.com> wrote:
Thank you, Jeff!

We'll be looking forward to the next version of Postgres in this case.

As far as I understand, you've answered about sending filtering condition to a foreign server... Could you, please, clarify about another (the first) part of my question? Why the server choose seq scan instead of pk key index only scan for the local table? 

Thank you


Aren't those the same thing?  The foreign server can't use the where clause, if it doesn't get sent. 

Nevermind. When you said local table, I had some tunnel vision and was thinking of the foreign table as viewed from the perspective of the foreign server (to which it is local), not the actual local table.   That too is "fixed" in the same commit to the 12dev branch as the other issue is: 

commit 4be058fe9ec5e630239b656af21fc083371f30ed
Date:   Mon Jan 28 17:54:10 2019 -0500

    In the planner, replace an empty FROM clause with a dummy RTE.
 

My tests are all done with empty, unanalyzed tables as I just took you DDL without inventing my own DML, so may be different than what what you were seeing with your populated tables.

Cheers,

Jeff