Weird planner issue on a standby

Поиск
Список
Период
Сортировка
От Guillaume Lelarge
Тема Weird planner issue on a standby
Дата
Msg-id CAECtzeVPM4Oi6dTdqVQmjoLkDBVChNj7ed3hNs1RGrBbwCJ7Cw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Weird planner issue on a standby  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hello,

My customer has a really weird planner issue on a standby.

First, the context. There's a primary, and two standbys, all on a 11.8 release. (I know this isn't the latest release for the v11 branch.) The cluster is 5.5TB. The PostgreSQL configuration is the same on all servers. pg_db_role_setting is almost empty, and there's nothing specific to planning and replication.

Here is the issue. Queries on both standbys take a lot more time than on the primary. We eventually found that the queries take a lot of time to be planned, not to be executed. For example:

On the primary:
 Planning Time: 1806.574 ms
 Execution Time: 771.888 ms
On any of the standbys:
 Planning Time: 41566.439 ms
 Execution Time: 1125.352 ms
 
A simple EXPLAIN already shows this difference in duration.

The query in itself isn't really that big. 8 tables (without partitions), a few predicates. Nothing fancy. Nothing huge to plan. Here it is, a bit anonymized:

select ...
from t1
left outer join t2 on ...
left outer join t3 on ...
left outer join t4 on ...
left outer join t5 on ...
left outer join t6 on ...
left outer join t7 on ...
left outer join t8 on ...
where c1='...' and c2='...' 
  and c3>='...' and c4<='...' ;

Join conditions are really simple. There's no function called anywhere.

Plans on the three servers are exactly the same. Same nodes, same statistics, same calculations' results.

Moreover, we've looked into what the planner was doing with strace, and we've been surprised that it involved massive reading on tables (more than 130,000 calls to read() on 15 files). We found that most of these files are the files for one of the tables on this query. Also, this massive reading happens only on standbys, not on the primary.

Our customer took the time to drop and create the standbys yesterday night, but the new standbys still have the issue.

And finally, we discovered that promoting a standby brings back the old performance. We have the same performance between the primary and the promoted standby.

To be honest, I'm lost, I don't know what to do next. But I have some questions :)

* The planner seems to read tables to generate the plans and/or select the right plan. Why does it do that? I thought it only reads indexes if needed, but it is quite clear it reads tables also.
* How can the replication have an impact on the planner ? (I'm not sure I'm asking the right question... I mean, why do I see a difference in behaviour between a standby and an old standby, new autonomous server?)
* Do you have any idea on how to fix this?

Thank you.

Regards.


--
Guillaume.

В списке pgsql-general по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Different execution plan between PostgreSQL 8.4 and 12.11
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Weird planner issue on a standby