GEQO plans much slower than standard join plans
| От | Carlo Sganzerla |
|---|---|
| Тема | GEQO plans much slower than standard join plans |
| Дата | |
| Msg-id | CACWDQNX+mFG35dL+CP3BeXRjozJr8HKe3O=_ZPwSTaGQOHoizA@mail.gmail.com обсуждение исходный текст |
| Ответы |
Re: GEQO plans much slower than standard join plans
|
| Список | pgsql-performance |
Hello experts,
Recently, we've been encountering poor query plans on our database workload. We've successfully diagnosed the cause as reaching the default limit of 8 for join_collapse_limit.
We've already dealt with the situation successfully. However, I've been discussing with some colleagues the possibility of raising the join_collapse_limit parameter value (and from_collapse_limit with it) in our production environment. I was trying to determine what would be a sensible new value when I stumbled upon this thread on OLTP Star Joins: https://www.postgresql.org/message-id/1ea167aa-457d-422a-8422-b025bb660ef3%40vondra.me
Our data model is hierarchical, so we rather have child tables instead of the dimension tables mentioned above. I created a similar script as the one found on the OLTP Star Join thread to test the effects of different values of join_collapse_limit locally on an hierarchical model which is more similar to ours. I ran pg_bench with 30 threads and 30 clients for 5 seconds. The following results were obtained on my machine (AMD Ryzen 7 5700U with 8 cores and 32 GB RAM, PostgreSQL 17.5 (Debian 17.5-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit):
OLTP starjoin (10 dimensions)
- join_collapse_limit = 1: 6.4k TPS
- join_collapse_limit = 8: 2.8k TPS
- join_collapse_limit = 11: 400 TPS
Tree join (10 levels)
- join_collapse_limit = 1: 7.4k TPS
- join_collapse_limit = 8: 4.7k TPS
- join_collapse_limit = 11: 3.9k TPS
Tree join (14 levels)
- join_collapse_limit = 1: 5.1k TPS
- join_collapse_limit = 8: 3.4k TPS
- join_collapse_limit = 11: 3k TPS
- join_collapse_limit = 14, geqo = off: 2.2k TPS
- join_collapse_limit = 14, geqo = on: 200 TPS
Note: geqo_threshold was unchanged from the default (12)
I assume that the reason why the hierarchical "tree join" is much faster is due to the dependencies among tables, so the standard join search has a much narrower range of possible query paths compared to the OLTP Star Join case. What surprised me, however, is that when GEQO is turned on, the TPS falls dramatically. Given that the documentation states that GEQO "... reduces planning time for complex queries (those joining many relations), at the cost of producing plans that are sometimes inferior to those found by the normal exhaustive-search algorithm", it made me wonder what could be the cause of this much slower planning. I'm not really familiar with genetic algorithms, so perhaps I might be missing something, but is this kind of planning performance hit normal when GEQO is on? I was hoping someone could help us on this topic.
Regards,
Carlo
Recently, we've been encountering poor query plans on our database workload. We've successfully diagnosed the cause as reaching the default limit of 8 for join_collapse_limit.
We've already dealt with the situation successfully. However, I've been discussing with some colleagues the possibility of raising the join_collapse_limit parameter value (and from_collapse_limit with it) in our production environment. I was trying to determine what would be a sensible new value when I stumbled upon this thread on OLTP Star Joins: https://www.postgresql.org/message-id/1ea167aa-457d-422a-8422-b025bb660ef3%40vondra.me
Our data model is hierarchical, so we rather have child tables instead of the dimension tables mentioned above. I created a similar script as the one found on the OLTP Star Join thread to test the effects of different values of join_collapse_limit locally on an hierarchical model which is more similar to ours. I ran pg_bench with 30 threads and 30 clients for 5 seconds. The following results were obtained on my machine (AMD Ryzen 7 5700U with 8 cores and 32 GB RAM, PostgreSQL 17.5 (Debian 17.5-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit):
OLTP starjoin (10 dimensions)
- join_collapse_limit = 1: 6.4k TPS
- join_collapse_limit = 8: 2.8k TPS
- join_collapse_limit = 11: 400 TPS
Tree join (10 levels)
- join_collapse_limit = 1: 7.4k TPS
- join_collapse_limit = 8: 4.7k TPS
- join_collapse_limit = 11: 3.9k TPS
Tree join (14 levels)
- join_collapse_limit = 1: 5.1k TPS
- join_collapse_limit = 8: 3.4k TPS
- join_collapse_limit = 11: 3k TPS
- join_collapse_limit = 14, geqo = off: 2.2k TPS
- join_collapse_limit = 14, geqo = on: 200 TPS
Note: geqo_threshold was unchanged from the default (12)
I assume that the reason why the hierarchical "tree join" is much faster is due to the dependencies among tables, so the standard join search has a much narrower range of possible query paths compared to the OLTP Star Join case. What surprised me, however, is that when GEQO is turned on, the TPS falls dramatically. Given that the documentation states that GEQO "... reduces planning time for complex queries (those joining many relations), at the cost of producing plans that are sometimes inferior to those found by the normal exhaustive-search algorithm", it made me wonder what could be the cause of this much slower planning. I'm not really familiar with genetic algorithms, so perhaps I might be missing something, but is this kind of planning performance hit normal when GEQO is on? I was hoping someone could help us on this topic.
Regards,
Carlo
Вложения
В списке pgsql-performance по дате отправления: