Обсуждение: Avoid using list_delete_first in simplify_or/and_arguments
Hi hackers,
While trying to measure if there is any gain from the change as
discussed in [1], I happened to notice another place that is slowed down
by list_delete_first. I'm using the query as below:
(n=1000000;
printf "explain (summary on) select * from t where "
for ((i=1;i<$n;i++)); do printf "a = $i or "; done;
printf "a = $n;"
) | psql
And I notice that a large part of planning time is spent on the
list_delete_first calls inside simplify_or_arguments().
I think the issue here is clear and straightforward: list_delete_first
has an O(N) cost due to data movement. And I believe similar issue has
been discussed several times before.
I wonder if we can improve it by using list_delete_last instead, so I
tried the following change:
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3612,9 +3612,9 @@ simplify_or_arguments(List *args,
unprocessed_args = list_copy(args);
while (unprocessed_args)
{
- Node *arg = (Node *) linitial(unprocessed_args);
+ Node *arg = (Node *) llast(unprocessed_args);
- unprocessed_args = list_delete_first(unprocessed_args);
+ unprocessed_args = list_delete_last(unprocessed_args);
With this change, in my box the planning time for the query above is
reduced from 64257.784 ms to 1411.666 ms, a big improvement. The side
effect is that it results in a lot of plan diffs in regression tests,
but they are all about different order of OR arguments.
While trying to measure if there is any gain from the change as
discussed in [1], I happened to notice another place that is slowed down
by list_delete_first. I'm using the query as below:
(n=1000000;
printf "explain (summary on) select * from t where "
for ((i=1;i<$n;i++)); do printf "a = $i or "; done;
printf "a = $n;"
) | psql
And I notice that a large part of planning time is spent on the
list_delete_first calls inside simplify_or_arguments().
I think the issue here is clear and straightforward: list_delete_first
has an O(N) cost due to data movement. And I believe similar issue has
been discussed several times before.
I wonder if we can improve it by using list_delete_last instead, so I
tried the following change:
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3612,9 +3612,9 @@ simplify_or_arguments(List *args,
unprocessed_args = list_copy(args);
while (unprocessed_args)
{
- Node *arg = (Node *) linitial(unprocessed_args);
+ Node *arg = (Node *) llast(unprocessed_args);
- unprocessed_args = list_delete_first(unprocessed_args);
+ unprocessed_args = list_delete_last(unprocessed_args);
With this change, in my box the planning time for the query above is
reduced from 64257.784 ms to 1411.666 ms, a big improvement. The side
effect is that it results in a lot of plan diffs in regression tests,
but they are all about different order of OR arguments.
I believe simplify_and_arguments() can also benefit from similar
changes. But I'm not sure if we could have such a long AND/OR arguments
in real world. So is this worth doing?
[1] https://www.postgresql.org/message-id/CAMbWs4-RXhgz0i4O1z62gt%2BbTLTM5vXYyYhgnius0j_txLH7hg%40mail.gmail.com
Thanks
Richard
changes. But I'm not sure if we could have such a long AND/OR arguments
in real world. So is this worth doing?
[1] https://www.postgresql.org/message-id/CAMbWs4-RXhgz0i4O1z62gt%2BbTLTM5vXYyYhgnius0j_txLH7hg%40mail.gmail.com
Thanks
Richard