Re: adjust_limit_rows_costs algorithm
От | Heikki Linnakangas |
---|---|
Тема | Re: adjust_limit_rows_costs algorithm |
Дата | |
Msg-id | 843c6b1e-7c75-4383-9b0d-694d93b69d16@iki.fi обсуждение исходный текст |
Ответ на | adjust_limit_rows_costs algorithm (wenhui qiu <qiuwenhuifx@gmail.com>) |
Список | pgsql-hackers |
On 24/12/2024 08:20, wenhui qiu wrote: > sysbench=# explain analyze select userid from dba_users where username > like '%aaaaaaaaaaaa%' order by userid limit 1; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.43..408.59 rows=1 width=4) (actual > time=1433.469..1433.470 rows=0 loops=1) > -> Index Scan using dba_users_pkey on dba_users > (cost=0.43..244892.51 rows=600 width=4) (actual > time=1433.468..1433.468 rows=0 loops=1) > Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text) > Rows Removed by Filter: 6000000 > Planning Time: 0.384 ms > Execution Time: 1433.489 ms > (6 rows) The problem seems to be that the planner estimates the LIKE qual to match 600 rows, but in reality it matches 0. > I think the total_cost calculation method cannot be linear, because the > data distribution is not linear in reality. I try to change it to a > logarithmic Why? I don't see anything wrong with the LIMIT estimation here. One factor here is that the execution time of the Index Scan + Limit plan depends heavily on whether there are 0 matches, or more. With 0 matches, the Index Scans needs to scan the whole table. With 1 rows, it needs to scan just half of the table on average before finding the match. With 10 rows, just 10 % of the table, and so forth. So the execution time is indeed highly non-linear depending on whether there are any matches or not. Is that what you meant? It'd be nice to somehow take that non-linearity into account in the estimation. But I don't know how, and I don't think that change in LIMIT estimation is the right way. > The reality may be more complicated, I mean there is no better way for > the community leaders to solve this problem, because the community will > never say no to the hint, and there are so many such problems It's a difficult topic because "hints" mean different things to different people. Meanwhile, we have added things like "ALTER FUNCTION ... ROWS <rows>" and support functions which can do more accurate estimation of functions. That's a kind of a hint. If someone comes up with more ways to help the planner with selectivity estimation, it's worth considering. Or maybe you could improve the selectivity estimator of the LIKE operator to be more accurate to begin with. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления: