Re: Incremental Sort Cost Estimation Instability
От | Alexander Korotkov |
---|---|
Тема | Re: Incremental Sort Cost Estimation Instability |
Дата | |
Msg-id | CAPpHfduossatgSzBzTm-ENnzhiGEj-xE=pk8MK79XR2cm9=vDg@mail.gmail.com обсуждение исходный текст |
Ответ на | Incremental Sort Cost Estimation Instability (Andrei Lepikhov <lepihov@gmail.com>) |
Ответы |
Re: Incremental Sort Cost Estimation Instability
|
Список | pgsql-hackers |
Hi, Andrei! On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > On 9/12/24 16:57, Tomas Vondra wrote: > > On 9/12/24 12:12, David Rowley wrote: > >> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov@gmail.com> wrote: > >>> Initial problem causes wrong cost_sort estimation. Right now I think > >>> about providing cost_sort() the sort clauses instead of (or in addition > >>> to) the pathkeys. > >> > >> I'm not quite sure why the sort clauses matter any more than the > >> EquivalenceClass. If the EquivalanceClass defines that all members > >> will have the same value for any given row, then, if we had to choose > >> any single member to drive the n_distinct estimate from, isn't the > >> most accurate distinct estimate from the member with the smallest > >> n_distinct estimate? (That assumes the less distinct member has every > >> value the more distinct member has, which might not be true) > > I'm not sure how to fix this, but it seems estimate_num_groups() needs > > to do things differently. And I agree looking for the minimum ndistinct > > seems like the right approach. but doesn't estimate_num_groups() > > supposed to already do that? The comment says: > Hi, > > I have rewritten the code according to the idea of the > estimate_num_groups responsibility to adjust estimation according to the EC. > I haven't changed all the places of ngroups estimation - only where the > planner can access pathkeys to avoid the overhead of passing through all > the ECs. And added some cache in the EM. > The most important case for me here is GROUP-BY estimation and > create_unique_path because I frequently see them as sides of a JOIN. > It would be also nice to adjust the cost of memoize rescan, but it may > be a subject of the future improvement. Thank you for your work on this subject. I've the following notes about the last patch. 1. Now groupExprs argument of estimate_num_groups() might contain either Expr's or Pathkey's. I think this should be documented. 2. A new argument relids of estimate_num_groups() also should be documented. 3. I doubt estimate_num_groups() is appropriate place to handle "varno 0". Should we do this in cost_incremental_sort() as before? I see that it currently analyzes only first EquivalenceMember. But could it be a different answers between first EquivalenceMember and others? ------ Regards, Alexander Korotkov Supabase
В списке pgsql-hackers по дате отправления: