Re: sqlsmith crash incremental sort
От | Tomas Vondra |
---|---|
Тема | Re: sqlsmith crash incremental sort |
Дата | |
Msg-id | 20200422225919.ucs5xnlgnr5ekhod@development обсуждение исходный текст |
Ответ на | Re: sqlsmith crash incremental sort (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: sqlsmith crash incremental sort
|
Список | pgsql-hackers |
On Mon, Apr 20, 2020 at 01:47:29AM +0200, Tomas Vondra wrote: >On Sat, Apr 18, 2020 at 02:23:25PM -0400, James Coleman wrote: >>On Thu, Apr 16, 2020 at 9:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>> I think we have essentially three options: >>>> 1) assuming there's just a single group >>>> 2) assuming each row is a separate group >>>> 3) something in between >>>> If (1) and (2) are worst/best-case scenarios, maybe we should pick >>>> something in between. We have DEFAULT_NUM_DISTINCT (200) which >>>> essentially says "we don't know what the number of groups is" so maybe >>>> we should use that. >>> >>>I wouldn't recommend picking either the best or worst cases. >>> >>>Possibly DEFAULT_NUM_DISTINCT is a sane choice, though it's fair to >>>wonder if it's quite applicable to the case where we already know >>>we've grouped by some columns. >> >>Do you think defining a new default, say, >>DEFAULT_NUM_DISTINCT_PRESORTED is preferred then? And choose some >>value like "1/2 of the normal DEFAULT_NUM_DISTINCT groups" or some >>such? >> > >If we had a better intuition what a better value is, maybe. But I don't >think we have that at all, so I'd just use the existing one. > I've pushed fix with the DEFAULT_NUM_DISTINCT. The input comes from a set operation (which is where we call generate_append_tlist), so it's probably fairly unique, so maybe we should use input_tuples. But it's not guaranteed, so DEFAULT_NUM_DISTINCT seems reasonably defensive. One detail I've changed is that instead of matching the expression directly to a Var, it now calls pull_varnos() to also detect Vars somewhere deeper. Lookig at examine_variable() it calls find_base_rel for such case too, but I haven't tried constructing a query triggering the issue. One improvement I can think of is handling lists with only some expressions containing varno 0. We could still call estimate_num_groups for expressions with varno != 0, and multiply that by the estimate for the other part (be it DEFAULT_NUM_DISTINCT). This might produce a higher estimate than just using DEFAULT_NUM_DISTINCT directly, resulting in a lower incremenal sort cost. But it's not clear to me if this can even happen - AFAICS either all Vars have varno 0 or none, so I haven't done this. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: