Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id d207e075-9fb3-3a95-7811-8e0ab5292b2a@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers

On 3/16/19 11:55 AM, Dean Rasheed wrote:
> On Fri, 15 Mar 2019 at 00:06, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I've noticed an annoying thing when modifying type of column not
>> included in a statistics...
>>
>> That is, we don't remove the statistics, but the estimate still changes.
>> But that's because the ALTER TABLE also resets reltuples/relpages:
>>
>> That's a bit unfortunate, and it kinda makes the whole effort to not
>> drop the statistics unnecessarily kinda pointless :-(
>>
> 
> Well not entirely. Repeating that test with 100,000 rows, I get an
> initial estimate of 9850 (actual 10,000), which then drops to 2451
> after altering the column. But if you drop the dependency statistics,
> the estimate drops to 241, so clearly there is some benefit in keeping
> them in that case.
> 

Sure. What I meant is that to correct the relpages/reltuples estimates
you need to do ANALYZE, which rebuilds the statistics anyway. Although
VACUUM also fixes the estimates, without the stats rebuild.

> Besides, I thought there was no extra effort in keeping the extended
> statistics in this case -- isn't it just using the column
> dependencies, so in this case UpdateStatisticsForTypeChange() never
> gets called anyway?
> 

Yes, it does not get called at all. My point was that I was a little bit
confused because the test says "check change of unrelated column type
does not reset the MCV statistics" yet the estimates do actually change.

I wonder why we reset the relpages/reltuples to 0, instead of retaining
the original values, though. That would likely give us better density
estimates in estimate_rel_size, I think.

So I've tried doing that, and I've included it as 0001 into the patch
series. It seems to work, but I suppose the reset is there for a reason.
In any case, this is a preexisting issue, independent of what this patch
does or changes.

I've discovered another issue, though. Currently, clauselist_selectivity
has this as the very beginning:

    /*
     * If there's exactly one clause, just go directly to
     * clause_selectivity(). None of what we might do below is relevant.
     */
    if (list_length(clauses) == 1)
        return clause_selectivity(root, (Node *) linitial(clauses),
                                  varRelid, jointype, sjinfo);

Which however fails with queries like this:

    WHERE (a = 1 OR b = 1)

because clauselist_selectivity sees it as a single clause, passes it to
clause_selectivity and the OR-clause handling simply relies on

    (s1 + s2 - s1 * s2)

which entirely ignores the multivariate stats. The other similar places
in clause_selectivity() simply call clauselist_selectivity() so that's
OK, but OR-clauses don't do that.

For functional dependencies this is not a huge issue because those apply
only to AND-clauses. But there were proposals to maybe apply them to
other types of clauses, in which case it might become issue.

I think the best fix is moving the optimization after the multivariate
stats are applied. The only alternative I can think of is modifying
clauselist_selectivity so that it can be executed on OR-clauses. But
that seems much more complicated than the former option for almost no
other advantages.

I've also changed how statext_is_compatible_clause_internal() handles
the attnums bitmapset - you were right in your 3/10 message that we can
just pass the value, without creating a local bitmapset. So I've just
done that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
Следующее
От: Tom Lane
Дата:
Сообщение: Unduly short fuse in RequestCheckpoint