Re: extended stats on partitioned tables

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: extended stats on partitioned tables
Дата
Msg-id 689accb1-76e8-5115-6ed0-336749cdca57@enterprisedb.com
обсуждение исходный текст
Ответ на extended stats on partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers

On 11/4/21 12:19 AM, Justin Pryzby wrote:
> On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote:
>> On 10/8/21 12:45 AM, Justin Pryzby wrote:
>>> On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:
>>>> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
>>>>> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:
>>>>>> It seems like your patch should also check "inh" in examine_variable and
>>>>>> statext_expressions_load.
>>>>>
>>>>> I tried adding that - I mostly kept my patches separate.
>>>>> Hopefully this is more helpful than a complication.
>>>>> I added at: https://commitfest.postgresql.org/35/3332/
>>>>>
>>>>
>>>> Actually, this is confusing. Which patch is the one we should be
>>>> reviewing?
>>>
>>> It is confusing, but not as much as I first thought.  Please check the commit
>>> messages.
>>>
>>> The first two patches are meant to be applied to master *and* backpatched.  The
>>> first one intends to fixes the bug that non-inherited stats are being used for
>>> queries of inheritance trees.  The 2nd one fixes the regression that stats are
>>> not collected for inheritence trees of partitioned tables (which is the only
>>> type of stats they could ever possibly have).
>>
>> I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
>> the (!rte->inh) check in the rel->statlist loops. AFAICS both places
>> could do that right at the beginning, because it does not depend on the
>> statistics object at all, just the RelOptInfo.
> 
> I probably did this to make the code change small, to avoid indentin the whole
> block.

But indenting the block is not necessary. It's possible to do something
like this:

    if (!rel->inh)
        return 1.0;

or whatever is the "default" result for that function.

> 
>>> And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
>>> inherited and non-inherited stats, only in master, since it requires a catalog
>>> change.  It's a bit confusing that patch #4 removes most what I added in
>>> patches 1 and 2.  But that's exactly what's needed to collect and apply both
>>> inherited and non-inherited stats: the first two patches avoid applying stats
>>> collected with the wrong inheritence.  That's also what's needed for the
>>> patchset to follow the normal "apply to master and backpatch" process, rather
>>> than 2 patches which are backpatched but not applied to master, and one which
>>> is applied to master and not backpatched..
>>>
>>
>> Yeah. Af first I was a bit confused because after applying 0003 there
>> are both the fixes and the "correct" way, but then I realized 0004
>> removes the unnecessary bits.
> 
> This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my
> changes.  They should be squished together.
> 

Yep.

>> The one thing 0003 still needs is to rework the places that need to
>> touch both inh and !inh stats. The patch simply does
>>
>>     for (inh = 0; inh <= 1; inh++) { ... }
>>
>> but that feels a bit too hackish. But if we don't know which of the two
>> stats exist, I'm not sure what to do about it. 
> 
> There's also this:
> 
> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
>> +       /* create only the "stxdinherit=false", because that always exists */
>> +       datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = ObjectIdGetDatum(false);
>>
>> That'd be confusing for partitioned tables, no?
>> They'd always have an row with no data.
>> I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE).
>> (not ObjectIdGetDatum).
>> Then, that affects the loops which delete the tuples - neither inh nor !inh is
>> guaranteed, unless you check relkind there, too.
> 
> Maybe the for inh<=1 loop should instead be two calls to new functions factored
> out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
> inh".
> 

Well, yeah. That's part of the strange 1:2 mapping between the stats
definition and data. Although, even with regular stats we have such
mapping, except the "definition" is the pg_attribute row.

>> And I'm not sure we do the right thing after removing children, for example
>> (that should drop the inheritance stats, I guess).
> 
> Do you mean for inheritance only ?  Or partitions too ?
> I think for partitions, the stats should stay.
> And for inheritence, they can stay, for consistency with partitions, and since
> it does no harm.
> 

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: Fixing WAL instability in various TAP tests