Re: Merging statistics from children instead of re-sampling everything

From	Tomas Vondra
Subject	Re: Merging statistics from children instead of re-sampling everything
Date	30 June 2021, 18:15:11
Msg-id	d35c52a6-3242-e559-4ba6-c0e6de3fa1b1@enterprisedb.com
In reply to	Re: Merging statistics from children instead of re-sampling everything (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Responses	Re: Merging statistics from children instead of re-sampling everything Re: Merging statistics from children instead of re-sampling everything
List	pgsql-hackers

On 6/30/21 2:55 PM, Andrey Lepikhov wrote:
> Sorry, I forgot to send CC into pgsql-hackers.
> On 29/6/21 13:23, Tomas Vondra wrote:
>> Because sampling is fairly expensive, especially if you have to do it 
>> for large number of child relations. And you'd have to do that every 
>> time *any* child triggers autovacuum, pretty much. Merging the stats 
>> is way cheaper.
>>
>> See the other thread linked from the first message.
> Maybe i couldn't describe my idea clearly.
> The most commonly partitioning is used for large tables.
> I suppose to store a sampling reservoir for each partition, replace on 
> update of statistics and merge to build statistics for parent table.
> It can be spilled into tuplestore on a disk, or stored in a parent table.
> In the case of complex inheritance we can store sampling reservoirs only 
> for leafs.
> You can consider this idea as an imagination, but the merging statistics 
> approach has an extensibility problem on another types of statistics.
>

Well, yeah - we might try that too. This is simply exploring the
"merge statistics" idea from [1], which is why it does not even
attempt to do what you suggested. We may explore the approach of
keeping per-partition samples as well.

You're right that maintaining per-partition samples and merging those
might solve (or at least reduce) some of the problems, e.g. by
eliminating most of the I/O that'd be needed for sampling. And yeah,
it's not entirely clear how to merge some of the statistics types
(like ndistinct). But for a lot of the basic stats it works quite
nicely, I think.
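
To illustrate why some basic stats merge easily while ndistinct does
not, here is a minimal sketch. It is not code from the patch: the
ColumnStats class, its field names and merge_basic_stats() are
hypothetical, and n_distinct is assumed to be an absolute count rather
than a fraction of rows. The null fraction and average width are
row-weighted averages, while for ndistinct the per-child values only
give a lower and an upper bound.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-partition stats for one column."""
    reltuples: float   # estimated number of rows in the partition
    null_frac: float   # fraction of rows where the column is NULL
    avg_width: float   # average width in bytes of non-NULL values
    n_distinct: float  # estimated distinct non-NULL values (absolute)


def merge_basic_stats(children):
    """Return (null_frac, avg_width, (nd_low, nd_high)) for the parent."""
    total = sum(c.reltuples for c in children)
    if total == 0:
        return 0.0, 0.0, (0.0, 0.0)

    # NULL fraction: average of the children weighted by row count.
    null_frac = sum(c.null_frac * c.reltuples for c in children) / total

    # Average width: weighted by the number of non-NULL rows per child.
    non_null = sum((1 - c.null_frac) * c.reltuples for c in children)
    if non_null > 0:
        avg_width = sum(c.avg_width * (1 - c.null_frac) * c.reltuples
                        for c in children) / non_null
    else:
        avg_width = 0.0

    # ndistinct: without knowing how much the children's value sets
    # overlap, only bounds are available. Full overlap gives the
    # largest child value, no overlap gives the sum.
    nd_low = max(c.n_distinct for c in children)
    nd_high = min(sum(c.n_distinct for c in children), non_null)
    return null_frac, avg_width, (nd_low, nd_high)
```

The gap between the two ndistinct bounds is the part that cannot be
closed from the per-child numbers alone.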

I'm sure there'll be some complexity due to handling large / toasted 
values, etc. And we probably need to design this for large hierarchies 
(IMHO it should work with 10k partitions, not just 100), in which case 
it may still be quite a bit more expensive than merging the stats.
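
For comparison, a rough sketch of what merging per-partition samples
into a parent sample could look like. This is only an assumption about
one possible approach, not something proposed in this thread in
detail: merge_samples() and its arguments are hypothetical, and each
stored sample is assumed to be a uniform sample of its partition. Each
draw picks a partition with probability proportional to the rows it
has left, then takes one unused row from that partition's sample.

```python
import random


def merge_samples(partitions, k, rng=None):
    """Build a parent sample of up to k rows.

    partitions: list of (row_count, sample_rows) pairs, where
    sample_rows is assumed to be a uniform sample of that partition.
    """
    rng = rng or random.Random()

    # Shuffle a copy of each sample so pop() is a uniform draw from it.
    pools = []
    remaining = []
    for row_count, sample in partitions:
        pool = list(sample)
        rng.shuffle(pool)
        pools.append(pool)
        remaining.append(row_count)

    merged = []
    while len(merged) < k:
        # A partition whose stored sample is used up gets weight 0.
        # That skews the result, so stored samples should be large
        # enough for this not to happen in practice.
        weights = [remaining[i] if pools[i] else 0
                   for i in range(len(pools))]
        if sum(weights) == 0:
            break
        i = rng.choices(range(len(pools)), weights=weights)[0]
        merged.append(pools[i].pop())
        remaining[i] -= 1
    return merged
```

Even in this simplified form the work grows with the number of
partitions and the size of the stored samples, and the parent stats
still have to be computed from the merged sample afterwards.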

So maybe we should really support both, and combine them somehow?

regards

[1] https://www.postgresql.org/message-id/CAM-w4HO9hUHvJDVwQ8%3DFgm-znF9WNvQiWsfyBjCr-5FD7gWKGA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
