Re: Stats target increase vs compute_tsvector_stats()

From Tom Lane
Subject Re: Stats target increase vs compute_tsvector_stats()
Msg-id 29737.1229353308@sss.pgh.pa.us
In reply to Re: Stats target increase vs compute_tsvector_stats()  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
List pgsql-hackers
Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
> Tom Lane wrote:
>> I came across this bit in ts_typanalyze.c:
>> 
>>    /* We want statistic_target * 100 lexemes in the MCELEM array */
>>    num_mcelem = stats->attr->attstattarget * 100;
>> 
>> I wonder whether the multiplier here should be changed?

> The origin of that bit is this post:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00556.php
> and the following few downthread ones.

> If we bump the default statistics target 10 times, then changing the 
> multiplier to 10 seems the right thing to do.

OK, will do.
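
To make the arithmetic concrete: if the default statistics target goes from 10 to 100 and the multiplier goes from 100 to 10, the default MCELEM target stays the same. A minimal sketch follows; mcelem_target() is a hypothetical helper written only for this illustration and is not a function in ts_typanalyze.c.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of ts_typanalyze.c): the number of
 * lexemes we aim to keep in the MCELEM array for a given statistics
 * target and multiplier.
 */
static int
mcelem_target(int attstattarget, int multiplier)
{
    assert(attstattarget >= 0 && multiplier > 0);
    return attstattarget * multiplier;
}
```

With these definitions, mcelem_target(10, 100) and mcelem_target(100, 10) both give 1000, whereas keeping the multiplier at 100 with a target of 100 would give 10000.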

> Only thing that needs 
> caution is the frequency of pruning we do in the Lossy Counting 
> algorithm, that IIRC is correlated with the desired target length of the 
> MCELEM array.

Right below that we have

    /*
     * We set bucket width equal to the target number of result lexemes.
     * This is probably about right but perhaps might need to be scaled
     * up or down a bit?
     */
    bucket_width = num_mcelem;

so it should track automatically.  AFAICS the argument in the above
thread that this is an appropriate pruning distance holds good
regardless of just how we obtain the target mcelem count.
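
For readers who have not looked at the algorithm, here is a rough sketch of how the bucket width drives pruning in textbook Lossy Counting. It is not the code in ts_typanalyze.c: the names (TrackItem, TrackTable, lossy_count, tracked_frequency) are made up for this illustration, integer tokens stand in for lexemes, and a small linear array with an arbitrary capacity stands in for a hash table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 1024        /* arbitrary capacity for this sketch */

typedef struct
{
    int key;                    /* stand-in for a lexeme */
    int frequency;              /* occurrences counted since insertion */
    int delta;                  /* maximum undercount at insertion time */
} TrackItem;

typedef struct
{
    TrackItem items[MAX_TRACKED];
    size_t    nitems;
} TrackTable;

/* Drop every entry whose frequency + delta does not exceed b_current. */
static void
prune(TrackTable *tab, int b_current)
{
    size_t kept = 0;

    for (size_t i = 0; i < tab->nitems; i++)
        if (tab->items[i].frequency + tab->items[i].delta > b_current)
            tab->items[kept++] = tab->items[i];
    tab->nitems = kept;
}

/* Count tokens, pruning once per bucket_width tokens seen. */
static void
lossy_count(TrackTable *tab, const int *tokens, size_t ntokens,
            int bucket_width)
{
    int b_current = 1;          /* number of the bucket being filled */

    assert(bucket_width > 0);
    tab->nitems = 0;

    for (size_t n = 0; n < ntokens; n++)
    {
        size_t i;

        for (i = 0; i < tab->nitems; i++)
            if (tab->items[i].key == tokens[n])
                break;

        if (i < tab->nitems)
            tab->items[i].frequency++;
        else
        {
            assert(tab->nitems < MAX_TRACKED);
            tab->items[tab->nitems++] =
                (TrackItem) {tokens[n], 1, b_current - 1};
        }

        /* End of a bucket: prune, then move on to the next bucket. */
        if ((n + 1) % (size_t) bucket_width == 0)
        {
            prune(tab, b_current);
            b_current++;
        }
    }
}

/* Return the tracked frequency of key, or 0 if it was pruned or never seen. */
static int
tracked_frequency(const TrackTable *tab, int key)
{
    for (size_t i = 0; i < tab->nitems; i++)
        if (tab->items[i].key == key)
            return tab->items[i].frequency;
    return 0;
}
```

Because the pruning interval is just bucket_width, setting bucket_width = num_mcelem means any change to the way num_mcelem is computed carries over to the pruning distance with no further edits.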

> BTW: I've been occupied with other things and might have missed some 
> discussions, but at some point it has been considered to use Lossy 
> Counting to gather statistics from regular columns, not only tsvectors. 
> Wouldn't this help the performance hit ANALYZE takes from upping 
> default_stats_target?

Perhaps, but it's not likely to get done for 8.4 ...
        regards, tom lane

