Re: gsoc, text search selectivity and dllist enhancments

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: gsoc, text search selectivity and dllist enhancments
Дата
Msg-id 486E77E8.6010404@enterprisedb.com
обсуждение исходный текст
Ответ на Re: gsoc, text search selectivity and dllist enhancments  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: gsoc, text search selectivity and dllist enhancments  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
Список pgsql-hackers
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> The data structure I'd suggest is a simple array of pointers
>>> to the underlying hash table entries.  Since you have a predetermined
>>> maximum number of lexemes to track, you can just palloc the array once
>>> --- you don't need the expansibility properties of a list. 
> 
>> The number of lexemes isn't predetermined. It's 2 * (longest tsvector 
>> seen so far), and we don't know beforehand how long the longest tsvector is.
> 
> Hmm, I had just assumed without looking too closely that it was stats
> target times a fudge factor.  What is the rationale for doing it as
> above?  I don't think I like the idea of the limit varying over the
> course of the scan --- that means that lexemes in different places
> in the input will have significantly different probabilities of
> surviving to the final result.

Well, clearly if the list is smaller than the longest tsvector, 
inserting all elements of that long tsvector will flush out all other 
entries. Or if we throw away the newly inserted entries, some elements 
will never have a chance to climb up the list. I'm not sure where the 
"times two" figure comes from, maybe it's just a fudge factor, but the 
bottom line is that the minimum size needed depends on the size of the 
longest tsvector.

(Jan is offline until Saturday...)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Fix a couple of bugs in win32 shmem name generation: * Don't cut
Следующее
От: "David E. Wheeler"
Дата:
Сообщение: Re: PATCH: CITEXT 2.0