Re: [PROPOSAL] Shared Ispell dictionaries
From | Arthur Zakirov |
---|---|
Subject | Re: [PROPOSAL] Shared Ispell dictionaries |
Date | |
Msg-id | 20171231152811.GA4233@arthur.localdomain |
In response to | [PROPOSAL] Shared Ispell dictionaries (Arthur Zakirov <a.zakirov@postgrespro.ru>) |
Responses | Re: [PROPOSAL] Shared Ispell dictionaries |
List | pgsql-hackers |
Hello, hackers,

On Tue, Dec 26, 2017 at 07:48:27PM +0300, Arthur Zakirov wrote:
> The patch will be ready and added into the 2018-03 commitfest.
>

I attached the patch itself.

0001-Fix-ispell-memory-handling.patch:

Some strings are allocated via compact_palloc0(). But they are not persistent, so they should be allocated in a temporary memory context. Also, a couple of strings are not released if the .aff file has the new format.

0002-Retreive-shmem-location-for-ispell.patch:

Adds the ispell_shmem_location() function, which looks up the location for a dictionary using its .dict and .aff file names. If the location hasn't been allocated in DSM earlier, the function allocates it. A shared hash table is used to search for the location. The maximum number of elements in the hash table is NUM_DICTIONARIES=20 for now. It would be better to use a GUC variable. Also, if the number of elements reaches the limit, it would be good to use the backend's local memory instead of shared memory.

0003-Store-ispell-structures-in-shmem.patch:

Introduces the IspellDictBuild and IspellDictData structures and removes the IspellDict structure. IspellDictBuild is used within dispell_build() while building the dictionary, if the dictionary hasn't been allocated in DSM earlier. IspellDictBuild has a pointer to an IspellDictData structure, which is filled with persistent data. After the dictionary is built, IspellDictData is copied into the DSM location and the temporary data of IspellDictBuild is released.

All prefix trees are now stored as flat arrays. Those arrays are allocated and stored using the NodeArray struct. A required node can be retrieved by its node offset. The AffixData and Affix arrays have an additional offset array to retrieve an element by index.

The Affix field (array of AFFIX) of IspellDictBuild is also persistent data. But it is constructed as a temporary array first, because the Affix array needs to be sorted via qsort() within NISortAffixes().
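To illustrate the offset-array access described above, here is a minimal standalone sketch. It is not code from the patch: FlatStrings, flat_build(), flat_get() and flat_size() are hypothetical names, and plain malloc() stands in for both the temporary build memory and the DSM segment. The idea it shows is that a structure holding only offsets (no pointers) stays valid when its bytes are copied to a different address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header of a flat, pointer-free array of strings. */
typedef struct FlatStrings
{
    uint32_t nstrings;  /* number of stored strings */
    uint32_t datasize;  /* bytes of string data, including '\0' terminators */
    /* followed in memory by: uint32_t offset[nstrings]; char data[datasize] */
} FlatStrings;

/* The offset array starts right after the header. */
static uint32_t *
flat_offsets(const FlatStrings *fs)
{
    return (uint32_t *) ((char *) fs + sizeof(FlatStrings));
}

/* Total size in bytes; this is what would be copied into shared memory. */
static size_t
flat_size(const FlatStrings *fs)
{
    return sizeof(FlatStrings) + fs->nstrings * sizeof(uint32_t) + fs->datasize;
}

/* Retrieve string idx through the offset array, without stored pointers. */
static const char *
flat_get(const FlatStrings *fs, uint32_t idx)
{
    const char *data = (const char *) (flat_offsets(fs) + fs->nstrings);

    assert(idx < fs->nstrings);
    return data + flat_offsets(fs)[idx];
}

/* Pack n C strings into one contiguous buffer; returns NULL on failure. */
static FlatStrings *
flat_build(const char *const *strs, uint32_t n)
{
    uint32_t    datasize = 0;
    uint32_t    pos = 0;
    uint32_t    i;
    uint32_t   *off;
    char       *data;
    FlatStrings *fs;

    for (i = 0; i < n; i++)
        datasize += (uint32_t) strlen(strs[i]) + 1;

    fs = malloc(sizeof(FlatStrings) + n * sizeof(uint32_t) + datasize);
    if (fs == NULL)
        return NULL;

    fs->nstrings = n;
    fs->datasize = datasize;
    off = flat_offsets(fs);
    data = (char *) (off + n);

    for (i = 0; i < n; i++)
    {
        uint32_t len = (uint32_t) strlen(strs[i]) + 1;

        off[i] = pos;           /* offset is relative to the data area */
        memcpy(data + pos, strs[i], len);
        pos += len;
    }
    return fs;
}
```

Because every reference is an offset relative to the buffer itself, copying flat_size(fs) bytes with memcpy() to another address and calling flat_get() on the copy returns the same strings. The same property would let a flat array of tree nodes be addressed by node offset.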
So IspellDictData stores:
- AffixData - array of strings, accessed via AffixDataOffset
- Affix - array of AFFIX, accessed via AffixOffset
- DictNodes, PrefixNodes, SuffixNodes - prefix trees as plain arrays
- CompoundAffix - array of CMPDAffix, accessed sequentially

I had to remove compact_palloc0(), added by Pavel in 3e5f9412d0a818be77c974e5af710928097b91f3. The Ispell dictionary doesn't need such allocation anymore. It was used to allocate small chunks of memory. I will definitely check the performance of the Czech dictionary.

Remaining issues to address:
- add the GUC variable for the hash table limit
- fix bugs
- improve comments
- performance testing

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company