Re: pg_kazsearch: Full-text search extension for Kazakh language
| From | Adrien Nayrat |
|---|---|
| Subject | Re: pg_kazsearch: Full-text search extension for Kazakh language |
| Date | |
| Msg-id | 34cf74ff-5466-44e0-9a3f-e626708f893a@anayrat.info |
| In reply to | pg_kazsearch: Full-text search extension for Kazakh language (Darkhan <darkhanahmetov2005@gmail.com>) |
| Replies | Re: pg_kazsearch: Full-text search extension for Kazakh language |
| List | pgsql-general |
On 4/5/26 3:32 PM, Darkhan wrote:
> Hi all,
>
> I built pg_kazsearch, a PostgreSQL extension that adds full-text search
> support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or
> stop word list available in PostgreSQL, so anyone searching Kazakh text is
> stuck with trigram matching or application-level workarounds.
>
> Kazakh is agglutinative: a single word can carry 5-6 suffixes, which makes
> standard search approaches miss most relevant results. pg_kazsearch
> provides a custom Kazakh stemmer (core written in Rust), a stop word list,
> and a text search dictionary that plugs into the standard PostgreSQL FTS
> infrastructure, so GIN indexes, ts_rank, and phrase search all work out of
> the box.
>
> I tested it on a dataset of 3,000 real Kazakh news articles. On the same
> query, pg_kazsearch returns 61 relevant articles vs 1 with trigram search,
> with a 23% improvement in recall overall.
>
> You can install it with a single command via deb package or Docker image,
> no compilation needed.
>
> Repo: https://github.com/darkhanakh/pg-kazsearch
>
> I'd appreciate any feedback, especially from anyone working on text search
> internals or with experience supporting non-Latin or agglutinative
> languages in PostgreSQL.
>
> Thanks,
> Darkhan
>

Hello,

Thanks for your work. I don't know anything about Kazakh, but have you tried adding it to the Snowball stemmer [1]? Since Postgres uses Snowball, that would give you a better chance of having Kazakh supported in future versions.

1: https://github.com/snowballstem/snowball

--
Adrien NAYRAT
https://pro.anayrat.info