[PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit"setting is relatively small for large tables

Поиск
Список
Период
Сортировка
От Adé
Тема [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit"setting is relatively small for large tables
Дата
Msg-id CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Список pgsql-hackers
Hello all,

Like the title says, using "gin_fuzzy_search_limit" degrades speed when it has a relatively low setting.

What do I mean by "relatively low"? An example would be when a table with a GIN index has many millions of rows and a particular keyword search has 1,000,000 possible results because the keyword is very common (or it's just that the table is so supremely large that even a somewhat common keyword appears enough to return one million results). However, you only want to return around 100 random results from that one million, so you set gin_fuzzy_search_limit to 100. That limit is relatively low when you look at the ratio of the limit value to the possible results: 100 / 1,000,000  = 0.0001. You'll find the query is very slow for such a low ratio. It isn't so slow if gin_fuzzy_search_limit is 100 but the keyword search has only a total of 10,000 possible results (resulting in a higher ratio of 0.1).

This would explain why in the documentation it is said that "From experience, values in the thousands (e.g., 5000 — 20000) work well". It's not so common to have queries that return large enough result sets such that gin_fuzzy_search_limit values between 5,000 and 20,000 would result in low ratios and so result in the performance issue I've observed (these gin_fuzzy_search_limit values have relatively high ratios between 0.005 and 0.02 if you have 1,000,000 results for a keyword search). However, if you desire a lower gin_fuzzy_search_limit such as 100, while also having a relatively larger table, you'll find this slowness issue.

I discussed this issue more and the reason for it in my original bug report: https://www.postgresql.org/message-id/16220-1a0a4f0cb67cafdc@postgresql.org

Attached is SQL to test and observe this issue and also attached is a patch I want to eventually submit to a commitfest.

Best regards,
Adé
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: TestLib condition for deleting temporary directories
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: PATCH: add support for IN and @> in functional-dependencystatistics use