Re: Full text search with ORDER BY performance issue
От | Krade |
---|---|
Тема | Re: Full text search with ORDER BY performance issue |
Дата | |
Msg-id | 4A65F685.5080601@krade.com обсуждение исходный текст |
Ответ на | Re: Full text search with ORDER BY performance issue (valgog <valgog@gmail.com>) |
Ответы |
Re: Full text search with ORDER BY performance issue
(Oleg Bartunov <oleg@sai.msu.su>)
|
Список | pgsql-performance |
On 7/21/2009 11:32, valgog wrote: > Hi, > > There is a problem with GIN and GIST indexes, that they cannot be used > by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it > possible to use the b-tree columns in GIST or GIN to make the sort > easier, but I have no idea how difficult it will be to implement it in > current GIN or GIST structures. I think Oleg or even Tom will be the > right people to ask it :) But even if it is possible it will not be > implemented at least until 8.5 that will need a year to come, so until > then... > Unfortunately, it's not even just the lack of ORDER BY support, btree_gin indexes seem to be broken under some circumstances. So I can't even use my idea to limit searches to the last 10 days. See this: http://pgsql.privatepaste.com/5219TutUMk The first query gives bogus results. It's not using the index correctly. timestamp_comment_gin is a GIN index on timestamp, comment_tsv. The timestamp column is an integer. The queries work right if I drop the index. Is this a bug in btree_gin? > It is possible to strip your table in several smaller ones putting > them on different machines and then splitting your query with DBLINK. > This will distribute the burden of sorting to several machines that > will have to sort smaller parts as well. After you have your 25 ids > from each of the machines, you can merge them, sort again and limit as > you wish. Doing large offsets will be still problematic but faster > anyway in most reasonable offset ranges. (Load balancing tools like > pg_pool can automate this task, but I do not have practical experience > using them for that purposes) > > Yet another very interesting technology -- sphinx search (http:// > www.sphinxsearch.com/). It can distribute data on several machines > automatically, but it will be probably too expensive to start using > (if your task is not your main one :)) as they do not have standard > automation scripts, it does not support live updates (so you will > always have some minutes delay), and this is a standalone service, > that needs to be maintained and configured and synchronized with our > main database separately (though you can use pg/python to access it > from postgres). > > Good luck with your task :) Yeah, I don't really have that sort of resources. This is a small hobby project (ie: no budget) that is growing a bit too large. I might just have to do text searches without time ordering. On 7/21/2009 5:06, Scott Marlowe wrote: > Couldn't you do tge second query as a with query then run another > query to limit that result to everything greater than now()-xdays ? > I suppose I could, but I have no way to do a fast query that does both a full text match and a < or > in the same WHERE due to the issue I described above, so my original plan won't work. A separate BTREE timestamp index obviously does nothing. And again, thank you for all the help.
В списке pgsql-performance по дате отправления: