Simplifying Text Search
От | Simon Riggs |
---|---|
Тема | Simplifying Text Search |
Дата | |
Msg-id | 1194879636.2644.167.camel@ebony.site обсуждение исходный текст |
Ответы |
Re: Simplifying Text Search
(Martijn van Oosterhout <kleptog@svana.org>)
Re: Simplifying Text Search (Heikki Linnakangas <heikki@enterprisedb.com>) Re: Simplifying Text Search (Tom Lane <tgl@sss.pgh.pa.us>) Re: Simplifying Text Search (Bruce Momjian <bruce@momjian.us>) Re: Simplifying Text Search (Bruce Momjian <bruce@momjian.us>) |
Список | pgsql-hackers |
Something Tom Dunstan just mentioned has made me ask the question "Why does our full text search feature look so strange?". It's the operator-laden syntax that causes the problem. By any stretch, this query is difficult for most people to understand: SELECT * FROM text_table WHERE to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat'); Wouldn't it be much simpler to just have a function, so we can write this query like this? SELECT * FROM text_table WHERE text_search('haystack needle haystack', 'needle'); We then explain to people that while the above is nice, it will presume that both the function inputs are Text, which isn't any good for complex searches, indexing and dictionaries etc.., so then we move to: SELECT * FROM text_table WHERE text_search('haystack needle haystack'::tsvector, 'needle'::tsquery); or perhaps SELECT * FROM text_table WHERE full_text_search('haystack needle haystack', 'needle & hay'); which would automatically do the conversions to tsvector and tsquery for us. No more tedious casting, easy to read. [*text_search() functions would return bool] So we end up with a normal sounding function that is overloaded to provide all of the various goodies. We can include the text_search(text, text) version of the function in the normal chapter on functions, with a pointer to the more complex stuff elsewhere. Sound good? We can then explain everything without having to use @@ operators. They can then be introduced as an option. The side benefit of this is that we can then allow our wonderful new functionality to be more easily usable by things like Hibernate. We just tell them we have this new function and thats all they need to know. I know that under the covers the @@ operator is necessary because we hang various pieces of optimizer information from it. Each function signature gets an operator with matching signature, so there's a 1:1 correspondence in most use cases. So to make this all hang together, there'd need to be a some smarts that says: if there is only one operator on a function then use the operator's optimizer information when you see just the function. That information can be assessed at DDL time, so we can keep accurate track of operator counts in pgproc. An alternative approach might be to make the first operator created on a function the "primary" operator. All other operators would then be secondary operators, so that adding operators would not change the inference mechanism. I've not got sufficient knowledge to say how hard the function-to-operator inference is, but it would be dang useful in making text search and many other programs readable and easy to interface to. In the end that is going to mean wider usage of that functionality, with more people feeling like they can dip their toes into the water. I must confess I have insufficient time to do this myself right now, not least me discovering exactly how. I'm spending time on this now because I'm the one that has to explain this stuff to people and things like this can make a huge difference in their understanding and eventual uptake. Thoughts? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
В списке pgsql-hackers по дате отправления: