Re: Updated tsearch documentation
From | Nicolas Barbier |
---|---|
Subject | Re: Updated tsearch documentation |
Date | |
Msg-id | b0f3f5a10707071602m6662ebb4yc4c145dedf5f8601@mail.gmail.com |
In response to | Re: Updated tsearch documentation (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Updated tsearch documentation |
List | pgsql-hackers |
2007/7/7, Bruce Momjian <bruce@momjian.us>:
> FYI, I have massively reorganized the text search documentation and it
> is getting closer to something I am happy with:
>
> http://momjian.us/expire/fulltext/HTML/textsearch.html

The following is the result of my proofreading, mainly searching for
small mistakes such as spelling and grammatical errors (that means no
document structure comments, etc.). All corrections are relative to the
version of the text at the above URL at the time I read it :-).

General

It seems to be a recurring problem that commas are not put inside the
brackets when an argument is optional. For example:

"to_tsvector([conf_name], document TEXT)" -> I guess this should be
"to_tsvector([conf_name,] document TEXT)"

"Full-text" vs. "full text" and "stop-word" vs. "stop word" are not used
consistently. Also, the capitalization of "full text searching" is not
used consistently.

14.1. Introduction

* "indexinging" -> "indexing"
* "There is no linguistic support, even in English" -> "for" instead of "in"?
* "e.g.satisfies" -> add a space before "satisfies"
* "have several thousands derivatives" -> should this not use the singular form "thousand"?
* "infinitive form" -> is this the right term? I think it only applies to verbs (also occurs in 14.4 and probably others)
* "over how lexemes creation" -> not sure what this should be. "are created" maybe?
* "Map synonyms to a single word. ispell." -> why is ispell a standalone word?
* "so it is natural to introduce a new data type" -> this does not sound like documentation
* "Also, full-text search operator @@" -> add "the" before "full-text"
* "A document is any text file that can be opened, read, and modified" -> "file" sounds as if it should be a file on a filesystem.
* "However, the document file must be uniquely identified in the database." -> why?
* "COALESCE" -> should be a link
* "during calculation of document rank" -> add "the" before "calculation" and before "document"
* "which supports boolean operators, & (AND)" -> remove the ",". maybe add "the" before "boolean"
* "parenthesis" -> "parentheses"
* "Tsquery consists of" -> maybe add "A" before "Tsquery"

14.2. Operators And Functions
                ^^^ -> a non-capital "a" in "and" seems to be more consistent with the rest of the manual

* "TSVECTOR, otherwise false:" -> "and false if not" or "and false otherwise" (occurs 3 times in this section)
* "The text should be formatted to match the way a vector is displayed by SELECT." -> what a strange definition, I think something like "input format" or so should be used (and defined somewhere, didn't see it yet) (used twice in this section)
* "tsearch([vector_column_name], my_filter_name | text_column_name1 [...], text_column_nameN)" -> I do not understand the notation
* "The following rule is used: a function is applied to all subsequent TEXT columns until next matching column occurs." -> I don't get it
* "stat([sqlquery text ], [weight text ]) returns SETOF statinfo" -> I guess that not both of the arguments are optional?
* "stop-words candidates" -> "stop-word candidates"
* "tsvectors are compared with each other using lexicographical ordering." -> of the output representation or something else?
* "Accepts querytext, which should be single tokens separated by" -> replace "be" with "consist of"
* "& and | or, and ! not" -> putting parentheses around the "and", "or" and "not" would be more readable. also, a comma is missing before the "|" sign
* "break it onto tokens" -> "into" instead of "onto"
* "since GIN indexes do not support negate queries" -> something like: "queries with negation" or "negated queries" (depending on what the correct rule is)
* "Arguments to rewrite() function" -> "the .. functions" or "to .." (without the "function")
* "can be column names of type tsquery" -> "names of columns of type tsquery" (the names are not of type tsquery, the columns are)
* "we can change rewriting rule online" -> add "the", possibly use another word for "online" (it is not clear to me what that means)

14.3. Additional Controls

* "Full text searching in PostgreSQL provides function" -> add "the"
* "we see the resulting" -> maybe "we see that the resulting"
* "does not contain a, on, or it, word rats became rat, and the punctuation sign - was ignored" -> "does not contain the words" (or lexemes, or tokens), add "the" before "word rats", add quotes around the "-"
* "on words" -> "into words"
* "they are too frequent" -> "they occur too frequently" (I think a word cannot "be" frequent)
* "The Punctuation sign -" -> "The punctuation sign -" + put quotes around the "-"
* "which shows all details of full text machinery" -> add "the" before "full"
* "is to mark out the different parts of document" -> add "a" before "document"
* "by the 1 + logarithm" -> "by 1 + the logarithm"
* "i.e., ordering of search results will not change" -> add "the" before "ordering", maybe also before "search"
* "note that second example" -> add "the" before "second"
* "than ones with labeled with D" -> "than ones labeled with D" or "than ones that are labeled with D"
* "Unfortunately, it is almost impossible to avoid since full text indexing in a database should work without indexes" -> I don't get it
* "to show part of each document" -> add "a" before "part"
* "provides the function headline" -> add something, such as "to accomplish this" or "that implements such functionality" or something.
* "ellipse-separated" -> "ellipsis-separated"
* "the cascade dropping of the parser function cause dropping of the headling" -> I don't get the meaning of the sentence. I guess that "cause" should be "causes" and "headling" should be "heading"

14.4. Dictionaries

* "to use any word form in a query" -> "to use any derived form of a word in a query"
* "infinitive" -> is this the right term? I think it only applies to verbs (used twice in this section)
* "colour" -> is the manual supposed to be UK or US English? I cannot remember ever having read any UK-isms before
* "substituted to their" -> replace "to" with "by" or "with" (native English speakers, help me here)
* "see dictionary for integers Section 14.11 as an example" -> strange way of referring, I would put parentheses around the section number, or alternatively put the section number before the title
* "Lexemes come through a stack" -> replace "come through" with "are processed by" or something
* "appears as a stop-word" -> "turns out to be a stop-word", also "stop word" is used elsewhere (without the "-") (this inconsistency occurs a lot in this section)
* "Also, the ts_debug function ( Section 14.10 ) is very useful for this." -> the spaces around the section reference look strange. maybe replace "is very useful" by "can be used"
* "and appear in almost every document" -> two times "and" sounds bad, replace this "and" by a comma
* "discrimination value so they can be ignored in" -> cut this in two sentences: "discrimination value. Therefore, they can be ignored in the context of"
* "word like a and it is useless to have them in an index" -> replace "word" with "words", make "a" somehow stand out (quotes?), replace "and" with "although" and "have" with "store"
* "However stop words" -> "However, stop words"
* "does affect ranking" -> "do affect ranking" (I think both can be considered correct, but I like this one better)
* "Relative paths in OPTION resolve relative to share/" -> and "share/" is relative to what? such references occur elsewhere in this section
* "Synonym dictionary can be used" -> replace "dictionary" with "dictionaries", or alternatively, put "A" before "synonym"
* "thesynonym" -> add a space
* "en_stemm" -> "en_stem"
* "abbeviated" -> "abbreviated"
* "preferred terms, non-preferred, related terms" -> add "terms" after "non-preferred", or alternatively, remove all "terms" references apart from the last one
* "in the thesaurus requires reindexing" -> replace "requires" with "require"
* "It is possible to define only one dictionary." -> I guess that sentence wants to express that only one dictionary is allowed? In that case, change to "It is only possible to define one dictionary."
* "Use asterisk" -> add "an" before "asterisk"
* "thesubdictionary" -> "the subdictionary"
* "It is still required that sample words should be known" -> don't use "required" and "should" together: "sample words are still required to be known"
* "Since thesaurus dictionary" -> add "a" before "thesaurus"
* "with parser" -> add "the" before "parser"
* "but we can use plainto_tsquery and to_tsvector functions" -> add "the" before the name of the first function, or remove the "functions" part
* "not a lexemes" -> "not lexemes"
* "on OpenOffice Wiki" -> add "the" before "OpenOffice"
* "does not supports" -> "does not support"
* "support of" -> "support for"
* "At present, Full text" -> I guess that "full" should not be capitalized
* "see Snowball site" -> add "the" before "Snowball"
* "which accepts a snowball stemmer" -> "that is accepted by a snowball stemmer"

14.5. Indexes

* "speedup" -> "speed up"
* "GiST(The Generalized Search Tree)-based" -> "GiST (Generalized Search Tree)-based"
* "GIN(The Generalized Inverted Index)-based" -> "GIN (Generalized Inverted Index)-based"
* "necessary consult the" -> add "to" before "consult"
* "and could be result" -> remove the "be"
* "transitive containment relation is realized" -> add "the" before "transitive"
* "Knuth,1973" -> add a space after the comma
* "i.e. parent is 'OR'-ed bit-strings" -> "i.e., a parent is the result of 'OR'-ing the bit-strings"
* "of its limited" -> "of the limited"
* "The likelihood of false drops" -> what are "drops"? maybe this needs to be "hits"?
* "while longer one are" -> replace "one" with "ones"
* "or the result" -> add "whether" before "the"
* "currently is currently" -> remove the first "currently"
* "but its performance" -> replace "its" with "their"
* "heap, so" -> "heap. Therefore, "
* "In example below" -> add "the" before "example"
* "constraint_exclusion" -> why the underscore? should be a link

14.6. Configuration

* "all of the options" -> maybe remove "of the"
* "objects a set" -> add a comma before "a"

14.7. Limitations

* "Length of" -> "The length of" (twice)
* "less then" -> "less than"

None of the numbers use commas to separate the thousands, except for one.

14.8. psql Support

14.9. Application Tutorial

* "searchs" -> "searches"
* "is last-modified date" -> add "the" after "is"

14.10. Debugging

* "Word supernovaes" -> "The word supernovaes"
* "end the dictionary stack" -> add "the" before "dictionary"
* "specifies maximum length" -> add "the" before "maximum"

14.12. Example of Creating a Parser

* "Note it should" -> insert "that" after "Note"
* "The void function" -> replace "The" with "This"

Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html