Discussion: Text searching HTML

Text searching HTML

From:
"Campbell, Lance"
Date:
PostgreSQL 9.3

Is there a preferred way to search text within an HTML document? I have been reading up on searching via to_tsvector. You can pass to_tsvector two parameters. The first appears to be a dictionary and the second text. Is there by chance an English HTML dictionary? That way HTML tags or HTML attributes would be ignored.

If not, then what would be the suggested dictionary? Simple or English or something else?

Thanks for your assistance.

Thanks,

Lance Campbell (http://illinois.edu/person/lance)
Software Architect
Web Services at Public Affairs
217-333-0382
University of Illinois at Urbana-Champaign (http://illinois.edu/)

Re: Text searching HTML

From:
Tom Lane
Date:
"Campbell, Lance" <lance@illinois.edu> writes:
> Is there a preferred way to search text within an HTML document?  I have been reading up on searching via
> to_tsvector. You can pass the to_tsvector two parameters.  The first appears to be a dictionary and the second text.
> Is there by chance an English HTML dictionary?  That way html tags or html attributes would be ignored.

I believe all the built-in text search configurations ignore HTML tags by
default, since they have no mapping for the "tag" token type that the
built-in parser reports those as.  You could of course make a custom
configuration that acts differently.
        regards, tom lane
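The mechanism in the reply can be illustrated with a small toy model. This is not PostgreSQL's parser or its configuration machinery: it is a Python sketch under stated assumptions. A regex tokenizer labels markup as a hypothetical "tag" type and letters as a hypothetical "word" type. A "configuration" is a dict that maps token types to a normalizer, and any token type without a mapping is dropped. In this model, attributes stay inside the tag token, so they are dropped along with it. The normalizer only lowercases, which is closer to the `simple` configuration than to `english`, since the toy does no stemming. The names `tokenize`, `toy_to_tsvector`, `DEFAULT_CONFIG` and `TAG_AWARE_CONFIG` are all hypothetical. To see the real token types and dictionaries for a given input, PostgreSQL's `ts_debug` function is the place to look.

```python
import re

# Toy tokenizer (an assumption, NOT PostgreSQL's default parser):
# markup like <p class="x"> becomes one "tag" token, runs of ASCII letters
# become "word" tokens, and everything else is "other".
TOKEN_RE = re.compile(
    r"(?P<tag></?[A-Za-z][^<>]*>)"   # opening or closing tag, attributes included
    r"|(?P<word>[A-Za-z]+)"          # plain ASCII word
    r"|(?P<other>[^<A-Za-z]+|<)"     # whitespace, punctuation, stray '<'
)


def tokenize(text):
    """Yield (token_type, token_text) pairs covering the whole input."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()


# Hypothetical "configuration": token type -> normalizing function.
# "tag" has no mapping here, so tag tokens are silently ignored.
DEFAULT_CONFIG = {"word": str.lower}

# Hypothetical custom configuration that also indexes tag tokens,
# analogous to adding a mapping for the tag token type.
TAG_AWARE_CONFIG = {"word": str.lower, "tag": str.lower}


def toy_to_tsvector(config, text):
    """Return the sorted, de-duplicated lexemes produced under `config`."""
    lexemes = set()
    for token_type, token in tokenize(text):
        normalize = config.get(token_type)
        if normalize is None:
            continue  # unmapped token type: dropped, like "tag" in the reply
        lexemes.add(normalize(token))
    return sorted(lexemes)


if __name__ == "__main__":
    doc = '<p class="MsoNormal">Text searching <b>HTML</b></p>'
    print(toy_to_tsvector(DEFAULT_CONFIG, doc))    # tags and attributes ignored
    print(toy_to_tsvector(TAG_AWARE_CONFIG, doc))  # tags kept as lexemes
```

With `DEFAULT_CONFIG` the sample document yields only `html`, `searching` and `text`. The `MsoNormal` attribute value never appears because it is part of the dropped tag token. Switching to `TAG_AWARE_CONFIG` is the toy counterpart of the "custom configuration that acts differently" mentioned above.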