Discussion: Text searching HTML

Text searching HTML

From: "Campbell, Lance"
Date:

PostgreSQL 9.3

Is there a preferred way to search text within an HTML document?  I have been reading up on searching via to_tsvector.  You can pass to_tsvector two parameters.  The first appears to be a dictionary and the second the text.  Is there by chance an English HTML dictionary?  That way, HTML tags and attributes would be ignored.

If not, what would be the suggested dictionary: simple, English, or something else?

Thanks for your assistance.

Thanks,

Lance Campbell

Software Architect

Web Services at Public Affairs

217-333-0382

University of Illinois at Urbana-Champaign

Re: Text searching HTML

From: Tom Lane
Date:
"Campbell, Lance" <lance@illinois.edu> writes:
> Is there a preferred way to search text within an HTML document?  I have been reading up on searching via
> to_tsvector. You can pass the to_tsvector two parameters.  The first appears to be a dictionary and the second text.
> Is there by chance an English HTML dictionary?  That way html tags or html attributes would be ignored.

I believe all the built-in text search configurations ignore HTML tags by
default, since they have no mapping for the "tag" token type that the
built-in parser reports those as.  You could of course make a custom
configuration that acts differently.
        regards, tom lane
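A toy model may help show why the tags disappear. This Python sketch is not PostgreSQL code. It assumes a hypothetical, much-simplified tokenizer that reports only two token types, borrowing the default parser's aliases `asciiword` and `tag`, and it uses `str.lower` as a stand-in for a real dictionary (the built-in `english` configuration would also stem words). The names `tokenize`, `to_lexemes`, `default_config` and `custom_config` are illustrative only. Tokens whose type has no entry in the configuration are simply dropped, which is the behavior described above for `tag`. Adding an entry, as a custom configuration could, keeps them.

```python
import re

# Hypothetical, heavily simplified stand-in for PostgreSQL's default text
# search parser: it only recognizes markup tags and runs of ASCII letters.
# Anything inside <...>, including attributes, is consumed as part of the
# tag token, so attribute names and values never become words here.
TOKEN_RE = re.compile(r"(?P<tag></?[A-Za-z][^<>]*>)|(?P<asciiword>[A-Za-z]+)")


def tokenize(text):
    """Yield (token_type, token) pairs in document order."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()


def to_lexemes(config, text):
    """Keep only tokens whose type has a mapping in `config`.

    `config` maps a token type to a normalizing function that stands in
    for a dictionary. Token types with no mapping are silently dropped.
    """
    lexemes = []
    for token_type, token in tokenize(text):
        normalize = config.get(token_type)
        if normalize is not None:
            lexemes.append(normalize(token))
    return lexemes


# Modeled on the built-in configurations: no entry for "tag".
default_config = {"asciiword": str.lower}

# Hypothetical custom configuration that also maps tags.
custom_config = {"asciiword": str.lower, "tag": str.lower}

html = '<p class="intro">Hello <b>World</b></p>'
print(to_lexemes(default_config, html))  # ['hello', 'world']
print(to_lexemes(custom_config, html))
```

On a real server, `SELECT alias, token, dictionaries FROM ts_debug('english', '<b>Hello</b>');` shows which dictionaries each token type is mapped to; with the built-in configurations the `tag` rows should list none.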