Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled
From | Arthur Zakirov |
---|---|
Subject | Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled |
Date | |
Msg-id | 20180712092205.GA16177@zakirov.localdomain |
In reply to | BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled |
List | pgsql-bugs |
Hello,

On Thu, Jul 12, 2018 at 07:59:40AM +0000, PG Bug reporting form wrote:
> I have text that is not HTML and contains things that look like HTML tags.
> The headlines are HTML escaped when output. It is very odd to have this text
> missing from the resulting headlines and no way to control the behavior.

<b> and </b> are recognized as "tag" tokens. By default they are ignored. You need to modify the existing configuration or create a new one:

    =# CREATE TEXT SEARCH CONFIGURATION english_tag (COPY = english);
    =# alter text search configuration english_tag add mapping for tag with simple;

Then tags aren't skipped:

    =# select * from ts_debug('english_tag', 'query <b>test</b>');
       alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
    -----------+-----------------+-------+----------------+--------------+---------
     asciiword | Word, all ASCII | query | {english_stem} | english_stem | {queri}
     blank     | Space symbols   |       | {}             | (null)       | (null)
     tag       | XML tag         | <b>   | {simple}       | simple       | {<b>}
     asciiword | Word, all ASCII | test  | {english_stem} | english_stem | {test}
     tag       | XML tag         | </b>  | {simple}       | simple       | {</b>}

But even in this case ts_headline will skip tags, because that is hardcoded [1].

I don't think it is good to change the behaviour for existing versions of PostgreSQL. But there is a workaround, if it is appropriate for someone. It is possible to create your own text search parser extension (example: [2]) and change

    #define HLIDREPLACE(x) ( (x)==TAG_T )

to

    #define HLIDREPLACE(x) ( false )

1 - https://github.com/postgres/postgres/blob/master/src/backend/tsearch/wparser_def.c#L1923
2 - https://github.com/postgrespro/pg_tsparser

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
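To make the effect of the HLIDREPLACE change easier to picture, here is a small toy model in C. It is not code from wparser_def.c or pg_tsparser. The token type values, the Token struct, and the functions replace_tags, replace_nothing and build_headline are all hypothetical names made up for this sketch. It also assumes that a token flagged for replacement shows up in the headline as a single space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token types for this toy model. They are NOT the real
 * values used by PostgreSQL's default parser. */
enum { WORD_TOK = 1, SPACE_TOK = 2, TAG_TOK = 3 };

/* Hypothetical token: its text plus the type the parser assigned to it. */
typedef struct {
    const char *text;
    int type;
} Token;

/* Models "#define HLIDREPLACE(x) ( (x)==TAG_T )": tags are flagged. */
int replace_tags(int type) { return type == TAG_TOK; }

/* Models "#define HLIDREPLACE(x) ( false )": nothing is flagged. */
int replace_nothing(int type) { (void) type; return 0; }

/*
 * Concatenate tokens into out (capacity cap, including the terminator).
 * Every token for which replace() is true is written as one space
 * (an assumption of this sketch). Returns the headline length, or -1
 * if the buffer is too small.
 */
int build_headline(const Token *toks, size_t n, int (*replace)(int),
                   char *out, size_t cap)
{
    size_t len = 0;

    assert(toks != NULL && replace != NULL && out != NULL);
    if (cap == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        const char *piece = replace(toks[i].type) ? " " : toks[i].text;
        size_t plen = strlen(piece);

        if (len + plen + 1 > cap)   /* keep room for the terminator */
            return -1;
        memcpy(out + len, piece, plen);
        len += plen;
    }
    out[len] = '\0';
    return (int) len;
}
```

Feeding this model the five tokens from the ts_debug output above (query, a space, <b>, test, </b>), the replace_tags predicate drops both tags from the result, while replace_nothing reproduces the input text unchanged. That is the difference the one-line macro change is meant to make in a custom parser.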