Html parsing and inline elements

From: Marcelo Zabani
Subject: Html parsing and inline elements
Msg-id: CACgY3QZ0_TX4LBC8=RRCRGM2Mgos6S8jj8AhxYMP6P5EM2M4yQ@mail.gmail.com
Replies: Re: Html parsing and inline elements (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi everyone,

I was wondering whether HTML parsing should split tokens that are not separated by whitespace in the original text but are separated by an inline element. Here is an example:

SELECT to_tsvector('english', 'Hello<p>neighbor</p>, you are <strong>n</strong>i<em>ce</em>')
Results: "'ce':7 'hello':1 'n':5 'neighbor':2"

"Hello" and "neighbor" are rightly separated, because <p> is a block element, but "nice" should be a single word, since <strong> and <em> are inline elements and introduce no visual separation when rendered.
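To make the proposed rule concrete, here is a minimal sketch in Python of tokenizing HTML so that block-level tags separate words while inline tags do not. It is not the PostgreSQL text-search parser and says nothing about how that parser is implemented. `INLINE_TAGS`, `TextJoiner` and `words` are hypothetical names; the inline-tag set is a hand-picked assumption rather than the full list from the HTML specification; and the word split does no stemming or stop-word removal.

```python
import re
from html.parser import HTMLParser

# Assumption: a hand-picked subset of inline elements, for illustration only.
INLINE_TAGS = {"a", "b", "code", "em", "i", "span", "strong", "sub", "sup", "u"}


class TextJoiner(HTMLParser):
    """Hypothetical helper: collects text, breaking words only at non-inline tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _boundary(self, tag):
        # Block-level and unknown tags (e.g. <p>, <div>) act as word separators;
        # inline tags are transparent, so text on both sides is joined.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._boundary(tag)  # HTMLParser passes tag names already lowercased

    def handle_endtag(self, tag):
        self._boundary(tag)

    def handle_data(self, data):
        self.parts.append(data)


def words(html):
    """Hypothetical tokenizer: lowercase word runs, no stemming or stop words."""
    parser = TextJoiner()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return re.findall(r"\w+", "".join(parser.parts).lower())


print(words("Hello<p>neighbor</p>, you are <strong>n</strong>i<em>ce</em>"))
```

Under these assumptions, the example from this message yields ['hello', 'neighbor', 'you', 'are', 'nice']: <p> still separates "Hello" from "neighbor", while <strong> and <em> no longer split "nice". Treating unknown tags as separators is the conservative default, since the output above suggests the current parser breaks words at every tag.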

Sorry if this has been asked before, but I couldn't find it anywhere.

Thanks in advance,
Marcelo.
