On 2007-05-22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If "%" advances by bytes then this will find a spurious match. The
> only thing that prevents it is if "B" can't be both a leading and a
> trailing byte of validly-encoded MB characters.
That condition holds (by design) in UTF8, where leading bytes and
continuation bytes occupy disjoint ranges, but it does not hold in most
other multibyte charsets.
The %_ case is also trivially handled in UTF8 by simply ensuring that
_ doesn't match a non-initial octet (one of the form 10xxxxxx). This
allows % to advance by bytes without danger of losing sync.
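
To illustrate the idea, here is a minimal sketch in Python (not the actual
backend code). The names utf8_like and is_continuation are made up for this
example, escape handling is left out, and both arguments are assumed to be
valid UTF8. % simply tries every byte offset, and _ refuses to start on a
continuation octet, so a match can never begin in the middle of a character.

```python
PERCENT = ord("%")
UNDERSCORE = ord("_")


def is_continuation(byte: int) -> bool:
    """True for a UTF8 non-initial octet, i.e. one of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def utf8_like(text: bytes, pattern: bytes, t: int = 0, p: int = 0) -> bool:
    """Hypothetical byte-wise LIKE matcher for valid UTF8 input (no escapes)."""
    while p < len(pattern):
        c = pattern[p]
        if c == PERCENT:
            # Advance by bytes: try the rest of the pattern at every offset,
            # including offsets that fall inside a multibyte character.
            for start in range(t, len(text) + 1):
                if utf8_like(text, pattern, start, p + 1):
                    return True
            return False
        if t >= len(text):
            return False
        if c == UNDERSCORE:
            # _ must not match a non-initial octet; this is what keeps the
            # byte-wise % from losing sync.
            if is_continuation(text[t]):
                return False
            # Consume the initial octet plus any continuation octets.
            t += 1
            while t < len(text) and is_continuation(text[t]):
                t += 1
        elif text[t] != c:
            # Literal byte comparison. A pattern character starts with an
            # initial octet, which can never equal a continuation octet.
            return False
        else:
            t += 1
        p += 1
    return t == len(text)
```

For example, utf8_like("éa".encode("utf-8"), b"%_") succeeds only with %
covering the whole of "é"; the attempt where % stops after the first octet
of "é" is rejected by the continuation check.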
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services