Thread: Header unfolding in archived mail

Header unfolding in archived mail

From
Noah Misch
Date:
The mailing list web archives display the subject of message
20130603190727.GA360354@tornado.leadboat.com as follows:

Partitioning performance: cache stringToNode() ofpg_constraint.ccbin

Note the lack of whitespace after "of".  The original message, which you can
see by downloading the mbox for June 2013, conveyed the subject this way:

Subject: Partitioning performance: cache stringToNode() of
    pg_constraint.ccbin

Per RFC 5322, section 2.2.3:

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding".  Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.  An unfolded header field has no length restriction and
   therefore may be indeterminately long.

So, the archives should present the subject like this:

Partitioning performance: cache stringToNode() of    pg_constraint.ccbin

Gmane and osdir.com do so.  MARC and Gmail show a space in place of the tab,
but Gmail converts every subject-line tab to a space.  I have attached a
patch, against pgarchives.git, making its unfolding code conform to RFC 5322.
The change also affects headers folded before a space rather than before a
tab, such as 50E31370.5030405@cybertec.at.  Those have been displaying fine
despite the lack of unfolding because newline-space renders like a space in
HTML.  I unit-tested the change, but I did not test the full archives load.
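
As an illustration of the RFC 5322 rule quoted above (this is not the attached
patch), a minimal Python sketch of unfolding might look like the following.
The name `unfold_header` is hypothetical, and accepting a bare LF in addition
to CRLF is an assumption, made because mbox files usually store LF line
endings.

```python
import re

# RFC 5322 section 2.2.3: unfolding removes any CRLF that is immediately
# followed by WSP (space or tab).  The WSP itself is kept.
# Assumption: a bare "\n" is treated like CRLF, as mbox files typically use LF.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold_header(value: str) -> str:
    """Return the header value with fold line breaks removed (hypothetical helper)."""
    return _FOLD.sub("", value)


folded = "Partitioning performance: cache stringToNode() of\n\tpg_constraint.ccbin"
print(repr(unfold_header(folded)))
# 'Partitioning performance: cache stringToNode() of\tpg_constraint.ccbin'
```

Because only the line break is removed, a header folded before a tab keeps the
tab and one folded before a space keeps the space, so "of" and
"pg_constraint.ccbin" never run together.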


The "raw" message display feature seems to have its own set of rules, and I
failed to find their implementation.  Here are the subject lines for the
aforementioned messages according to "raw" display:

Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
Subject: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket
    communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes
    long time to detect n/w breakdown

In one case, "\n\t" from the true raw original (in the mbox file) became " ".
In the other case, two instances of "\n " became "\n\t".  Any ideas where that
transformation is coming from?
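
One way to pin down such a transformation is to list the fold sequences in
each version of a header and compare them.  The sketch below is only a
diagnostic aid under stated assumptions: `fold_sequences` is a hypothetical
helper, and the two sample strings are built from the first case described
above rather than read from the real mbox file or "raw" display.

```python
import re
from typing import List


def fold_sequences(raw_header: str) -> List[str]:
    """Return each line break together with the whitespace character after it."""
    return re.findall(r"\r?\n[ \t]", raw_header)


# Sample inputs modeled on the first case above (assumed, not fetched).
mbox_subject = "Subject: Partitioning performance: cache stringToNode() of\n\tpg_constraint.ccbin"
raw_display = "Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin"

print(fold_sequences(mbox_subject))  # ['\n\t']
print(fold_sequences(raw_display))   # []
```

Running the same check on the output of each processing stage would show
which stage first changes the fold whitespace.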

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachments

Re: Header unfolding in archived mail

From
Noah Misch
Date:
On Sat, Sep 07, 2013 at 06:07:45PM -0400, Noah Misch wrote:
> The mailing list web archives display the subject of message
> 20130603190727.GA360354@tornado.leadboat.com as follows:
> 
> Partitioning performance: cache stringToNode() ofpg_constraint.ccbin
> 
> Note the lack of whitespace after "of".  The original message, which you can
> see by downloading the mbox for June 2013, conveyed the subject this way:
> 
> Subject: Partitioning performance: cache stringToNode() of
>     pg_constraint.ccbin
> 
> Per RFC 5322, section 2.2.3:
> 
>    The process of moving from this folded multiple-line representation
>    of a header field to its single line representation is called
>    "unfolding".  Unfolding is accomplished by simply removing any CRLF
>    that is immediately followed by WSP.  Each header field should be
>    treated in its unfolded form for further syntactic and semantic
>    evaluation.  An unfolded header field has no length restriction and
>    therefore may be indeterminately long.
> 
> So, the archives should present the subject like this:
> 
> Partitioning performance: cache stringToNode() of    pg_constraint.ccbin
> 
> Gmane and osdir.com do so.  MARC and Gmail show a space in place of the tab,
> but Gmail converts every subject-line tab to a space.  I have attached a
> patch, against pgarchives.git, making its unfolding code conform to RFC 5322.
> The change also affects headers folded before a space rather than before a
> tab, such as 50E31370.5030405@cybertec.at.  Those have been displaying fine
> despite the lack of unfolding because newline-space renders like a space in
> HTML.  I unit-tested the change, but I did not test the full archives load.
> 
> 
> The "raw" message display feature seems to have its own set of rules, and I
> failed to find their implementation.  Here are the subject lines for the
> aforementioned messages according to "raw" display:
> 
> Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
> Subject: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket
>     communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes
>     long time to detect n/w breakdown
> 
> In one case, "\n\t" from the true raw original (in the mbox file) became " ".
> In the other case, two instances of "\n " became "\n\t".  Any ideas where that
> transformation is coming from?

Ping.  Any advice on how to more-thoroughly test the pgarchives.git change, or
where I might find the corresponding code affecting "raw" message display?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Header unfolding in archived mail

From
Magnus Hagander
Date:
On Mon, Dec 9, 2013 at 1:41 AM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Sep 07, 2013 at 06:07:45PM -0400, Noah Misch wrote:
> > The mailing list web archives display the subject of message
> > 20130603190727.GA360354@tornado.leadboat.com as follows:
> >
> > Partitioning performance: cache stringToNode() ofpg_constraint.ccbin
> >
> > Note the lack of whitespace after "of".  The original message, which you can
> > see by downloading the mbox for June 2013, conveyed the subject this way:
> >
> > Subject: Partitioning performance: cache stringToNode() of
> >       pg_constraint.ccbin
> >
> > Per RFC 5322, section 2.2.3:
> >
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding".  Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP.  Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.  An unfolded header field has no length restriction and
> >    therefore may be indeterminately long.
> >
> > So, the archives should present the subject like this:
> >
> > Partitioning performance: cache stringToNode() of     pg_constraint.ccbin
> >
> > Gmane and osdir.com do so.  MARC and Gmail show a space in place of the tab,
> > but Gmail converts every subject-line tab to a space.  I have attached a
> > patch, against pgarchives.git, making its unfolding code conform to RFC 5322.
> > The change also affects headers folded before a space rather than before a
> > tab, such as 50E31370.5030405@cybertec.at.  Those have been displaying fine
> > despite the lack of unfolding because newline-space renders like a space in
> > HTML.  I unit-tested the change, but I did not test the full archives load.
> >
> >
> > The "raw" message display feature seems to have its own set of rules, and I
> > failed to find their implementation.  Here are the subject lines for the
> > aforementioned messages according to "raw" display:
> >
> > Subject: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
> > Subject: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket
> >       communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes
> >       long time to detect n/w breakdown
> >
> > In one case, "\n\t" from the true raw original (in the mbox file) became " ".
> > In the other case, two instances of "\n " became "\n\t".  Any ideas where that
> > transformation is coming from?
>
> Ping.  Any advice on how to more-thoroughly test the pgarchives.git change, or
> where I might find the corresponding code affecting "raw" message display?



Hi!

This one is entirely on me, I just haven't been able to get around to it yet :( It's still on my TODO list though, so I haven't given up on you!

Sorry! 

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Header unfolding in archived mail

From
Noah Misch
Date:
On Sun, Dec 15, 2013 at 05:56:13PM +0100, Magnus Hagander wrote:
> On Mon, Dec 9, 2013 at 1:41 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Sat, Sep 07, 2013 at 06:07:45PM -0400, Noah Misch wrote:
> > > Per RFC 5322, section 2.2.3:
> > >
> > >    The process of moving from this folded multiple-line representation
> > >    of a header field to its single line representation is called
> > >    "unfolding".  Unfolding is accomplished by simply removing any CRLF
> > >    that is immediately followed by WSP.  Each header field should be
> > >    treated in its unfolded form for further syntactic and semantic
> > >    evaluation.  An unfolded header field has no length restriction and
> > >    therefore may be indeterminately long.

> > > I have attached a patch, against pgarchives.git, making its unfolding
> > > code conform to RFC 5322.

> > Ping.  Any advice on how to more-thoroughly test the pgarchives.git
> > change, or where I might find the corresponding code affecting "raw"
> > message display?

> This one is entirely on me, I just haven't been able to get around to it
> yet :( It's still on my TODO list though, so I haven't given up on you!

Ping.