Thread: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202003 is
missing all messages after 2020-03-09.  pgsql-bugs.202003 and
pgsql-hackers.202004 are fine.



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Magnus Hagander
Date:


On Wed, Apr 8, 2020 at 8:20 AM Noah Misch <noah@leadboat.com> wrote:
> https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202003 is
> missing all messages after 2020-03-09.  pgsql-bugs.202003 and
> pgsql-hackers.202004 are fine.

It looks like there was an email in the archives exactly on the 9th that somehow confused the python mbox generating code into just stopping there. I've marked that message as corrupt, and AFAICT it now happily downloads the full mbox. 


Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
On Fri, Apr 10, 2020 at 05:30:03PM +0200, Magnus Hagander wrote:
> On Wed, Apr 8, 2020 at 8:20 AM Noah Misch <noah@leadboat.com> wrote:
> > https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202003 is
> > missing all messages after 2020-03-09.  pgsql-bugs.202003 and
> > pgsql-hackers.202004 are fine.
> >
> 
> It looks like there was an email in the archives exactly on the 9th that
> somehow confused the python mbox generating code into just stopping there.
> I've marked that message as corrupt, and AFAICT it now happily downloads
> the full mbox.

Confirmed.  Thanks.



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
On Fri, Apr 10, 2020 at 09:32:46PM -0700, Noah Misch wrote:
> On Fri, Apr 10, 2020 at 05:30:03PM +0200, Magnus Hagander wrote:
> > On Wed, Apr 8, 2020 at 8:20 AM Noah Misch <noah@leadboat.com> wrote:
> > > https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202003 is
> > > missing all messages after 2020-03-09.  pgsql-bugs.202003 and
> > > pgsql-hackers.202004 are fine.
> > 
> > It looks like there was an email in the archives exactly on the 9th that
> > somehow confused the python mbox generating code into just stopping there.
> > I've marked that message as corrupt, and AFAICT it now happily downloads
> > the full mbox.
> 
> Confirmed.  Thanks.

https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202007 has
its latest message from 2020-07-08.  Did the same sort of cause recur?



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Magnus Hagander
Date:


On Thu, Aug 6, 2020 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Apr 10, 2020 at 09:32:46PM -0700, Noah Misch wrote:
> > On Fri, Apr 10, 2020 at 05:30:03PM +0200, Magnus Hagander wrote:
> > > On Wed, Apr 8, 2020 at 8:20 AM Noah Misch <noah@leadboat.com> wrote:
> > > > https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202003 is
> > > > missing all messages after 2020-03-09.  pgsql-bugs.202003 and
> > > > pgsql-hackers.202004 are fine.
> > >
> > > It looks like there was an email in the archives exactly on the 9th that
> > > somehow confused the python mbox generating code into just stopping there.
> > > I've marked that message as corrupt, and AFAICT it now happily downloads
> > > the full mbox.
> >
> > Confirmed.  Thanks.
>
> https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.202007 has
> its latest message from 2020-07-08.  Did the same sort of cause recur?

Yeah, same thing again, there was a corrupt message in the archive that made it blow up. It's annoying that it just stops there and doesn't actually produce an error :/ 

In this particular case, it was git-send-email setting content-transfer-encoding to 8bit and then passing utf8 data in there. This is most likely a client side configuration error, I guess, but it'd be good if we can somehow deal with it without blowing up.
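
A minimal sketch of one way to avoid the silent truncation described above. This is not the archive's actual code, which is not shown in this thread. It assumes a hypothetical `build_mbox` helper and a hypothetical per-message `serialize` callable; each message is serialized in isolation, so a failure is recorded instead of ending the output:

```python
def strict_serialize(raw: bytes) -> bytes:
    """Hypothetical stand-in serializer: raises on any non-ASCII byte."""
    return raw.decode("ascii").encode("ascii")


def build_mbox(raw_messages, serialize):
    """Concatenate serialized messages and collect failures instead of stopping.

    Returns (mbox_bytes, failed), where failed is a list of
    (index, error_description) tuples for messages that could not be written.
    """
    chunks = []
    failed = []
    for index, raw in enumerate(raw_messages):
        try:
            chunks.append(serialize(raw))
        except Exception as exc:  # broad on purpose: one bad message must not end the loop
            failed.append((index, repr(exc)))
    return b"".join(chunks), failed


if __name__ == "__main__":
    messages = [b"one\n", b"\xc3\xa4\n", b"three\n"]
    data, failed = build_mbox(messages, strict_serialize)
    print(len(data), "bytes written;", len(failed), "message(s) skipped")
```

The caller can then decide whether a non-empty `failed` list should raise an error, be logged, or mark the affected messages for review, rather than producing a short file with no indication that anything went wrong.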


Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
On Thu, Aug 06, 2020 at 11:37:17AM +0200, Magnus Hagander wrote:
> Yeah, same thing again, there was a corrupt message in the archive that
> made it blow up. It's annoying that it just stops there and doesn't
> actually produce an error :/

Confirmed that the whole month now downloads.

> In this particular case, it was git-send-email setting
> content-transfer-encoding to 8bit and then passing utf8 data in there.

What is wrong with that?  It would be wrong to set "Content-Transfer-Encoding:
BASE64" and then include utf8, but "Content-Transfer-Encoding: 8bit" sounds
okay under my limited understanding of MIME.



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Magnus Hagander
Date:
On Fri, Aug 7, 2020 at 5:05 AM Noah Misch <noah@leadboat.com> wrote:
> On Thu, Aug 06, 2020 at 11:37:17AM +0200, Magnus Hagander wrote:
> > Yeah, same thing again, there was a corrupt message in the archive that
> > made it blow up. It's annoying that it just stops there and doesn't
> > actually produce an error :/
>
> Confirmed that the whole month now downloads.
>
> > In this particular case, it was git-send-email setting
> > content-transfer-encoding to 8bit and then passing utf8 data in there.
>
> What is wrong with that?  It would be wrong to set "Content-Transfer-Encoding:
> BASE64" and then include utf8, but "Content-Transfer-Encoding: 8bit" sounds
> okay under my limited understanding of MIME.

You are of course correct. The content-transfer-encoding is fine. I was checking too many headers and forgot which one was actually the broken one.

The problem is it does not specify the *charset*. It does so specifically for the From header, but not for the main body. And without an encoding specified, the main body is limited to 7 bit ascii (and it contained utf8 which made things go boom).
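
A minimal sketch of the situation described above, using Python's standard `email` package: an 8bit body containing UTF-8 bytes with no declared charset. The `decode_body` helper and its UTF-8 fallback are assumptions for illustration, not what the archive code does, and the sketch only handles single-part messages:

```python
import email


def decode_body(raw: bytes, fallback: str = "utf-8"):
    """Return (text, charset_was_declared) for a single-part message.

    Hypothetical helper: when no charset is declared, or the declared name
    is unknown, the body is decoded with `fallback` instead of failing.
    """
    msg = email.message_from_bytes(raw)
    declared = msg.get_content_charset()  # None when there is no charset parameter
    body = msg.get_payload(decode=True)   # body bytes with the transfer encoding undone
    try:
        text = body.decode(declared or fallback, errors="replace")
    except LookupError:
        # Unknown charset name, such as "charset=bogus".
        text = body.decode(fallback, errors="replace")
    return text, declared is not None


if __name__ == "__main__":
    raw = (
        b"From: someone@example.com\n"
        b"Subject: [PATCH] test utf8\n"
        b"MIME-Version: 1.0\n"
        b"Content-Transfer-Encoding: 8bit\n"
        b"\n"
        b"\xc3\xa4\n"
    )
    text, declared = decode_body(raw)
    print(repr(text), "charset declared:", declared)
```

Decoding the same body bytes strictly as ASCII would raise `UnicodeDecodeError`, which is the kind of failure a missing charset can trigger in code that takes the default at face value.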


Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Alvaro Herrera
Date:
On 2020-Aug-07, Magnus Hagander wrote:

> The problem is it does not specify the *charset*. It does so specifically
> for the From header, but not for the main body. And without an encoding
> specified, the main body is limited to 7 bit ascii (and it contained utf8
> which made things go boom).

Hm, so what's the fix for this problem?  Are you editing those old
messages?  I happened to notice yesterday that mbox/pgsql-hackers.201408
only has a couple dozen messages ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
On Fri, Aug 07, 2020 at 11:51:27AM +0200, Magnus Hagander wrote:
> On Fri, Aug 7, 2020 at 5:05 AM Noah Misch <noah@leadboat.com> wrote:
> > On Thu, Aug 06, 2020 at 11:37:17AM +0200, Magnus Hagander wrote:
> > > In this particular case, it was git-send-email setting
> > > content-transfer-encoding to 8bit and then passing utf8 data in there.
> >
> > What is wrong with that?  It would be wrong to set
> > "Content-Transfer-Encoding:
> > BASE64" and then include utf8, but "Content-Transfer-Encoding: 8bit" sounds
> > okay under my limited understanding of MIME.
> 
> You are of course correct. The content-transfer-encoding is fine. I was
> checking too many headers and forgot which one was actually the broken one.
> 
> The problem is it does not specify the *charset*. It does so specifically
> for the From header, but not for the main body. And without an encoding
> specified, the main body is limited to 7 bit ascii (and it contained utf8
> which made things go boom).

Hmm.  Can you reply attaching the verbatim message, or otherwise point to how
to view it?  While I can cause an improper charset as follows, this doesn't
look like a match for the symptom you found.

$ echo ä >>README; git commit -m 'test utf8' README
[master 2cbac89] test utf8
 1 file changed, 1 insertion(+)
$ git -c "sendemail.assume8bitEncoding=bogus" send-email --to=noah@leadboat.com HEAD^..HEAD
/tmp/UHeTvyTROO/0001-test-utf8.patch
(mbox) Adding cc: Noah Misch <noah@leadboat.com> from line 'From: Noah Misch <noah@leadboat.com>'

From: Noah Misch <noah@leadboat.com>
To: noah@leadboat.com
Subject: [PATCH] test utf8
Date: Sun,  9 Aug 2020 00:43:18 -0700
Message-Id: <1596958998-448745-1-git-send-email-noah@leadboat.com>
X-Mailer: git-send-email 1.8.3.1
MIME-Version: 1.0
Content-Type: text/plain; charset=bogus
Content-Transfer-Encoding: 8bit
...



Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Magnus Hagander
Date:
On Sun, Aug 9, 2020 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Aug 07, 2020 at 11:51:27AM +0200, Magnus Hagander wrote:
> > On Fri, Aug 7, 2020 at 5:05 AM Noah Misch <noah@leadboat.com> wrote:
> > > On Thu, Aug 06, 2020 at 11:37:17AM +0200, Magnus Hagander wrote:
> > > > In this particular case, it was git-send-email setting
> > > > content-transfer-encoding to 8bit and then passing utf8 data in there.
> > >
> > > What is wrong with that?  It would be wrong to set
> > > "Content-Transfer-Encoding:
> > > BASE64" and then include utf8, but "Content-Transfer-Encoding: 8bit" sounds
> > > okay under my limited understanding of MIME.
> >
> > You are of course correct. The content-transfer-encoding is fine. I was
> > checking too many headers and forgot which one was actually the broken one.
> >
> > The problem is it does not specify the *charset*. It does so specifically
> > for the From header, but not for the main body. And without an encoding
> > specified, the main body is limited to 7 bit ascii (and it contained utf8
> > which made things go boom).
>
> Hmm.  Can you reply attaching the verbatim message, or otherwise point to how
> to view it?  While I can cause an improper charset as follows, this doesn't
> look like a match for the symptom you found.

I will grab it out and send you off-list.
 

> $ echo ä >>README; git commit -m 'test utf8' README
> [master 2cbac89] test utf8
>  1 file changed, 1 insertion(+)
> $ git -c "sendemail.assume8bitEncoding=bogus" send-email --to=noah@leadboat.com HEAD^..HEAD
> /tmp/UHeTvyTROO/0001-test-utf8.patch
> (mbox) Adding cc: Noah Misch <noah@leadboat.com> from line 'From: Noah Misch <noah@leadboat.com>'
>
> From: Noah Misch <noah@leadboat.com>
> To: noah@leadboat.com
> Subject: [PATCH] test utf8
> Date: Sun,  9 Aug 2020 00:43:18 -0700
> Message-Id: <1596958998-448745-1-git-send-email-noah@leadboat.com>
> X-Mailer: git-send-email 1.8.3.1
> MIME-Version: 1.0
> Content-Type: text/plain; charset=bogus
> Content-Transfer-Encoding: 8bit
> ...

In the emails that are failing, that Content-Type header simply does not exist.


Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Magnus Hagander
Date:


On Fri, Aug 7, 2020 at 7:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Aug-07, Magnus Hagander wrote:
>
> > The problem is it does not specify the *charset*. It does so specifically
> > for the From header, but not for the main body. And without an encoding
> > specified, the main body is limited to 7 bit ascii (and it contained utf8
> > which made things go boom).
>
> Hm, so what's the fix for this problem?  Are you editing those old
> messages?  I happened to notice yesterday that mbox/pgsql-hackers.201408
> only has a couple dozen messages ...

I'm playing with a couple of different ways of handling it. 

Glad you found that example, because there's definitely something *else* that's wrong with that one. So now I have a second example to work off when trying to find a fix!


Re: mbox/pgsql-hackers.202003 missing all messages after 2020-03-09

From: Noah Misch
Date:
On Sun, Aug 09, 2020 at 03:30:20PM +0200, Magnus Hagander wrote:
> On Fri, Aug 7, 2020 at 7:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On 2020-Aug-07, Magnus Hagander wrote:
> > > The problem is it does not specify the *charset*. It does so specifically
> > > for the From header, but not for the main body. And without an encoding
> > > specified, the main body is limited to 7 bit ascii (and it contained utf8
> > > which made things go boom).
> >
> > Hm, so what's the fix for this problem?  Are you editing those old
> > messages?  I happened to notice yesterday that mbox/pgsql-hackers.201408
> > only has a couple dozen messages ...
> >
> 
> I'm playing with a couple of different ways of handling it.
> 
> Glad you found that example, because there's definitely something *else*
> that's wrong with that one. So now I have a second example to work off when
> trying to find a fix!

Here's another example:
https://www.postgresql.org/list/pgsql-hackers/mbox/pgsql-hackers.200107
is missing all messages after 2001-07-03.
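
The check used throughout this thread, comparing the latest message date in a downloaded mbox against the end of the month, can be sketched with Python's standard `mailbox` module. The `latest_message_date` helper is hypothetical, and treating dates without a timezone as UTC is an assumption made only so that all values are comparable:

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def latest_message_date(path):
    """Return the newest Date header in a local mbox file, or None if none parse."""
    latest = None
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            header = msg["Date"]
            if header is None:
                continue
            try:
                when = parsedate_to_datetime(str(header))
            except (TypeError, ValueError):
                continue  # unparseable Date header; skip it
            if when.tzinfo is None:
                # Assumption: treat timezone-less dates as UTC.
                when = when.replace(tzinfo=timezone.utc)
            if latest is None or when > latest:
                latest = when
    finally:
        box.close()
    return latest
```

For a month file that was downloaded beforehand, a result far from the end of that month (such as 2020-03-09 for the 202003 file) suggests the download was cut short.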