Обсуждение: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

Поиск
Список
Период
Сортировка

[PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Célestin Matte
Дата:
Hello,

As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this
field,some might not, and this will cause load_message.py to crash.
 
As a solution to this, when this field is missing, this patch:
- attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import
anold mbox)
 
- generates a new Message-ID if none exists, following (a simpler version of) [2].

[1] https://www.rfc-editor.org/rfc/rfc2822#section-3.6.4
[2] https://datatracker.ietf.org/doc/html/draft-ietf-usefor-message-id-00#section-3

Cheers,
-- 
Célestin Matte
Вложения

Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Célestin Matte
Дата:
By the way, loader/load_message.py has a double crash issue, when importing a message raising an IgnorableException:

[...]
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/pgarchives/local//loader/load_message.py", line 155, in <module>
    log_failed_message(listid, "mbox", opt.mbox, ap, e)
  File "/srv/pgarchives/local//loader/load_message.py", line 36, in log_failed_message
    'err': str(str(err), 'us-ascii', 'replace'),
TypeError: decoding str is not supported

I don't understand what this line is supposed to do (removing non-ascii characters?), but a simple str(err) fixes the
issue.

Cheers,
-- 
Célestin Matte



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Magnus Hagander
Дата:


On Wed, Nov 3, 2021 at 6:02 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
Hello,

As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field, some might not, and this will cause load_message.py to crash.
As a solution to this, when this field is missing, this patch:
- attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if none exists, following (a simpler version of) [2].

I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the solution is to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right.

It may be something that should get documented somewhere as a requirement. 

--

Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Magnus Hagander
Дата:


On Thu, Nov 4, 2021 at 11:31 AM Magnus Hagander <magnus@hagander.net> wrote:


On Wed, Nov 3, 2021 at 6:02 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
Hello,

As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field, some might not, and this will cause load_message.py to crash.
As a solution to this, when this field is missing, this patch:
- attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if none exists, following (a simpler version of) [2].

I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the solution is to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right.

Sorry, I mean pgarchives here of course, not pglister :)
 

--

Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Magnus Hagander
Дата:


On Wed, Nov 3, 2021 at 6:05 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
By the way, loader/load_message.py has a double crash issue, when importing a message raising an IgnorableException:

[...]
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/pgarchives/local//loader/load_message.py", line 155, in <module>
    log_failed_message(listid, "mbox", opt.mbox, ap, e)
  File "/srv/pgarchives/local//loader/load_message.py", line 36, in log_failed_message
    'err': str(str(err), 'us-ascii', 'replace'),
TypeError: decoding str is not supported

I don't understand what this line is supposed to do (removing non-ascii characters?), but a simple str(err) fixes the issue.

It's supposed to remove non-ascii characters.

I think this is a leftover from the py2->py3 conversion. It looks like an overenthusiastic regexp replacement in the 2to3 tool. See bb5775ef where it came from. I'll go change it to jut str(err).

--

Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Célestin Matte
Дата:
> I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the
solutionis to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right.
 

But then these messages will get delivered by pglister but pgarchives will fail to archive them, although they do not
actuallybreak requirements. Shouldn't we follow the RFC here?
 

> It may be something that should get documented somewhere as a requirement. 

I'll write a small readme for pgarchives

-- 
Célestin Matte



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Célestin Matte
Дата:
> It's supposed to remove non-ascii characters.
> 
> I think this is a leftover from the py2->py3 conversion. It looks like an overenthusiastic regexp replacement in the
2to3tool. See bb5775ef where it came from. I'll go change it to jut str(err).
 

Right. Beware that there are other similar failing conversions in the same commit on lines in loader/lib/parser.py that
shouldbe addressed as well
 

-- 
Célestin Matte



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Alvaro Herrera
Дата:
On 2021-Nov-04, Célestin Matte wrote:

> > I don't think this should be the responsibility of pglister. As you
> > say, "most MTAs do add this field" -- and the solution is to
> > configure the MTA to do this. We already rely on the MTA to get a
> > lot of other important things right.
> 
> But then these messages will get delivered by pglister but pgarchives
> will fail to archive them, although they do not actually break
> requirements. Shouldn't we follow the RFC here?

Maybe pglister should refuse to deliver messages that don't contain
a Message-Id.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Christophe Pettus
Дата:

> On Nov 4, 2021, at 12:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Maybe pglister should refuse to deliver messages that don't contain
> a Message-Id.

That seems reasonable.  If there's no message ID by the time pglister gets it, something is pretty broken along the
pathfrom the sender to us. 


Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Tom Lane
Дата:
Christophe Pettus <xof@thebuild.com> writes:
>> On Nov 4, 2021, at 12:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Maybe pglister should refuse to deliver messages that don't contain
>> a Message-Id.

> That seems reasonable.  If there's no message ID by the time pglister gets it, something is pretty broken along the
pathfrom the sender to us. 

FWIW, I've long used a spam filtering rule that sends anything without
a message ID, or with a forged local message ID, straight to /dev/null.
It's quite effective, and I've not seen it eat any valid traffic.

            regards, tom lane



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Célestin Matte
Дата:
> Have you actually come across any case where a *proper* non-spam message is sent without a message-id and passes
throughactual mailservers on the way? 
 
> 
> Looking through the approximately 1.4 million mails in the postgres list archives, not a single one has a message-id
generatedby the archives server MTA (which is configured to generate it). Not a single one by our inbound relay
servers.And exactly one by the pglister server -- which turns out to be a bounce that ended up in the archives because
ofa misconfiguration back in 2018 that's not visible in the public archives.
 

After some tests, I do have a very few number of non-spam examples (3 emails from 2 different people in a
postfix+mailmanmbox of 5k emails), but they date from 2003-2007.
 

Exim already handles empty Message-IDs by default by generating them [1], although it will let a message with
Resent-Message-IDpass through as-if. I tested such a case, and pglister seems to actually drop the message (or fail
silently).


[1]: https://www.exim.org/exim-html-current/doc/html/spec_html/ch-message_processing.html#SECID226
-- 
Célestin Matte



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Justin Clift
Дата:
On 2021-11-05 22:31, Célestin Matte wrote:
>> Have you actually come across any case where a *proper* non-spam 
>> message is sent without a message-id and passes through actual 
>> mailservers on the way? 

As a data point, I'm pretty sure it was the message-id field
being missing from emails sent by scan.co.uk (a UK computer
retailer) a while back, which caused their non-delivery to
the mail servers here (@postgresql.org).

So, being that we bounce them, it wouldn't at all surprise me
that there's not much presence in our archives. :)

+ Justin



Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing

От
Stefan Kaltenbrunner
Дата:
On 11/4/21 10:07 PM, Magnus Hagander wrote:
> 
> 
> On Thu, Nov 4, 2021 at 8:47 PM Alvaro Herrera <alvherre@alvh.no-ip.org 
> <mailto:alvherre@alvh.no-ip.org>> wrote:
> 
>     On 2021-Nov-04, Célestin Matte wrote:
> 
>      > > I don't think this should be the responsibility of pglister. As you
>      > > say, "most MTAs do add this field" -- and the solution is to
>      > > configure the MTA to do this. We already rely on the MTA to get a
>      > > lot of other important things right.
>      >
>      > But then these messages will get delivered by pglister but pgarchives
>      > will fail to archive them, although they do not actually break
>      > requirements. Shouldn't we follow the RFC here?
> 
> 
> I agree that the scenario is a problem, per below.  I don't agree that 
> making up an id is a solution to that problem.
> 
> 
>     Maybe pglister should refuse to deliver messages that don't contain
>     a Message-Id.
> 
> 
> It should. I actually thought it did already, but apparently it does 
> not. I guess we've only ever used it under properly configured MTAs :)
> 
> Have you actually come across any case where a *proper* non-spam message 
> is sent without a message-id and passes through actual mailservers on 
> the way?
> 
> Looking through the approximately 1.4 million mails in the postgres list 
> archives, not a single one has a message-id generated by the archives 
> server MTA (which is configured to generate it). Not a single one by our 
> inbound relay servers. And exactly one by the pglister server -- which 
> turns out to be a bounce that ended up in the archives because of a 
> misconfiguration back in 2018 that's not visible in the public archives.

as mentioned down-thread by Justin Clift we have been plain rejecting 
mails without a message-id on the postgresql.org inbound relays since 
March 27th 2012(!) according to our repo and the number of rejects due 
to that rule is actually not-insignificant (approximately 200-400/day 
with the majority being for a very small number of bounce generating 
senders) but the number of complaints is also approaching (almost) zero.

So the reason why pglister is not seeing them a lot is because we dont 
accept them upstream, not that they dont exist in the wild...

Though the ones in the wild seem to be "not very useful"...



Stefan

Stefan