Обсуждение: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
Hello, As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field,some might not, and this will cause load_message.py to crash. As a solution to this, when this field is missing, this patch: - attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import anold mbox) - generates a new Message-ID if none exists, following (a simpler version of) [2]. [1] https://www.rfc-editor.org/rfc/rfc2822#section-3.6.4 [2] https://datatracker.ietf.org/doc/html/draft-ietf-usefor-message-id-00#section-3 Cheers, -- Célestin Matte
Вложения
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Célestin Matte
Дата:
By the way, loader/load_message.py has a double crash issue, when importing a message raising an IgnorableException: [...] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/srv/pgarchives/local//loader/load_message.py", line 155, in <module> log_failed_message(listid, "mbox", opt.mbox, ap, e) File "/srv/pgarchives/local//loader/load_message.py", line 36, in log_failed_message 'err': str(str(err), 'us-ascii', 'replace'), TypeError: decoding str is not supported I don't understand what this line is supposed to do (removing non-ascii characters?), but a simple str(err) fixes the issue. Cheers, -- Célestin Matte
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Magnus Hagander
Дата:
On Wed, Nov 3, 2021 at 6:02 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
Hello,
As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field, some might not, and this will cause load_message.py to crash.
As a solution to this, when this field is missing, this patch:
- attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if none exists, following (a simpler version of) [2].
I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the solution is to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right.
It may be something that should get documented somewhere as a requirement.
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Magnus Hagander
Дата:
On Thu, Nov 4, 2021 at 11:31 AM Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Nov 3, 2021 at 6:02 PM Célestin Matte <celestin.matte@cmatte.me> wrote:Hello,
As surprising as it may seem, Message-ID is actually not a mandatory email field [1]. While most MTAs do add this field, some might not, and this will cause load_message.py to crash.
As a solution to this, when this field is missing, this patch:
- attempts to find a "Sent-Message-ID" header and use it as the Message-ID (a case I encountered when trying to import an old mbox)
- generates a new Message-ID if none exists, following (a simpler version of) [2].I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the solution is to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right.
Sorry, I mean pgarchives here of course, not pglister :)
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Magnus Hagander
Дата:
On Wed, Nov 3, 2021 at 6:05 PM Célestin Matte <celestin.matte@cmatte.me> wrote:
By the way, loader/load_message.py has a double crash issue, when importing a message raising an IgnorableException:
[...]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/srv/pgarchives/local//loader/load_message.py", line 155, in <module>
log_failed_message(listid, "mbox", opt.mbox, ap, e)
File "/srv/pgarchives/local//loader/load_message.py", line 36, in log_failed_message
'err': str(str(err), 'us-ascii', 'replace'),
TypeError: decoding str is not supported
I don't understand what this line is supposed to do (removing non-ascii characters?), but a simple str(err) fixes the issue.
It's supposed to remove non-ascii characters.
I think this is a leftover from the py2->py3 conversion. It looks like an overenthusiastic regexp replacement in the 2to3 tool. See bb5775ef where it came from. I'll go change it to jut str(err).
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Célestin Matte
Дата:
> I don't think this should be the responsibility of pglister. As you say, "most MTAs do add this field" -- and the solutionis to configure the MTA to do this. We already rely on the MTA to get a lot of other important things right. But then these messages will get delivered by pglister but pgarchives will fail to archive them, although they do not actuallybreak requirements. Shouldn't we follow the RFC here? > It may be something that should get documented somewhere as a requirement. I'll write a small readme for pgarchives -- Célestin Matte
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Célestin Matte
Дата:
> It's supposed to remove non-ascii characters. > > I think this is a leftover from the py2->py3 conversion. It looks like an overenthusiastic regexp replacement in the 2to3tool. See bb5775ef where it came from. I'll go change it to jut str(err). Right. Beware that there are other similar failing conversions in the same commit on lines in loader/lib/parser.py that shouldbe addressed as well -- Célestin Matte
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Alvaro Herrera
Дата:
On 2021-Nov-04, Célestin Matte wrote: > > I don't think this should be the responsibility of pglister. As you > > say, "most MTAs do add this field" -- and the solution is to > > configure the MTA to do this. We already rely on the MTA to get a > > lot of other important things right. > > But then these messages will get delivered by pglister but pgarchives > will fail to archive them, although they do not actually break > requirements. Shouldn't we follow the RFC here? Maybe pglister should refuse to deliver messages that don't contain a Message-Id. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Christophe Pettus
Дата:
> On Nov 4, 2021, at 12:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Maybe pglister should refuse to deliver messages that don't contain > a Message-Id. That seems reasonable. If there's no message ID by the time pglister gets it, something is pretty broken along the pathfrom the sender to us.
Christophe Pettus <xof@thebuild.com> writes: >> On Nov 4, 2021, at 12:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Maybe pglister should refuse to deliver messages that don't contain >> a Message-Id. > That seems reasonable. If there's no message ID by the time pglister gets it, something is pretty broken along the pathfrom the sender to us. FWIW, I've long used a spam filtering rule that sends anything without a message ID, or with a forged local message ID, straight to /dev/null. It's quite effective, and I've not seen it eat any valid traffic. regards, tom lane
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Célestin Matte
Дата:
> Have you actually come across any case where a *proper* non-spam message is sent without a message-id and passes throughactual mailservers on the way? > > Looking through the approximately 1.4 million mails in the postgres list archives, not a single one has a message-id generatedby the archives server MTA (which is configured to generate it). Not a single one by our inbound relay servers.And exactly one by the pglister server -- which turns out to be a bounce that ended up in the archives because ofa misconfiguration back in 2018 that's not visible in the public archives. After some tests, I do have a very few number of non-spam examples (3 emails from 2 different people in a postfix+mailmanmbox of 5k emails), but they date from 2003-2007. Exim already handles empty Message-IDs by default by generating them [1], although it will let a message with Resent-Message-IDpass through as-if. I tested such a case, and pglister seems to actually drop the message (or fail silently). [1]: https://www.exim.org/exim-html-current/doc/html/spec_html/ch-message_processing.html#SECID226 -- Célestin Matte
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Justin Clift
Дата:
On 2021-11-05 22:31, Célestin Matte wrote: >> Have you actually come across any case where a *proper* non-spam >> message is sent without a message-id and passes through actual >> mailservers on the way? As a data point, I'm pretty sure it was the message-id field being missing from emails sent by scan.co.uk (a UK computer retailer) a while back, which caused their non-delivery to the mail servers here (@postgresql.org). So, being that we bounce them, it wouldn't at all surprise me that there's not much presence in our archives. :) + Justin
Re: [PATCH] pgarchives: parser: handle messages in which Message-ID is missing
От
Stefan Kaltenbrunner
Дата:
On 11/4/21 10:07 PM, Magnus Hagander wrote: > > > On Thu, Nov 4, 2021 at 8:47 PM Alvaro Herrera <alvherre@alvh.no-ip.org > <mailto:alvherre@alvh.no-ip.org>> wrote: > > On 2021-Nov-04, Célestin Matte wrote: > > > > I don't think this should be the responsibility of pglister. As you > > > say, "most MTAs do add this field" -- and the solution is to > > > configure the MTA to do this. We already rely on the MTA to get a > > > lot of other important things right. > > > > But then these messages will get delivered by pglister but pgarchives > > will fail to archive them, although they do not actually break > > requirements. Shouldn't we follow the RFC here? > > > I agree that the scenario is a problem, per below. I don't agree that > making up an id is a solution to that problem. > > > Maybe pglister should refuse to deliver messages that don't contain > a Message-Id. > > > It should. I actually thought it did already, but apparently it does > not. I guess we've only ever used it under properly configured MTAs :) > > Have you actually come across any case where a *proper* non-spam message > is sent without a message-id and passes through actual mailservers on > the way? > > Looking through the approximately 1.4 million mails in the postgres list > archives, not a single one has a message-id generated by the archives > server MTA (which is configured to generate it). Not a single one by our > inbound relay servers. And exactly one by the pglister server -- which > turns out to be a bounce that ended up in the archives because of a > misconfiguration back in 2018 that's not visible in the public archives. as mentioned down-thread by Justin Clift we have been plain rejecting mails without a message-id on the postgresql.org inbound relays since March 27th 2012(!) according to our repo and the number of rejects due to that rule is actually not-insignificant (approximately 200-400/day with the majority being for a very small number of bounce generating senders) but the number of complaints is also approaching (almost) zero. So the reason why pglister is not seeing them a lot is because we dont accept them upstream, not that they dont exist in the wild... Though the ones in the wild seem to be "not very useful"... Stefan Stefan