Re: No easy way to join discussion in existing thread when not subscribed

From Stefan Kaltenbrunner
Subject Re: No easy way to join discussion in existing thread when not subscribed
Msg-id 560AF266.4000302@kaltenbrunner.cc
In response to Re: No easy way to join discussion in existing thread when not subscribed  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-www
On 09/29/2015 10:11 PM, Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>> On 09/29/2015 09:34 PM, Amir Rohan wrote:
> 
>>> btw, there's something off with the mbox processing chain you use.
>>> I think it is non-compliant with the spec (as per qmail manpage of
>>> yore), which requires so called ">From quoting":
>>>
>>> http://qmail.org/man/man5/mbox.html
>>>
>>> See for example <20150802150506.GH11473@alap3.anarazel.de> in
>>> pgsql-hackers.201508, which includes email messages as mime
>>> attachments and triggers (I believe) "missing Message-Ids"
>>> warnings from your tool, and is perhaps mangled in the archives,
>>> I've seen a few dozen of those while testing.
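
For reference, the reversible ">From quoting" described in the qmail mbox manpage linked above can be sketched roughly as follows. This is only an illustration of the quoting rule (prepend ">" to any body line that starts with zero or more ">" followed by "From "), not the code the archives actually run; the function names are made up for the example.

```python
import re

# A line that starts with zero or more ">" followed by "From " would be
# mistaken for (or confused with) an mbox separator line.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def quote_from_lines(body: bytes) -> bytes:
    """Prepend one ">" to every "From ", ">From ", ">>From " ... line."""
    return _FROM_LINE.sub(rb">\1", body)


def unquote_from_lines(body: bytes) -> bytes:
    """Undo quote_from_lines() by stripping exactly one leading ">"."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)
```

Because every level of quoting gains exactly one ">", a message that embeds another mail (for example as an attachment) should come back unchanged after unquoting.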
>>
>> we have a number of current issues, which we are looking into, where
>> data in the archives gets mangled/corrupted. We are currently working on some
>> infrastructure to "test" parsing fixes across all the messages in the
>> archives to get a better understanding of what kind of effect a change has.
> 
> We do?  I didn't know we were trying to keep track of these things,
> otherwise I would have pointed it out earlier.  I think there's the same
> problem in Majordomo2 as well -- or rather, it's a bug in itself that
> Majordomo passes these messages through without complaining about the
> broken From line.  (A decade ago, I would have said that Majordomo
> should have fixed the message itself, but nowadays that's probably not
> workable due to hash-signatures of the message bodies etc.)

well magnush and I at least discussed that we need infrastructure to
test parsing fixes/changes in a sensible way - see
http://www.postgresql.org/message-id/CAEepm=1dKk2hG3qQi25GpzABnTir8CiW9TjocJj1x8XTcd6c6A@mail.gmail.com

The way the current archives work is that the archive system is
basically a subscriber to the mailing lists and gets fed the mails as
they are sent out to any other subscriber.

> 
>>> The average email length in -hackers was about 10k in august.
>>> The largest thread contained 91 messages, the median was 3.
>>> So, say it takes 1M to store an mbox file for a large thread,
>>> assuming august is a representative sample.
> 
>> good data - will look into the entire archives to see what the "largest
>> thread ever" (in terms of bytes) was and whether the current
>> system would be able to cope. My concern mostly stems from operational
>> experience (on the sysadmin team) that some operations on the archives
>> currently are fairly computational and memory intensive causing issues
>> with availability and we would want to not add more vectors that can
>> cause that.
> 
> The problem is that some threads contain patchsets that become large,
> and we don't mind posting them over and over as we revise them, so the
> total byte count can become pretty large.  See for instance
> https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
> where I posted the DDL deparse patch series several times, about 400kB
> apiece.  It didn't last too long though and I doubt that's the largest
> one; maybe search for Andres' patch series for logical decoding, a
> couple of years ago.
> 
> Also, if we use this as a basis to implement mbox-for-commitfest as I
> proposed, it would become much worse because a CF can contain 50-100
> threads.
> 
> If this could be done using an iterator that processes one or few
> messages at a time, that would probably fix the issue completely.
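
A rough sketch of what such an iterator could look like, assuming the input is a well-formed mbox stream in which body lines starting with "From " have already been ">From"-quoted. The function name is hypothetical and this is not how the archives code is structured today; it only shows that memory use can be bounded by the largest single message rather than by the whole thread.

```python
from typing import BinaryIO, Iterator


def iter_mbox_messages(fp: BinaryIO) -> Iterator[bytes]:
    """Yield one raw message at a time, including its "From " separator line.

    Only the lines of the message currently being read are held in memory.
    """
    current: list[bytes] = []
    for line in fp:
        # An unquoted "From " line starts the next message.
        if line.startswith(b"From ") and current:
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)
```

A caller would open the mbox file in binary mode and loop over `iter_mbox_messages(fp)`, writing each message to the response as it goes.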

maybe - but I also think we should look into the low-tech version of
this and build actual on-disk mbox files on a per-commitfest or a
per-thread basis at the time we get the mail into the system.
At that time we can serialize the process sensibly to not overload the
system and afterwards the "only" problem we have to solve is delivering
(semi) static files from a filesystem.
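
To make the low-tech idea a bit more concrete, here is a minimal sketch of appending each incoming mail to a per-thread mbox file at ingest time. Everything in it is an assumption for illustration: the function name, the directory layout, the use of a percent-encoded thread identifier as the file name, and the use of flock() (Unix only) to serialize writers. It is not existing archives code.

```python
import fcntl
import re
import time
from pathlib import Path
from urllib.parse import quote

_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)


def append_to_thread_mbox(archive_dir: Path, thread_id: str,
                          envelope_sender: str, raw_message: bytes) -> Path:
    """Append one message to the mbox file of its thread and return the path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: percent-encode the thread id so that
    # characters such as "<", ">", "@" and "/" are safe in a file name.
    path = archive_dir / (quote(thread_id, safe="") + ".mbox")

    # mbox separator line: "From <sender> <asctime-style date>".
    stamp = time.asctime(time.gmtime())
    separator = f"From {envelope_sender} {stamp}\n".encode("ascii", "replace")

    # ">From"-quote the message so its lines cannot look like separators.
    body = _FROM_LINE.sub(rb">\1", raw_message)
    if not body.endswith(b"\n"):
        body += b"\n"

    with open(path, "ab") as fp:
        # Exclusive lock: concurrent writers are serialized per file.
        fcntl.flock(fp, fcntl.LOCK_EX)
        try:
            fp.write(separator + body + b"\n")
            fp.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)
    return path
```

With something like this in the delivery path, serving a thread as mbox is a plain file download, and a per-commitfest file could be built the same way or by concatenating the per-thread files.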


Stefan


