Re: No easy way to join discussion in existing thread when not subscribed

From: Alvaro Herrera
Subject: Re: No easy way to join discussion in existing thread when not subscribed
Msg-id: 20150929201131.GG2573@alvherre.pgsql
In response to: Re: No easy way to join discussion in existing thread when not subscribed  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses: Re: No easy way to join discussion in existing thread when not subscribed  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List: pgsql-www

Stefan Kaltenbrunner wrote:
> On 09/29/2015 09:34 PM, Amir Rohan wrote:

> > btw, there's something off with the mbox processing chain you use.
> > I think it is non-compliant with the spec (as per qmail manpage of
> > yore), which requires so called ">From quoting":
> > 
> > http://qmail.org/man/man5/mbox.html
> > 
> > See for example <20150802150506.GH11473@alap3.anarazel.de> in
> > pgsql-hackers.201508, which includes email messages as MIME
> > attachments and triggers (I believe) "missing Message-Ids"
> > warnings from your tool, and is perhaps mangled in the archives.
> > I've seen a few dozen of those while testing.
> 
> we have a number of current issues where data in the archives gets
> mangled/corrupted that we are looking into. We are currently working on
> some infrastructure to "test" parsing fixes across all the messages in
> the archives to get a better understanding of what kind of effect a
> change has.

We do?  I didn't know we were trying to keep track of these things,
otherwise I would have pointed it out earlier.  I think there's the same
problem in Majordomo2 as well -- or rather, it's a bug in itself that
Majordomo passes these messages through without complaining about the
broken From line.  (A decade ago, I would have said that Majordomo
should have fixed the message itself, but nowadays that's probably not
workable due to hash-signatures of the message bodies etc.)
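
For reference, the ">From quoting" rule from the qmail mbox man page
linked above can be sketched roughly as follows.  This is only an
illustration in Python, not what Majordomo or the archives code does,
and the function names are made up for the example.  The writer
prepends one ">" to any body line that starts with zero or more ">"
followed by "From "; the reader strips one ">" again.

```python
import re

# A body line that would be mistaken for (or collide with) a "From "
# separator: zero or more ">" followed by "From " at the start of a line.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def quote_from_lines(body: bytes) -> bytes:
    """Prepend one '>' to every From-like line (hypothetical helper)."""
    return _FROM_LINE.sub(rb">\1", body)


def unquote_from_lines(body: bytes) -> bytes:
    """Undo quote_from_lines() by stripping exactly one leading '>'."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)
```

Because already-quoted lines get one more ">" as well, the
transformation is reversible: unquoting a quoted body gives back the
original bytes.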

> > The average email length in -hackers was about 10k in August.
> > The largest thread contained 91 messages, the median was 3.
> > So, say it takes 1M to store an mbox file for a large thread,
> > assuming August is a representative sample.

> good data - will look into the entire archives to see what the "largest
> thread ever" (in terms of bytes) was and whether the current
> system would be able to cope. My concern mostly stems from operational
> experience (on the sysadmin team) that some operations on the archives
> are currently fairly CPU- and memory-intensive, causing issues
> with availability, and we would not want to add more vectors that can
> cause that.

The problem is that some threads contain patchsets that become large,
and we don't mind posting them over and over as we revise them, so the
total byte count can become pretty large.  See for instance
https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
where I posted the DDL deparse patch series several times, about 400kB
apiece.  It didn't last too long though and I doubt that's the largest
one; maybe search for Andres' patch series for logical decoding, a
couple of years ago.

Also, if we use this as a basis to implement mbox-for-commitfest as I
proposed, it would become much worse because a CF can contain 50-100
threads.
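
As a rough back-of-the-envelope check, here is the arithmetic using
only the figures quoted above (about 10k per message, 91 messages in
the largest August thread, a median of 3, and 50-100 threads per CF).
The pessimistic case assumes that every thread in the CF is as long as
that largest thread, which is certainly an overestimate; the names are
just for illustration.

```python
AVG_MESSAGE_BYTES = 10_000        # "about 10k" per message (August, -hackers)
LARGEST_THREAD_MESSAGES = 91      # largest thread seen in August
MEDIAN_THREAD_MESSAGES = 3        # median thread length in August


def thread_bytes(messages: int, avg_bytes: int = AVG_MESSAGE_BYTES) -> int:
    """Estimated mbox size of one thread, ignoring mbox overhead."""
    return messages * avg_bytes


def commitfest_bytes(threads: int, messages_per_thread: int) -> int:
    """Estimated mbox size of a CF if every thread has the given length."""
    return threads * thread_bytes(messages_per_thread)


pessimistic = commitfest_bytes(100, LARGEST_THREAD_MESSAGES)  # 91_000_000
typical = commitfest_bytes(100, MEDIAN_THREAD_MESSAGES)       # 3_000_000
```

So a 100-thread CF would land somewhere between a few MB and roughly
90 MB under these assumptions, before counting threads that carry
repeatedly posted patchsets.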

If this could be done using an iterator that processes one or a few
messages at a time, that would probably fix the issue completely.
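
A minimal sketch of that idea in Python, assuming the input is an mbox
stream whose body lines have already been ">From"-quoted, so that any
line starting with "From " begins a new message.  The function name is
hypothetical and this is not the archives code; the point is only that
memory use is bounded by the largest single message rather than by the
whole thread or CF.

```python
import io
from typing import BinaryIO, Iterator


def iter_mbox_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield one raw message at a time from an mbox-formatted stream.

    Assumes correct ">From" quoting: a line starting with b"From "
    is always a message separator, never part of a body.
    """
    current = []
    for line in stream:
        if line.startswith(b"From ") and current:
            # A new separator line: hand out the message collected so far.
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)


# Usage: only one message is held in memory at any time.
sample = io.BytesIO(
    b"From a@example.org Tue Sep 29 2015\nSubject: one\n\nbody\n>From quoted\n\n"
    b"From b@example.org Tue Sep 29 2015\nSubject: two\n\nhi\n"
)
sizes = [len(message) for message in iter_mbox_messages(sample)]
```

A consumer that writes each message to the HTTP response as it is
produced would never need the full thread in memory.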

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
