Re: Post-2018 messages in archives

From Magnus Hagander
Subject Re: Post-2018 messages in archives
Date
Msg-id CABUevEzPiDjcFvMWVgOjbXOnLC+nWnZW-0PRX7O_PKSxoJtKrQ@mail.gmail.com
In reply to Re: Post-2018 messages in archives  (Noah Misch <noah@leadboat.com>)
List pgsql-www
On Thu, Dec 6, 2018 at 7:14 AM Noah Misch <noah@leadboat.com> wrote:
> On Wed, Dec 05, 2018 at 11:31:39PM -0500, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > On Wed, Dec 05, 2018 at 09:39:18AM +0100, Magnus Hagander wrote:
> > >>> Unfortunately we don't keep the ingest time separately. But for the future,
> > >>> doing so would probably be a good idea, for other reasons as well.
> >
> > > Works for me.  Pondering it more, the timestamp that matters most for archive
> > > purposes is the timestamp at which list subscribers started to receive their
> > > copies of the message.  Based on that, I'm thinking we should ignore the Date
> > > header and always use the timestamp from a particular "Received ... by
> > > HOSTNAME.postgresql.org" header.  Before settling on that, I'd want to check
> > > how many messages change timestamp by more than ~100s, and I'd want to spot
> > > check a few messages to see whether the change looks like an improvement.
> >
> > Another point worth considering here is moderation queue delays, which
> > are not infrequently measured in days :-(.  I am not quite sure whether
> > it'd be better to tag a moderation-delayed message with the timestamp
> > when it entered the queue or the time when it exited.  But either one
> > would be better than believing the Date: header.
>
> Good point.  I'd prefer to use the time when it exited the queue, which
> conforms to "timestamp at which list subscribers started to receive their
> copies of the message" mentioned above.  I usually download November's mbox in
> the first few days of December.  If we use the timestamp of entering the queue
> (or the Date header), there's no particular upper bound on when the November
> mbox stops accruing new messages.

Given that this has happened 10 times across 1.25 million messages, I really can't get excited about building any form of complicated solution for it. :)

So for this, just using the automatic timestamp assigned to the row when it enters the archives should do. Normally it will differ by only a second or a few from the suggestions above, and it would only grow to something bigger if the archives server was temporarily down or there were other delivery issues.
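
For anyone who still wants to run the check Noah describes above (how many messages would move by more than ~100s if a "Received ... by HOSTNAME" timestamp replaced the Date header), here is a rough Python sketch. It is not what the archives code does. The function names, the `host` parameter and the default threshold are made up for illustration, and it assumes the newest matching Received header is the one of interest.

```python
import email.utils
import re
from email.message import Message


def received_time(msg: Message, host: str):
    """Timestamp from the topmost Received header added "by <host>", or None.

    `host` is a caller-supplied hostname (hypothetical; the thread only
    says HOSTNAME.postgresql.org).
    """
    by_host = re.compile(r"\bby\s+" + re.escape(host) + r"\b", re.IGNORECASE)
    for value in msg.get_all("Received", []):
        if by_host.search(value) and ";" in value:
            # In a Received header the date-time follows the last ';'.
            stamp = value.rsplit(";", 1)[1].strip()
            try:
                return email.utils.parsedate_to_datetime(stamp)
            except (TypeError, ValueError):
                continue  # unparseable date, try the next matching header
    return None


def date_shift_seconds(msg: Message, host: str):
    """Received-minus-Date in seconds, or None if either cannot be used."""
    received = received_time(msg, host)
    raw_date = msg.get("Date")
    if received is None or raw_date is None:
        return None
    try:
        sent = email.utils.parsedate_to_datetime(raw_date)
    except (TypeError, ValueError):
        return None
    # A "-0000" zone parses to a naive datetime; skip mixed naive/aware pairs.
    if (sent.tzinfo is None) != (received.tzinfo is None):
        return None
    return (received - sent).total_seconds()


def count_large_shifts(messages, host: str, threshold: float = 100.0) -> int:
    """Number of messages whose timestamp would move by more than `threshold` s."""
    count = 0
    for msg in messages:
        shift = date_shift_seconds(msg, host)
        if shift is not None and abs(shift) > threshold:
            count += 1
    return count
```

Iterating over a `mailbox.mbox` object from the standard library yields messages that can be passed straight to `count_large_shifts`. Messages with a missing or unparseable header are skipped rather than counted, so the result is a lower bound.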
