Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Дата	27 мая 2016 г. 04:43:46
Msg-id	983f784e-994a-2445-6276-6fe128e82766@2ndquadrant.com обсуждение исходный текст
Ответ на	Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a patch that should fix the coalescing, including the clock
>> skew detection. In the end I reorganized the code a bit, moving the
>> check at the end, after the clock skew detection. Otherwise I'd have to
>> do the clock skew detection on multiple places, and that seemed ugly.
>
> I hadn't been paying any attention to this thread, I must confess.
> But I rediscovered this no-coalescing problem while pursuing the poor
> behavior for shared catalogs that Peter complained of in
> https://www.postgresql.org/message-id/56AD41AC.1030509@gmx.net
>
> I posted a patch at
> https://www.postgresql.org/message-id/13023.1464213041@sss.pgh.pa.us
> which I think is functionally equivalent to what you have here, but
> it goes to some lengths to make the code more readable, whereas this
> is just adding another layer of complication to something that's
> already a mess (eg, the request_time field is quite useless as-is).
> So I'd like to propose pushing that in place of this patch ... do
> you care to review it first?

I do care and I'll look at it over the next few days. FWIW when writing 
that patch I intentionally refrained from major changes, as I think the 
plan was to backpatch it. But +1 for more readable code from me.

>
> Reacting to the thread overall:
>
> I see Noah's concern about wanting to merge the write work for
> requests about different databases. I've got mixed feelings about
> that: it's arguable that any such change would make things worse not
> better. In particular, it's inevitable that trying to merge writes
> will result in delaying the response to the first request, whether
> or not we are able to merge anything. That's not good in itself, and
> it means that we can't hope to merge requests over any very long
> interval, which very possibly will prevent any merging from
> happening in real situations. Also, considering that we know the
> stats collector can be pretty slow to respond at all under load, I'm
>  worried that this would result in more outright failures.
>
> Moreover, what we'd hope to gain from merging is fewer writes of the
> global stats file and the shared-catalog stats file; but neither of
> those are very big, so I'm skeptical of what we'd win.

Yep. Clearly there's a trade-off between slowing down response to the 
first request vs. speeding-up the whole process, but as you point out we 
probably can't gain enough to justify that.

I wonder if that's still true on clusters with many databases (say, 
shared systems with thousands of dbs). Perhaps walking the list just 
once would save enough CPU to make this a win.

In any case, if we decide to abandon the idea of merging requests for 
multiple databases, that probably means we can further simplify the 
code. last_statrequests is a list but it actually never contains more 
than just a single request. We kept it that way because of the plan to 
add the merging. But if that's not worth it ...

>
> In view of 52e8fc3e2, there's more or less no case in which we'd be
> writing stats without writing stats for the shared catalogs. So I'm
> tempted to propose that we try to reduce the overhead by merging the
> shared-catalog stats back into the global-stats file, thereby
> halving the filesystem metadata traffic for updating those.

I find this a bit contradictory with the previous paragraph. If you 
believe that reducing the filesystem metadata traffic will have a 
measurable benefit, then surely merging writes for multiple dbs (thus 
not writing the global/shared files multiple times) will have even 
higher impact, no?

E.g. let's assume we're still writing the global+shared+db files for 
each database. If there are requests for 10 databases, we'll write 30 
files. If we merge those requests first, we're writing only 12 files.

So I'm not sure about the metadata traffic argument, we'd need to see 
some numbers showing it really makes a difference.

That being said, I'm not opposed to merging the shared catalog into the 
global-stats file - it's not really a separate database so having it in 
a separate file is a bit weird.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Paquier
Дата: 27 мая 2016 г., 02:49:57
Сообщение: Re: foreign table batch inserts

Следующее

От: Michael Paquier
Дата: 27 мая 2016 г., 04:55:29
Сообщение: Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Предыдущее

Следующее