Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

From: Heikki Linnakangas
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Msg-id: 479E618E.5050309@enterprisedb.com
In reply to: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable (Simon Riggs <simon@2ndquadrant.com>)
Replies: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable (Jeff Davis <pgsql@j-davis.com>)
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable (Simon Riggs <simon@2ndquadrant.com>)
Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable ("Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at>)
List: pgsql-hackers
Simon Riggs wrote:
> On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> Rather than having a boolean GUC, we should have a number and make the
>>> parameter "synchronised_scan_threshold".
>> This would open up a can of worms I'd prefer not to touch, having to do
>> with whether the buffer-access-strategy behavior should track that or
>> not.  As the note in heapam.c says,
>>
>>      * If the table is large relative to NBuffers, use a bulk-read access
>>      * strategy and enable synchronized scanning (see syncscan.c).  Although
>>      * the thresholds for these features could be different, we make them the
>>      * same so that there are only two behaviors to tune rather than four.
>>
>> It's a bit late in the cycle to be revisiting that choice.  Now we do
>> already have three behaviors to worry about (BAS on and syncscan off)
>> but throwing in a randomly settable knob will take it back to four,
>> and we have no idea how that fourth case will behave.  The other tack we
>> could take (having the one GUC variable control both thresholds) is
>> not good since it will result in pg_dump trashing the buffer cache.
> 
> OK, good points. 
> 
> I'm still concerned that the threshold gets higher as we increase
> shared_buffers. We may be removing performance features as fast as we
> gain performance when we set shared_buffers higher.
> 
> Might we agree that the threshold should be fixed at 8MB, rather than
> varying upwards as we try to tune? 

Synchronized scans and the bulk-read strategy don't help if the table
fits in cache. If it fits in shared buffers, you're better off keeping
it there than swapping pages between the OS cache and shared buffers or
spending any effort synchronizing scans. That's why we agreed back then
that the threshold should be X% of shared_buffers.
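To make the scaling Simon worries about concrete, here is a minimal Python model of the single-threshold rule from the heapam.c comment quoted above. It is a sketch, not PostgreSQL code: the function names are hypothetical, it assumes the default 8 kB block size, and it treats "large" as more than a quarter of shared_buffers (NBuffers / 4), matching the 25% figure used below.

```python
# Assumption: default 8 kB block size (BLCKSZ); other builds may differ.
BLOCK_SIZE = 8192
MB = 1024 * 1024


def scan_features(rel_blocks: int, n_buffers: int) -> tuple[bool, bool]:
    """Return (use_bulk_read, use_sync_scan) for a sequential scan.

    Hypothetical simplified model: one threshold, a quarter of
    shared_buffers, switches both features on together, as the
    heapam.c comment describes.
    """
    large = rel_blocks > n_buffers // 4
    return large, large


def threshold_mb(shared_buffers_mb: int) -> float:
    """Table size in MB above which a seq scan counts as 'large'."""
    n_buffers = shared_buffers_mb * MB // BLOCK_SIZE
    return (n_buffers // 4) * BLOCK_SIZE / MB


if __name__ == "__main__":
    for sb in (32, 1024, 8192):
        print(f"shared_buffers = {sb} MB -> threshold = {threshold_mb(sb):.0f} MB")
```

Run as-is, it prints thresholds of 8 MB, 256 MB and 2048 MB, so a fixed 8 MB cutoff coincides with the percentage rule only at a 32 MB shared_buffers setting.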

It's a good point that we don't want pg_dump to scramble the cluster
order, but that's the only use case I've seen so far for disabling
sync scans. Even that wouldn't matter much if our estimate of
"clusteredness" didn't get thrown off by a table that looks like this:
"5 6 7 8 9 1 2 3 4"
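The effect on the estimate can be checked with a small calculation. The sketch below computes a plain Pearson correlation between each row's physical position and its rank in sorted order, which is roughly what the planner's correlation statistic (pg_stats.correlation) measures. The function name is hypothetical, it assumes distinct values, and real ANALYZE works on a row sample rather than the whole table.

```python
def physical_correlation(values: list[float]) -> float:
    """Correlation between physical position and sort rank (hypothetical helper).

    Assumes at least two distinct values so ranks are unambiguous.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # rank[i] = position of values[i] in sorted order (0-based)
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    mean = (n - 1) / 2  # positions and ranks are both a permutation of 0..n-1
    cov = sum((p - mean) * (rank[p] - mean) for p in range(n))
    var = sum((p - mean) ** 2 for p in range(n))  # identical for positions and ranks
    return cov / var


if __name__ == "__main__":
    print(physical_correlation([1, 2, 3, 4, 5, 6, 7, 8, 9]))
    print(physical_correlation([5, 6, 7, 8, 9, 1, 2, 3, 4]))
```

The fully sorted table scores 1.0, while the rotated one, which is just two perfectly ordered runs, scores -0.5, so the planner would consider it poorly clustered.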

Now, maybe there are more use cases where you'd want to tune the
threshold, but I'd like to see some before we add more knobs.

To benefit from a lower threshold, you'd need a table large enough
that its cache footprint matters, but still smaller than 25% of
shared_buffers, and seq scans on it. In that scenario, a lower
threshold might help, because it would leave some of shared_buffers
free for other use. Even that is quite hand-wavy; the buffer cache LRU
algorithm already handles that kind of scenario reasonably well, and
whether or not

To benefit from a larger threshold, you'd need a table larger than 25%
of shared_buffers, but still smaller than shared_buffers, and you'd
need to seq scan it often enough that you want to keep it in shared
buffers. If you're frequently seq scanning a table of that size,
you're most likely suffering from a bad plan. Even then, the
performance difference shouldn't be that great: with typical
shared_buffers settings, the table surely fits in the OS cache anyway.

Tables that are seq scanned are typically very small, like a summary 
table with just a few rows, or huge tables in a data warehousing 
system. Between the extremes, I don't think the threshold actually has a 
very big impact.
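The two scenarios above amount to size bands relative to shared_buffers. A minimal sketch, assuming the 25% threshold; the helper and its band labels are hypothetical and only name the cases discussed, they are not PostgreSQL terms.

```python
def size_band(table_mb: float, shared_buffers_mb: float) -> str:
    """Classify a seq-scanned table relative to shared_buffers (hypothetical).

    'below'  : at most 25% of shared_buffers; only a lower threshold changes it
    'middle' : above 25% but smaller than shared_buffers; only a higher
               threshold changes it
    'above'  : does not fit in shared_buffers; bulk-read and sync scans apply
    """
    if table_mb <= shared_buffers_mb / 4:
        return "below"
    if table_mb < shared_buffers_mb:
        return "middle"
    return "above"


if __name__ == "__main__":
    for size in (1, 100, 500, 4096):
        print(size, "MB ->", size_band(size, 1024))
```

With 1 GB of shared_buffers, only tables between 256 MB and 1 GB land in the band a larger threshold would affect.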

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

