Re: [DOCS] synchronize_seqscans' description is a bit misleading

From: Gurjeet Singh
Subject: Re: [DOCS] synchronize_seqscans' description is a bit misleading
Date:
Msg-id: CABwTF4XZvDki20+edBaHPzT3rvahEtpnoa+W+nwKWgBvqoPG4Q@mail.gmail.com
In response to: Re: [DOCS] synchronize_seqscans' description is a bit misleading (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [DOCS] synchronize_seqscans' description is a bit misleading (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Wed, Apr 10, 2013 at 11:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gurjeet Singh <gurjeet@singh.im> writes:
> If I'm reading the code right [1], this GUC does not actually *synchronize*
> the scans, but instead just makes sure that a new scan starts from a block
> that was reported by some other backend performing a scan on the same
> relation.

Well, that's the only *direct* effect, but ...

> Since the backends scanning the relation may be processing the relation at
> different speeds, even though each one took the hint when starting the
> scan, they may end up being out of sync with each other.

The point you're missing is that the synchronization is self-enforcing:
whichever backend gets ahead of the others will be the one forced to
request (and wait for) the next physical I/O.  This will naturally slow
down the lower-CPU-cost-per-page scans.  The other ones tend to catch up
during the I/O operation.

Got it. So far, so good.
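
For the archives: the logic I was referring to in [1] lives in src/backend/access/common/syncscan.c, in ss_get_location() and ss_report_location(). The sketch below is my simplified paraphrase of it, not the actual source; the real code keeps an LRU of per-relation hints in shared memory under a spinlock, and rate-limits the reports, all of which I've elided here.

/*
 * Simplified paraphrase of PostgreSQL's synchronized-scan hints
 * (src/backend/access/common/syncscan.c).  Illustrative only: the
 * real ss_get_location()/ss_report_location() track one hint per
 * relation in a shared-memory LRU, protected by a spinlock.
 */
typedef unsigned int BlockNumber;

static BlockNumber shared_hint;   /* stands in for the shared-memory hint */

/* Scan startup: begin where some other backend already is, so the
 * new scan can ride in the wake of the existing one. */
BlockNumber
get_start_block(BlockNumber relnblocks)
{
    BlockNumber startloc = shared_hint;

    /* Guard against a stale hint, e.g. after the relation shrank. */
    if (startloc >= relnblocks)
        startloc = 0;
    return startloc;
}

/* Called as a scan advances, so that later scans pick up from here. */
void
report_location(BlockNumber location)
{
    shared_hint = location;
}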

Let's consider a pathological case where a scan is performed by a user-controlled cursor, whose scan speed depends on how fast the user presses the "Next" button; such a scan is quickly going to fall out of sync with the other scans. Moreover, if a new scan happens to pick up the block reported by this slow scan, that new scan may have to read blocks off the disk afresh.

So, again, it is not guaranteed that all the scans on a relation will synchronize with each other. Hence my proposal to include the term 'probability' in the definition.


The feature is not terribly useful unless I/O costs are high compared to
the CPU cost-per-page.  But when that is true, it's actually rather
robust.  Backends don't have to have exactly the same per-page
processing cost, because pages stay in shared buffers for a while after
the current scan leader reads them.

Agreed. Even if the buffer has been evicted from shared_buffers, there's a high likelihood that a scan following close on the heels of the others will fetch it from the FS cache.
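
To convince myself of the self-throttling argument, I put together a toy simulation (my own illustration, not PostgreSQL code): only the frontmost scan pays the I/O cost for a page, so faster scans stall at the frontier while slower ones stride through already-cached pages.

#include <stdio.h>

#define NPAGES 1000
#define IO_COST 10      /* ticks to pull a page from disk */
#define NSCANS 3

int main(void)
{
    int cpu[NSCANS] = {1, 2, 3};    /* per-page CPU cost differs per backend */
    int pos[NSCANS] = {0};
    int ready[NSCANS] = {0};        /* tick at which the scan may advance */
    int finish[NSCANS] = {0};
    int cached_upto = -1;           /* highest page anyone has read so far */
    int done = 0;

    for (int tick = 0; done < NSCANS; tick++) {
        for (int s = 0; s < NSCANS; s++) {
            if (pos[s] >= NPAGES || tick < ready[s])
                continue;
            /* Whoever is in front pays the I/O; trailers find the page cached. */
            int io = (pos[s] > cached_upto) ? IO_COST : 0;
            if (io)
                cached_upto = pos[s];
            ready[s] = tick + cpu[s] + io;
            if (++pos[s] == NPAGES) {
                finish[s] = ready[s];
                done++;
            }
        }
    }
    for (int s = 0; s < NSCANS; s++)
        printf("scan %d (cpu %d/page) finished at tick %d\n",
               s, cpu[s], finish[s]);
    return 0;
}

In this toy model the three scans finish at nearly the same tick despite their different per-page CPU costs; take away the shared cache (charge the I/O on every page) and their finish times spread out by thousands of ticks.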
 

> Imagining that all scans on a table are always synchronized, may make some
> wrongly believe that adding more backends scanning the same table will not
> incur any extra I/O; that is, only one stream of blocks will be read from
> disk no matter how many backends you add to the mix. I noticed this when I
> was creating partition tables, and each of those was a CREATE TABLE AS
> SELECT FROM original_table (to avoid WAL generation), and running more than
> 3 such transactions caused the disk read throughput to behave unpredictably,
> sometimes even dipping below 1 MB/s for a few seconds at a stretch.

It's not really the scans that are causing that to be unpredictable; it's
the write I/O from the output side, which is forcing highly
nonsequential behavior (or at least I suspect so ... how many disk units
were involved in this test?)

You may be right. I don't have access to the system anymore, and I don't remember the disk layout, but it's quite possible that write operations were causing the read throughput to drop. I did try to reproduce the behaviour on my laptop with up to 6 backends doing pure reads on a table several times the size of system RAM, but I could not get them to fall out of sync.

--
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.
