Re: old synchronized scan patch

From: Jeff Davis
Subject: Re: old synchronized scan patch
Msg-id: 1165282980.4302.34.camel@dogma.v10.wvs
In reply to: Re: old synchronized scan patch  ("Luke Lonergan" <llonergan@greenplum.com>)
List: pgsql-hackers

On Mon, 2006-12-04 at 16:47 -0800, Luke Lonergan wrote:
> Jeff,
> > My current patch starts a new sequential scan on a given relation at the
> > page of an already-running scan. It makes no guarantees that the scans
> > stay together, but in practice I don't think they deviate much. To try
> > to enforce synchronization of scanning I fear would do more harm than
> > good. Thoughts?
> 
> I think this is good enough - a "background scanner" approach would be the
> logical alternative, but may not be necessary.  I suspect you are correct
> about the scans being nearly synced.
> 

A background scanner requires synchronizing the backends, which can
cause all kinds of bad performance problems. Otherwise, how would you
ensure that ALL the backends that need a page get it before the train
moves on? I don't think it's necessary or desirable to have a background
scanner.

> This may not be the case if and when we implement a priority based
> scheduler, but in that case we're already managing the throughput based on
> content anyway.
> 

Right. I don't see how you'd be able to get the data to a backend that
needs it without running that backend. If it's a priority scheduler, you
may not run that backend.

> > Also, it's more of a "hint" system that uses a direct mapping of the
> > relation's Oid to hold the position of the scan. That means that, in rare
> > cases, the page offset could be wrong, in which case it will degenerate
> > to the current performance characteristics with no cost. The benefit of
> > doing it this way is that it's simple code, with essentially no
> > performance penalty or additional locking. Also, I can use a fixed
> > amount of shared memory (1 page is about right).
> 
> If I understand correctly, Tom's concern is that this page is potentially
> accessed once for every page read and may consequently become very hot.  How
> do you manage the scan position so this doesn't happen?  Do we have any
> readahead in the I/O layer?  There is certainly readahead in the OS I/O
> cache, but it's dynamic and we don't know the block position...
> 

My proposal is hint-based, and I consider any reads coming from that
data page to be untrusted. I can just statically map the Oid of the
relation onto a location in the page (which may or may not collide with
another Oid). The advantage here is that I don't need to lock. There's
no contention because every access is just reading a word or two from
the shared page (or writing a word or two). Of course, it must be a
static data structure because it can't contain pointers. But 8kB should
be enough to hold information on plenty of interesting relations.
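
A minimal sketch of the hint page described above, for illustration only.
It is not the actual patch. It assumes one 32-bit block number per slot and
a simple modulo mapping from Oid to slot. The names hint_page, hint_slot,
hint_report and hint_read are hypothetical, and a plain static array stands
in for the shared memory page, so nothing here demonstrates concurrent
access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HINT_PAGE_SIZE 8192                              /* one 8kB page */
#define HINT_SLOTS (HINT_PAGE_SIZE / sizeof(uint32_t))   /* 2048 slots */

/* Stand-in for the shared page: fixed size, no pointers inside it. */
static uint32_t hint_page[HINT_SLOTS];

static_assert(sizeof(hint_page) == HINT_PAGE_SIZE,
              "hint table must occupy exactly one page");

/* Static mapping from relation Oid to slot; distinct Oids may collide. */
static size_t hint_slot(uint32_t relid)
{
    return relid % HINT_SLOTS;
}

/* A running scan reports its current block: a single word written, no lock. */
static void hint_report(uint32_t relid, uint32_t blockno)
{
    hint_page[hint_slot(relid)] = blockno;
}

/* A new scan reads the hint: a single word read, no lock. Untrusted value. */
static uint32_t hint_read(uint32_t relid)
{
    return hint_page[hint_slot(relid)];
}
```

With this mapping, Oids 16384 and 18432 land in the same slot, so a report
for one silently overwrites the hint for the other. That is why the value
read back can only be treated as a hint.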

I don't trust the data because collisions are possible. If the number is
obviously wrong (higher than the number of pages in the relation file), a
new scan starts at offset 0. If it's wrong but that can't be detected, the
scan will start at that location anyway, which is OK because that
arbitrary value is no worse than the arbitrary value of 0.
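
The check described above could be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: choose_start_block and
next_block are hypothetical names, block numbers are assumed to be 0-based
(so a hint equal to the page count is also rejected), and the wrap-around in
next_block is an assumption about how a scan that starts mid-relation would
still cover every page.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the starting block for a new scan from an untrusted hint.
 * A hint outside the relation is detectably wrong, so fall back to 0.
 * A wrong-but-in-range hint is used as-is: it is as arbitrary as 0.
 */
static uint32_t choose_start_block(uint32_t hint, uint32_t nblocks)
{
    if (hint >= nblocks)
        return 0;
    return hint;
}

/* Assumed helper: advance one block, wrapping to 0 at the end of the file. */
static uint32_t next_block(uint32_t current, uint32_t nblocks)
{
    assert(nblocks > 0);
    return (current + 1) % nblocks;
}
```

For a 100-page relation, a hint of 500 is rejected and the scan starts at
block 0, while a hint of 42 is accepted whether or not it came from a scan
of the same relation.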

The whole premise of my implementation is:
(1) Get 99% of the benefit
(2) Pay near-zero performance penalty, even in worst case.
(3) Simple code changes

If those 3 things aren't true, let me know.

The way I see it, the only real cost is that it may break things that
assume deterministic sequential scans.

Regards,
	Jeff Davis
