Luke Lonergan wrote:
> For plans that qualify with the above conditions, the executor will issue
> blocking calls to lseek(), which will translate to a single disk actuator
> moving to the needed location in seek_time, approximately 8ms.
I doubt it's actually the lseeks, but the reads/writes after the lseeks
that block.
> If we implement AIO and allow for multiple pending I/Os used to prefetch
> groups of qualifying tuples, basically a form of random readahead, we can
> improve the throughput for any given query by taking advantage of multiple
> disk actuators.
Rather than jumping to AIO, which is a huge change, I think we could get
much of the benefit by using posix_fadvise(WILLNEED) in strategic places
to tell the OS what pages we're going to need in the near future. If the
OS has implemented that properly, it should schedule I/Os for the
requested pages ahead of time. That would require very little change to
PostgreSQL code, and could simply be #ifdef'd away on platforms that
don't support posix_fadvise.
> Note that
> the same approach would also work to speed sequential access by overlapping
> compute and I/O.
Yes, though the OS should already doing read ahead for us. How efficient
it is is another question. posix_fadvise(SEQUENTIAL) could be used to
give a hint on that as well.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com