Re: RAID arrays and performance

From: Mark Mielke
Subject: Re: RAID arrays and performance
Msg-id: 47557B12.5040809@mark.mielke.cc
In reply to: Re: RAID arrays and performance (James Mansion <james@mansionfamily.plus.com>)
Replies: Re: RAID arrays and performance (James Mansion <james@mansionfamily.plus.com>)
List: pgsql-performance
James Mansion wrote:
> Mark Mielke wrote:
>> This assumes that you can know which pages to fetch ahead of time -
>> which you do not except for sequential read of a single table.
> Why doesn't it help to issue IO ahead-of-time requests when you are
> scanning an index?  You can read-ahead
> in index pages, and submit requests for data pages as soon as it is
> clear you'll want them.  Doing so can allow
> the disks and OS to relax the order in which you receive them, which
> may allow you to process them while IO
> continues, and it may also optimise away some seeking and settle
> time.  Maybe.
Sorry to be unclear. To achieve a massive speedup (12X for 12 disks with
RAID 0), you need to know in advance which reads to perform. The moment
you do not, you have only a starting point, and your operations begin to
serialize again. For example, you must scan the first index to know which
table rows to read. At a minimum, this breaks your query into: 1) preload
all the index pages you will need, 2) scan the index pages you needed,
3) preload all the table pages you will need, 4) scan the table pages you
needed. But do you really need the whole index? What if you only need
parts of the index, and this plan now reads the whole index using async
I/O "just in case" it is useful? An index is a B-Tree for a reason. In
Matthew's case, where he has an IN clause with thousands of possible
values (I think?), perhaps a complete index scan is always the best
choice, but that's only one use case, and in my opinion, an obscure one.
As soon as additional table joins become involved, whole index scans
would usually be needed less often, which turns the index scan back into
a regular B-Tree scan. That is difficult to perform async I/O for, as you
don't necessarily know which pages to read next.
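A toy latency model (not PostgreSQL code) can make the argument above concrete. It is a sketch under loudly stated assumptions: every random read costs one time unit, RAID 0 places page p on disk p % ndisks, each disk serves one request at a time, and CPU time for scanning pages is ignored. All function names are hypothetical. Reads whose page numbers are known up front spread across the disks; a B-Tree descent, where each page number comes from the previous read, does not.

```python
"""Hypothetical toy model: known-in-advance reads vs. dependent reads on RAID 0."""

from collections import Counter

SEEK = 1.0  # assumed cost of one random read, in arbitrary time units


def disk_for(page: int, ndisks: int) -> int:
    # Assumption: RAID 0 stripes one page per disk in round-robin order.
    return page % ndisks


def known_batch_time(pages: list[int], ndisks: int) -> float:
    """All page numbers are known in advance, so every request can be queued
    at once; each disk works through its own queue in parallel and the batch
    finishes when the busiest disk finishes."""
    if not pages:
        return 0.0
    per_disk = Counter(disk_for(p, ndisks) for p in pages)
    return max(per_disk.values()) * SEEK


def dependent_chain_time(depth: int) -> float:
    """B-Tree descent: the next page number is only known after the current
    page has been read, so the reads serialize regardless of disk count."""
    return depth * SEEK


def phased_plan_time(index_pages: list[int], table_pages: list[int],
                     ndisks: int) -> float:
    """The four-step plan: batch-read the index pages, then batch-read the
    table pages once the index has revealed which rows are needed."""
    return (known_batch_time(index_pages, ndisks)
            + known_batch_time(table_pages, ndisks))


if __name__ == "__main__":
    pages = list(range(120))
    print(known_batch_time(pages, 1))    # 120.0
    print(known_batch_time(pages, 12))   # 10.0, the 12X case
    print(dependent_chain_time(4))       # 4.0, with 1 disk or 12
    print(phased_plan_time(list(range(24)), list(range(100, 220)), 12))  # 12.0
```

Under these assumptions the 12X speedup appears only for the batch whose page list is known up front; the dependent chain costs the same on one disk as on twelve, and the phased plan pays for its two batches one after the other.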

It seems like a valuable goal - but throwing imaginary numbers around
does not appeal to me. I am more interested in Gregory's simulations. I
would like to understand his simulation better, and see his results.
Speculation about amazing potential is barely worth the words used to
express it. The real work is in design and implementation. :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>