Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move

Поиск
Список
Период
Сортировка
От Robert Klemme
Тема Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move
Дата
Msg-id BANLkTi=NwvR6jH-gHFyP8fbATp0GnYc7WA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move  (Greg Smith <greg@2ndQuadrant.com>)
Список pgsql-performance


On Thu, Jun 9, 2011 at 7:44 PM, Greg Smith <greg@2ndquadrant.com> wrote:
On 06/09/2011 07:43 AM, Willy-Bas Loos wrote:
Well, after reading your article i have been reading some materail about it on the internet, stating that separating indexes from data for performance benefits is a myth.
I found your comment " So then a single query will only ever access one of both at a time." very smart (no sarcasm there).
I also found a thread on AskTom that said mainly "the goal is to achieve even io." (that makes absolute sense)

The idea that separating indexes and tables from one another via a tablespace is inherently good is a myth.  Particularly nowadays, where the fastest part of a drive is nearly twice as fast as the slowest one in sequential transfers, and the ratio between sequential and random I/O is huge.  Trying to get clever about breaking out a tablespace is unlikely to outsmart what you'd get if you just let the OS deal with that stuff.

What is true is that when you have multiple tiers of storage speeds available, allocating the indexes and tables among them optimally is both difficult and potentially worthwhile.  A customer of mine has two drive arrays, one of which is about 50% faster than the other; second was added as expansion once the first filled.  Nowadays, both are 75% full, and I/O on each has to be carefully balanced.  Making sure the heavily hit indexes are on the fast array, and that less critical things are not placed there, is the difference between that site staying up or going down.

The hidden surprise in this problem for most people is the day they discover that *the* most popular indexes, the ones they figured had to go on the fastest storage around, are actually sitting in RAM all the time anyway.  It's always fun and sad at the same time to watch someone spend a fortune on some small expensive storage solution, move their most performance critical data to it, and discover nothing changed.  Some days that works great; others, it's no faster all, because that data was already in memory.

Adding a few more thoughts to this: it is important to understand the very different nature of read and write IO.  While write IO usually can be done concurrently to different IO channels (devices) for read IO there are typically dependencies, e.g. you need to read the index before you know which part of the table you need to read.  Thus both cannot be done concurrently for a single select unless the whole query is partitioned and executed in parallel (Oracle can do that for example).  Even then each parallel executor has this dependency between index and table data.  That's the same situation as with concurrent queries into the same table and index.  There are of course exceptions, e.g. during a sequential table scan you can know beforehand which blocks need to be read next and fetch them wile processing the current block(s).  The buffering strategy also plays an important role here.

Bottom line: one needs to look at each case individually, do the math and ideally also measurements.

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

В списке pgsql-performance по дате отправления:

Предыдущее
От: Marti Raudsepp
Дата:
Сообщение: Re: 100% CPU Utilization when we run queries.
Следующее
От: tv@fuzzy.cz
Дата:
Сообщение: Re: strange query plan with LIMIT