Re: Large tables (was: RAID 0 not as fast as

From: Bucky Jordan
Subject: Re: Large tables (was: RAID 0 not as fast as
Date:
Msg-id: 78ED28FACE63744386D68D8A9D1CF5D4209A74@MAIL.corp.lumeta.com
In reply to: Re: Large tables (was: RAID 0 not as fast as  (Markus Schaber <schabi@logix-tt.com>)
Responses: Re: Large tables (was: RAID 0 not as fast as  (Markus Schaber <schabi@logix-tt.com>)
           Re: Large tables (was: RAID 0 not as fast as  ("Luke Lonergan" <llonergan@greenplum.com>)
List: pgsql-performance
Markus,

First, thanks - your email was very enlightening. But it does bring up a
few additional questions, so thanks for your patience as well; I've
listed them below.

> It applies per active backend. When connecting, the Postmaster forks a
> new backend process. Each backend process has its own scanner and
> executor. The main postmaster is only for coordination (forking, config
> reload etc.), all the work is done in the forked per-connection backends.

Each postgres process also uses shared memory (aka the buffer cache), so
that it doesn't re-read data from disk that another process has already
fetched, correct?
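
For what it's worth, this looks checkable with the pg_buffercache
contrib module - a rough sketch, assuming the module is installed
(restricting to the current database, since relfilenode is only unique
within one database's directory):

    -- top 10 relations by pages currently sitting in shared_buffers
    SELECT c.relname, count(*) AS buffers
      FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = c.relfilenode
     WHERE b.reldatabase = (SELECT oid FROM pg_database
                             WHERE datname = current_database())
     GROUP BY c.relname
     ORDER BY buffers DESC
     LIMIT 10;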

> Our discussion is about some different type of application, where you
> have a single application issuing a single query at a time dealing with
> a large amount (several gigs up to teras) of data.
Commonly these are referred to as OLAP applications, correct? That is
where I believe my application is more focused (it may handle some
transactions in the future, but at the moment it follows the "load lots
of data, then analyze it" pattern).

> The discussed problem arises when such large queries generate random
> (non-contiguous) disk access (e.g. index scans). Here, the underlying
> RAID cannot effectively prefetch data as it does not know what the
> application will need next. This effectively limits the speed to that of
> a single disk, regardless of the details of the underlying RAID, as it
> can only process a request at a time, and has to wait for the
> application for the next one.
Does this have anything to do with postgres indexes not storing data, as
some previous posts to this list have mentioned? (In other words, having
the index in memory doesn't help, because every match still requires a
heap fetch? Or are we talking about indexes that are too large to fit in
RAM?)
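
Just to check my understanding with some made-up numbers: if an index
scan touches 1% of a 500m-row table, that's ~5m heap fetches, each
potentially a random page read. At ~5 ms per seek on a single spindle,
that's 5,000,000 * 0.005 s ≈ 7 hours, whereas sequentially scanning the
whole table (say 100 GB at 60 MB/s) would take about 100,000 MB / 60
MB/s ≈ 28 minutes. So even with the index fully cached in RAM, it's the
random heap I/O that dominates - is that the right way to think about
it?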

So this issue would be only on a per query basis? Could it be alleviated
somewhat if I ran multiple smaller queries? For example, say I want to
calculate a summary table over 500m records - I could fire off 5 queries
that each count 100m records and update the summary table, leaving MVCC
to handle update contention?
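
Something like this, with each INSERT run from its own connection
(hypothetical table and column names, and assuming a monotonically
increasing id I can range over):

    -- partial counts land here; rolled up at the end
    CREATE TABLE summary_parts (part int, cnt bigint);

    -- run each of these from a separate connection/backend
    INSERT INTO summary_parts
        SELECT 1, count(*) FROM records WHERE id BETWEEN 1 AND 100000000;
    INSERT INTO summary_parts
        SELECT 2, count(*) FROM records WHERE id BETWEEN 100000001 AND 200000000;
    -- ... and so on for parts 3 through 5 ...

    -- afterwards, combine the partial counts
    SELECT sum(cnt) FROM summary_parts;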

Actually, now that I think about it - that would only work if the
sections I mentioned above were on different disks, right? So I would
actually have to do table partitioning with tablespaces on different
spindles for that to be beneficial? (Which is basically not feasible
with RAID, since I don't get to pick which disks the data goes on...)
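
If I did have separate spindles, I'm picturing something along these
lines - paths and names are made up, and I'm using inheritance plus
CHECK constraints, since that's the usual way to do range partitioning
in postgres:

    -- one tablespace per physical disk (hypothetical mount points)
    CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
    CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

    -- parent table, with range partitions placed on different disks
    CREATE TABLE records (id bigint, payload text);
    CREATE TABLE records_p1 (CHECK (id BETWEEN 1 AND 250000000))
        INHERITS (records) TABLESPACE disk1;
    CREATE TABLE records_p2 (CHECK (id BETWEEN 250000001 AND 500000000))
        INHERITS (records) TABLESPACE disk2;

    -- lets the planner skip partitions whose CHECK constraint
    -- can't match the WHERE clause
    SET constraint_exclusion = on;
    SELECT count(*) FROM records WHERE id < 250000000;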

Are there any other workarounds for current postgres?

Thanks again,

Bucky
