Re: PG 9.3 materialized view VS Views, indexes, shared memory

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: PG 9.3 materialized view VS Views, indexes, shared memory
Дата
Msg-id 54F97414.9040208@BlueTreble.com
обсуждение исходный текст
Ответ на Re: PG 9.3 materialized view VS Views, indexes, shared memory  (Nicolas Paris <niparisco@gmail.com>)
Ответы Re: PG 9.3 materialized view VS Views, indexes, shared memory  (Nicolas Paris <niparisco@gmail.com>)
Список pgsql-performance
On 3/6/15 2:16 AM, Nicolas Paris wrote:
>     If you have that many different views I doubt you want that many
>     indexes anyway.
>
> ​It's a datawarehouse, then each view is used by many user for each query.
> Those views must be subset of the huge material table. All indexes are
> needed

Yes, but they don't have to be partial.

>     ​Have you tried just hitting the base table and indexes directly,
>     either through plain views or just direct SQL?
>
> ​ I have tried each. The performances are worst querying on a subset
> (the views) than querying on whole huge table when using the huge indexes

You mean the materialized views, right? If so, that makes sense: Instead
of having all your users hitting one common set of data (your fact
table) you had them hitting a bunch of other data (the mat views). But
you still had other stuff hitting the fact table. So now you were
dealing with a lot more data than if you just stuck to the single fact
table.

> => this is the solution I am implementing. (800 is not true, but in 10
> years it maybe will be)

In 10 years we'll all be using quantum computers anyway... ;P

> ​ Actually, I have added a boolean column on the huge table for each
> views​. This is the way each view is a subset of huge table (Create View
> as Select  * FROM hugeTable WHERE columnX is true --etc 800 times). Then
> I create 800partials indexes on that column(create index...WHERE columnX
> is TRUE), for each view.
> This works great as the query planer chooses the partials indexes when
> querying the little subset of the terrific table (potential 20bilion rows)
>
> This is better than material views for some reasons :
> - saves places on hard drive (columnX is boolean +same indexes - data
> for MatViews)
> - saves time generating materialised views

But this isn't better than the mat views because of a bunch of booleans;
it's better because it means less stain on the disk cache.

> This is quite more complicated because in the project, the number of
> view is increasing, and dynamic then :
> - then adding new mat views is simple
> - adding new views => adding new column on the huge table. It can take
> long time to update boolean for each tuple. Then I need to truncate/bulk
> load all data each time I add a new View. Other problem is dynamic
> number column table was a bit tricky to implement in an ETL soft such
> Talend, but the benefits are I hope great.

I think you'll ultimately be unhappy trying to go down this route, for
the reasons you mention, plus the very large amount of extra space
you'll be using. 800 booleans is 800 extra bytes for every row in your
fact table. That's a lot. Even if you used a bitmap instead (which means
you have to mess around with tracking which bit means what and probably
other problems as well) you're still looking at 100 bytes per row.
That's nothing to sneeze at.

My suggestion is to test using nothing but plain views and plain indexes
on the base table. I expect that some of those views will not perform
adequately, but many (or most) of them will be fine. For the views that
are too slow, look at what the expensive part of the view and
materialize *only that*. I suspect you'll find that when you do that
you'll discover that several views are slow because of the same thing,
so if you materialize that one thing one time you can then use it to
speed up several views.

Using that approach means you'll have a lot less data that you have to read.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


В списке pgsql-performance по дате отправления:

Предыдущее
От: Nicolas Paris
Дата:
Сообщение: Re: PG 9.3 materialized view VS Views, indexes, shared memory
Следующее
От: Nicolas Paris
Дата:
Сообщение: Re: PG 9.3 materialized view VS Views, indexes, shared memory