Обсуждение: Do Views execute underlying query everytime ??

Поиск
Список
Период
Сортировка

Do Views execute underlying query everytime ??

От
Amit V Shah
Дата:
Hi all,

I have like a repository table with is very very huge with atleast a few
hundreds of millions, may be over that. The information is stored in form of
rows in these tables. I need to make that information wide based on some
grouping and display them as columns on the screen.

I am thinking of having a solution where I create views for each screen,
which are just read only.

However, I donot know if the query that creates the view is executed
everytime I select something from the view. Because if that is the case,
then I think my queries will again be slow. But if that is the way views
work, then what would be the point in creating them ..

Any suggestions, helps --

(Please pardon if this question should not be on performance forum)

Thanks,
Amit


Re: Do Views execute underlying query everytime ??

От
Matthew Nuzum
Дата:
On 6/21/05, Amit V Shah <ashah@tagaudit.com> wrote:
> Hi all,
...
> I am thinking of having a solution where I create views for each screen,
> which are just read only.
>
> However, I donot know if the query that creates the view is executed
> everytime I select something from the view. Because if that is the case,
> then I think my queries will again be slow. But if that is the way views
> work, then what would be the point in creating them ..
>
> Any suggestions, helps --

They do get executed every time. I have a similar issue, but my data
does not change very frequently, so instead of using a view, I create
lookup tables to hold the data. So once a day I do something like
this:
drop lookup_table_1;
create table lookup_table_1 as SELECT ...;

In my case, rows are not deleted or updated, so I don't actually do a
"drop table..." I merely add new records to the existing table, but if
your data changes, the drop table technique can be faster than doing a
delete or update.

--
Matthew Nuzum
www.bearfruit.org

Re: Do Views execute underlying query everytime ??

От
Amit V Shah
Дата:
After I sent out this email, I found this article from google

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Looks like we can control as to when the views refresh... I am still kind of
confused, and would appreciate help !!

The create/drop table does sound a solution that can work, but the thing is
I want to get manual intervention out, and besides, my work flow is very
complex so this might not be an option for me :-(

Thanks,
Amit

-----Original Message-----
From: Matthew Nuzum [mailto:mattnuzum@gmail.com]
Sent: Tuesday, June 21, 2005 10:45 AM
To: Amit V Shah
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Do Views execute underlying query everytime ??


On 6/21/05, Amit V Shah <ashah@tagaudit.com> wrote:
> Hi all,
...
> I am thinking of having a solution where I create views for each screen,
> which are just read only.
>
> However, I donot know if the query that creates the view is executed
> everytime I select something from the view. Because if that is the case,
> then I think my queries will again be slow. But if that is the way views
> work, then what would be the point in creating them ..
>
> Any suggestions, helps --

They do get executed every time. I have a similar issue, but my data
does not change very frequently, so instead of using a view, I create
lookup tables to hold the data. So once a day I do something like
this:
drop lookup_table_1;
create table lookup_table_1 as SELECT ...;

In my case, rows are not deleted or updated, so I don't actually do a
"drop table..." I merely add new records to the existing table, but if
your data changes, the drop table technique can be faster than doing a
delete or update.

--
Matthew Nuzum
www.bearfruit.org


Re: Do Views execute underlying query everytime ??

От
Richard Huxton
Дата:
Amit V Shah wrote:
> Hi all,
>
> I have like a repository table with is very very huge with atleast a few
> hundreds of millions, may be over that. The information is stored in form of
> rows in these tables. I need to make that information wide based on some
> grouping and display them as columns on the screen.
>
> I am thinking of having a solution where I create views for each screen,
> which are just read only.
>
> However, I donot know if the query that creates the view is executed
> everytime I select something from the view. Because if that is the case,
> then I think my queries will again be slow. But if that is the way views
> work, then what would be the point in creating them ..

That's exactly how they work. You'd still want them because they let you
simplify access control (user A can only see some rows, user B can see
all rows) or just make your queries simpler.

Sounds like you want what is known as a "materialised view" which is
basically a summary table that is kept up to date by triggers. You query
the table instead of actually recalculating every time. Perhaps google
for "postgresql materialized view" (you might want a "z" or "s" in
materialised).

--
   Richard Huxton
   Archonet Ltd

Re: Do Views execute underlying query everytime ??

От
PFC
Дата:

> However, I donot know if the query that creates the view is executed
> everytime I select something from the view. Because if that is the case,
> then I think my queries will again be slow. But if that is the way views
> work, then what would be the point in creating them ..

    Views are more for when you have a query which keeps coming a zillion
time in your application like :

SELECT p.*, pd.* FROM products p, products_names pd WHERE p.id=pd.id AND
pd.language=...

    You create a view like :

CREATE VIEW products_with_name AS SELECT p.*, pd.* FROM products p,
products_names pd WHERE p.id=pd.id

    And then you :

SELECT * FROM products_with_name WHERE id=... AND language=...

    It saves a lot of headache and typing over and over again the same thing,
and you can tell your ORM library to use them, too.

    But for your application, they're useless, You should create a
"materialized view"... which is just a table and update it from a CRON job.
    You can still use a view to fill your table, and as a way to hold your
query, so the cron job doesn't have to issue real queries, just filling
tables from views :

CREATE VIEW cached_stuff_view AS ...

And once in while :

BEGIN;
DROP TABLE cached_stuff;
CREATE TABLE cached_stuff AS SELECT * FROM cached_stuff_view;
CREATE INDEX ... ON cached_stuff( ... )
COMMIT;
ANALYZE cached_stuff;

Or :
BEGIN;
TRUNCATE cached_stuff;
INSERT INTO cached_stuff SELECT * FROM cached_stuff_view;
COMMIT;
ANALYZE cached_stuff;

If you update your entire table it's faster to just junk it or truncate it
then recreate it, but maybe you'd prefer TRUNCATE which saves you from
having to re-create of indexes... but it'll be faster if you drop the
indexes and re-create them afterwards anyway instead of them being updated
for each row inserted. So I'd say DROP TABLE.












Re: Do Views execute underlying query everytime ??

От
John A Meinel
Дата:
Amit V Shah wrote:

>After I sent out this email, I found this article from google
>
>http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
>
>Looks like we can control as to when the views refresh... I am still kind of
>confused, and would appreciate help !!
>
>The create/drop table does sound a solution that can work, but the thing is
>I want to get manual intervention out, and besides, my work flow is very
>complex so this might not be an option for me :-(
>
>Thanks,
>Amit
>

Just to make it clear, a view is not the same as a materialized view.
A view is just a set of rules to the planner so that it can simplify
interactions with the database. A materialized view is a query which has
been saved into a table.

To set it up properly, really depends on what your needs are.

   1. How much time can elapse between an update to the system, and an
      update to the materialized views?
   2. How many updates / (sec, min, hour, month) do you expect. Is
      insert performance critical, or secondary.

For instance, if you get a lot of updates, but you can have a 1 hour lag
between the time a new row is inserted and the view is updated, you can
just create a cron job that runs every hour to regenerate the
materialized view.

If you don't get many updates, but you need them to show up right away,
then you can add triggers to the affected tables, such that
inserting/updating to a specific table causes an update to the
materialized view.

There are quite a few potential tradeoffs. Rather than doing a
materialized view, you could just improve your filters. If you are doing
a query to show people the results, you generally have some sort of
upper bound on how much data you can display. Humans don't like reading
more than 100 or 1000 rows. So create your normal query, and just take
on a LIMIT 100 at the end. If you structure your query properly, and
have appropriate indexes, you should be able to make the LIMIT count,
and allow you to save a lot of overhead of generating rows that you
don't use.

I would probably start by posting the queries you are currently using,
along with an EXPLAIN ANALYZE, and a description of what you actually
need from the query. Then this list can be quite helpful in
restructuring your query to make it faster.

John
=:->


Вложения

Re: Do Views execute underlying query everytime ??

От
Amit V Shah
Дата:
First of all, thanks to everyone for helping me !

Looks like materialized views will be my answer.

Let me explain my situation a little better.

The repository table looks like this -

create table repository (statName varchar(45), statValue varchar(45),
metaData varchar(45));

MetaData is a foreign key to other tables.

The screens show something like following -

Screen 1 -
Stat1      Stat2        Stat3
Value      Value        Value
Value      Value        Value



Screen 2 -
Stat3      Stat1        Stat5
Value      Value        Value
Value      Value        Value


etc. etc.

The data is grouped based on metaData.

Updates will only occur nightly and can be controlled. But selects occur
9-5.

One of the compelling reasons I feel is that to create such tables out of
repository tables, the query would be very complicated. If I have a
materialized view, I think the information will be "cached".

Another concern I have is load. If I have lot of simultaneous users creating
such "wide tables" out of one "long table", that would generate substantial
load on the servers. ??

I like the materialized view solution better than having other tables for
each screen. (Would be nice if someone can comment on that)

So that is my situation.

Again, thanks everyone for helping
Amit

-----Original Message-----
From: John A Meinel [mailto:john@arbash-meinel.com]
Sent: Tuesday, June 21, 2005 11:01 AM
To: Amit V Shah
Cc: 'newz@bearfruit.org'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Do Views execute underlying query everytime ??


Amit V Shah wrote:

>After I sent out this email, I found this article from google
>
>http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
>
>Looks like we can control as to when the views refresh... I am still kind
of
>confused, and would appreciate help !!
>
>The create/drop table does sound a solution that can work, but the thing is
>I want to get manual intervention out, and besides, my work flow is very
>complex so this might not be an option for me :-(
>
>Thanks,
>Amit
>

Just to make it clear, a view is not the same as a materialized view.
A view is just a set of rules to the planner so that it can simplify
interactions with the database. A materialized view is a query which has
been saved into a table.

To set it up properly, really depends on what your needs are.

   1. How much time can elapse between an update to the system, and an
      update to the materialized views?
   2. How many updates / (sec, min, hour, month) do you expect. Is
      insert performance critical, or secondary.

For instance, if you get a lot of updates, but you can have a 1 hour lag
between the time a new row is inserted and the view is updated, you can
just create a cron job that runs every hour to regenerate the
materialized view.

If you don't get many updates, but you need them to show up right away,
then you can add triggers to the affected tables, such that
inserting/updating to a specific table causes an update to the
materialized view.

There are quite a few potential tradeoffs. Rather than doing a
materialized view, you could just improve your filters. If you are doing
a query to show people the results, you generally have some sort of
upper bound on how much data you can display. Humans don't like reading
more than 100 or 1000 rows. So create your normal query, and just take
on a LIMIT 100 at the end. If you structure your query properly, and
have appropriate indexes, you should be able to make the LIMIT count,
and allow you to save a lot of overhead of generating rows that you
don't use.

I would probably start by posting the queries you are currently using,
along with an EXPLAIN ANALYZE, and a description of what you actually
need from the query. Then this list can be quite helpful in
restructuring your query to make it faster.

John
=:->


Re: Do Views execute underlying query everytime ??

От
PFC
Дата:
 From what you say I understand that you have a huge table like this :

( name, value, id )

And you want to make statistics on (value) according to (name,id).

***************************************************

First of all a "materialized view" doen't exist in postgres, it's just a
word to name "a table automatically updated by triggers".
An example would be like this :

table orders (order_id, ...)
table ordered_products (order_id, product_id, quantity, ...)

If you want to optimize the slow request :
"SELECT product_id, sum(quantity) as total_quantity_ordered
 FROM ordered_products GROUP BY product_id"

You would create a cache table like this :
table ordered_products_cache (product_id, quantity)

And add triggers ON UPDATE/INSERT/DELETE on table ordered_products to
update ordered_products_cache accordingly.

Of course in this case everytime someone touches ordered_products, an
update is issued to ordered_products_cache.

***************************************************

In your case I don't think that is the solution, because you do big
updates. With triggers this would mean issuing one update of your
materialized view per row in your big update. This could be slow.

In this case you might want to update the cache table in one request
rather than doing an awful lot of updates.

So you have two solutions :

1- Junk it all and rebuild it from scratch (this can be faster than it
seems)
2- Put the rows to be added in a temporary table, update the cache table
considering the difference between this temporary table and your big
table, then insert the rows in the big table.

This is the fastest solution but it requires a bit more coding (not THAT
much though).

***************************************************

As for the structure of your cache table, you want :


Screen 1 -
Stat1      Stat2        Stat3
Value      Value        Value
Value      Value        Value



Screen 2 -
Stat3      Stat1        Stat5
Value      Value        Value
Value      Value        Value

You have several lines, so what is that ? is it grouped by date ? I'll
presume it is.

So your screens basically show a subset of :

SELECT date, name, sum(value) FROM table GROUP BY name, date

This is what you should put in your summary table.
Then index it on (date,name) and build your screens with :

SELECT * FROM summary WHERE (date BETWEEN .. AND ..) AND name IN (Stat3,
Stat1,  Stat5)

That should be pretty easy ; you get a list of (name,date,value) that you
just have to format accordingly on your screen.
























Re: Do Views execute underlying query everytime ??

От
Matthew Nuzum
Дата:
On 6/21/05, PFC <lists@boutiquenumerique.com> wrote:
...
> In your case I don't think that is the solution, because you do big
> updates. With triggers this would mean issuing one update of your
> materialized view per row in your big update. This could be slow.
>
> In this case you might want to update the cache table in one request
> rather than doing an awful lot of updates.
>
> So you have two solutions :
>
> 1- Junk it all and rebuild it from scratch (this can be faster than it
> seems)
> 2- Put the rows to be added in a temporary table, update the cache table
> considering the difference between this temporary table and your big
> table, then insert the rows in the big table.
>
> This is the fastest solution but it requires a bit more coding (not THAT
> much though).
>
Amit,

I understand your desire to not need any manual intervention...

I don't know what OS you use, but here are two practical techniques
you can use to achieve the above solution suggested by PFC:

a: If you are on a Unix like OS such as Linux of Free BSD you have the
beautiful cron program that will run commands nightly.

b: If you are on Windows you have to do something else. The simplest
solution I've found is called "pycron" (easily locatable by google)
and is a service that emulates Unix cron on windows (bypassing a lot
of the windows scheduler hassle).

Now, using either of those solutions, let's say at 6:00 am you want to
do your batch query.

1. Put the queries you want into a text file EXACTLY as you would type
them using psql and save the text file. For example, the file may be
named "create_mat_view.txt".
2. Test them by doing this from a command prompt: psql dbname <
create_mat_view.txt
3. Create a cron entry to run the command once a day, it might look like this:
0 6 * * * /usr/bin/psql dbname < /home/admin/create_mat_view.txt
or maybe like this:
0 6 * * * "C:\Program Files\PostgreSQL\8.0\psql.exe" dbname <
"C:\create_mat_view.txt"

I hope this helps,
--
Matthew Nuzum
www.bearfruit.org