Re: Materialized views in Oracle

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Materialized views in Oracle
Дата
Msg-id 4E7AD6FB.8090908@ringerc.id.au
обсуждение исходный текст
Ответ на Re: Materialized views in Oracle  (Mike Christensen <mike@kitchenpc.com>)
Ответы Re: Materialized views in Oracle
Re: Materialized views in Oracle
Список pgsql-general
On 09/22/2011 01:34 PM, Mike Christensen wrote:

[lots]

I'm going to heavily paraphrase a summary, then a summarized response.
If you disagree with the way I interpret and condense the full message,
please feel free to grump. Roughly, I read your comments as:

- You're focused on a single use-case that you're having problems with,
rather than a more general issue with mat. views. [Nonetheless, this
experience highlights some issues that probably apply to all Oracle
materialized views].

- You want materialized views to support selective updating ("fast
refresh") for any query, and they don't. You appear to find the
limitations on what queries are and are not supported frustrating. The
inability to recognise the equivalance of "a INNER JOIN b ON a.id=b.id"
and "a,b WHERE a.id=b.id" is particularly annoying, as is the inability
to do a selective update on a materialized view involving  an outer join
or a subquery.

- The frustration created by these limitations is caused more by the
awful error messages and the limited documentation, rather than from the
limitations themselves. It sounds like you wouldn't mind so much if you
knew what you could and couldn't do with materialized views in advance
or at least got descriptive error messages.

- You don't see materialized views without selective updating ("fast
refresh") as useful. [I disagree, though I can see how it wouldn't be
very useful for the particular use case you're facing.]



My viewpoint here is:


- Your use case isn't exactly a poster-child for materialized views in
the first place, so there's a degree of round-peg vs square-hole going
on. It'd be really nice if a materialized view could solve your problem,
but it'd be really nice if I could cycle at 100 km/h, too. In the real
world, everything has trade-offs and limitations.

- I don't see the limitations on which queries can be used with FAST
REFRESH materialized views as unreasonable, but cannot believe how bad
the error messages about them are.

- The error messages are horrible, and it sounds like the documentation
could certainly use some improvement. That seems consistent with Oracle
in general, and not unique to materialized views.

- Personally, I'm very impressed that FAST REFRESH materialized views
work at all! That would be a HARD feature to implement in a general way
and make perform even half-decently. Oracle deserve some credit for
this. They've managed to make things like a journal-to-ledger
materialized view where only a given customer's ledger entries are
updated when a journal entry is added possible - without any custom coding.

- Everything seems to be very manual, with nothing done automagically
behind the scenes. That seems to fit the general Oracle philosophy of
how things should work. I don't see any sign that it's significantly
different for materialized views than it is for the rest of Oracle.

- The inability to transform join syntax is pretty lame. I blame the SQL
standard for having so many stupid, arbitrary ways to write the same
thing, and having a language so incredibly painful to work with that
producing an easily transformed and analysed representation of a query
is nearly impossible. (eg: Seriously, who came up with the window
function syntax, and what were they smoking?).

- A materialized view that requires a full query re-execution on every
commit is IMO certainly still very useful. If the source table(s) are
not updated particularly often and the view query is very expensive to
run, but the view is queried extremely frequently, then this will still
be a huge win.



[inline response to long version]:

> The data, as you've probably imagined, is highly
> un-normalized and the queries are pretty crazy, so I got the idea to
> make a few strategic DB changes as well as stick this report in a
> materialized view.  That way, we could just query from that, throw in
> whatever filters we need and poof, instant report in like half a
> second.  If done right.

Yeah, there's your problem. That's a pretty bad use case for a
materialized view IMO. That's never going to be anything except painful
and slow.

Unless you need this data extremely frequently as part of other queries
that're made much cheaper by the use of the materialized view, it's
going to be a net loss. You're better off just cleaning up the report as
much as you can and running it on demand or on a schedule.

> Oracle has a few really cool features around materialized views.
> There's a REFRESH ON COMMIT option that automatically updates the
> materialized view any time the underlying data is changed (otherwise
> you have to manually update it, or setup an update schedule for it).
> There's also a FAST REFRESH option that will only update the data
> that's changed, rather than run the entire query and re-build every
> row (Doing a full refresh on every commit would be crazy expensive, so
> no)..  In my opinion, doing a FAST REFRESH ON COMMIT seems to be the
> only interesting feature - otherwise, there's not really a point -
> just make a normal table and re-build it once a day.

There's a lot to be said for a regular refresh on commit, too. Sometimes
you need that data to always be consistent, and if the data it's drawn
from is updated once an hour but the view is queried 10 times a second,
the cost of the update will be well worth it.

> Plus, they demand these reports to be real-time data anyway.

Thar's yer problem, maytee. Executives.

How real-time is real-time? Do they want it "as soon as I press the
button, with no delay?" Do they actually need it up-to-date as of five
seconds before it was requested? Or just reasonably fresh and with no
waiting around to get it?

Maybe a bit of work refining their requirements might help. Being
executives at a telco, there's a fair chance they'll demand that it not
only be instant and perfectly up to date, but real-time linked into
their presentation so it changes (with pretty animations) as they're
giving their talk ... but you never know, you might get a sane one.
There must be one or two somewhere in the phone industry.

> First, apparently there can be no JOINS in your query.  If you create
> a view with:
>
> SELECT A.Foo, B.Foo FROM A INNER JOIN B ON A.ID=B.ID
>
> You'll get the cryptic error:
>
> ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view
>
> Now, you can do the SAME exact query without the ANSI joins:
>
> SELECT A.Foo, B.Foo FROM A, B WHERE A.ID=B.ID
>
> Now all of a sudden it works.  Yes, that makes a lot of sense, and
> thanks for the helpful error message too.

Yeah, it's a bit surprising that they can't recognise those as
equivalent, but it's hardly the end of the world. The error message is a
bit nasty though.

> Second: You have to create these materialized view logs for all the
> underlying tables if you want FAST REFRESH.  This allows Oracle to log
> the exact data that changed so it can figure out which parts of your
> view table to update.  Why can't Oracle just log its own data when
> there's fast refreshing views dependent on it?  You not only have to
> create these logs, but you have to create them with special ROWID
> parameters otherwise you get more cryptic unhelpful Oracle errors like
> the one above.

That just seems to be the Oracle philosophy of making everything an
explicit admin action. The cynic in me wonders if this is a scheme to
keep the armies of Oracle admins doing endless training course
refreshes. In truth I imagine it's more about giving admins control and
allowing them to avoid unintended side-effects like adding a whole lot
of logging overhead for a hot table they didn't realize would be
referenced by a materialized view that goes through several intermediate
views.

> Next: If you have a SUBSELECT in your query, you're screwed:
>
> SELECT
>    A,
>    B,
>    (SELECT MIN(DATE) FROM MyDates WHERE ID=TABLE1.ID) as EarliestDate
> FROM TABLE1
>
> If you do this, you'll get:
>
> ORA-22818: subquery expressions not allowed here

Fair enough. Remember, they have to figure out how to do this selective
updating magic. To do that, they need to be able to prove how an update
to a given table will affect a subset of rows in the materialized view,
and need to be able to re-calculate all those rows.

How would *you* implement that when any ol' subquery is allowed?

Writing rules and proof engines for this sort of stuff is *hard*. The
more restrictions you apply, the easier it gets. It's also often best to
start out with something limited but working, then progressively enhance
it by removing limitations.

> You have to re-write this query in a weird LEFT JOIN instead.  Once
> you do that, you'll also break FAST REFRESH with the cryptic error
> message from above.

Again, that's no great surprise. A LEFT OUTER JOIN isn't going to be the
easiest thing to do set proofs about.

> Basically, FAST REFRESH and ON COMMIT are extremely limited in what
> you can do with them; any query more complicated than a basic SELECT *
> FROM FOO seems to be asking for trouble.

Personally I'm kind of impressed by what FAST REFRESH *can* do. That
sort of thing is hard to do, in the mathematical sense of "hard".

> We first have to create materialized view logs for both these guys:
>
> CREATE MATERIALIZED VIEW LOG ON foo WITH ROWID;
> CREATE MATERIALIZED VIEW LOG ON bar WITH ROWID;

Yep, so you're making the costs of the view explicit. Fair enough to me,
though I suspect PostgreSQL's approach would be a 'NOTICE' message
informing you the log was being created - if it needed a log-based
approach, that is.

> CREATE MATERIALIZED VIEW foo_bar
>    NOLOGGING
>    CACHE
>    BUILD IMMEDIATE
>    REFRESH FAST ON COMMIT  AS SELECT foo.foo,
>                                      bar.bar,
>                                      foo.ROWID AS foo_rowid,
>                                      bar.ROWID AS bar_rowid
>                                 FROM foo, bar
>                                WHERE foo.foo = bar.foo;
>
> Notice a lot of things: No ANSI JOINs (if you use them, cryptic
> error)

Yup, that one's weird, but then Oracle has always had funny ideas about
how joins should be written.

> we have to explicitly say what columns we want

Well, you're always doing that anyway. Right? RIGHT!?!

Seriously, "*" is the worst damn feature in SQL. Its use outside
interactive testing should be punishable by cattle prod.

> we have to return the ROWID columns of every unique table.

I'm not at all surprised that they need that data to support fast
refresh, but I'll grant it is a bit weird that you have to explicitly
write them in the query result set. I'd expect the DB to transparently
add them behind the scenes if they weren't specified.

> If we don't do any of
> those things, we get "Cannot set the ON COMMIT refresh attribute."

Yup, it's the error messages that kill me. My attempts to try out Oracle
freebie editions were quickly crushed by the horrible interface of their
interactive SQL tool and the absolutely miserable error messages the
database produces.

> An Oracle expert can probably get into details about why all of these
> things are absolutely vital to the database engine, but discovering
> these things took over 4 hours of my time today.

Again, it sounds like the real issue is bad documentation and bad error
messages. You're *ALWAYS* going to have limitations on what a
selectively updated materialized view can do; it's just the way you're
informed of them that sucks.

[snip]

> Warnings: --->
>     W (1): Warning: execution completed with warning
>            <---
>
> That's right - It told me there was a warning, but wasn't actually
> nice enough to tell me what the warning is.  There is a "SHOW ERRORS"
> command, but it apparently requires SQLPlus and don't get me started
> on that thing.  Instead, you have to query a special error table.

OK, and here we have "oracle has a miserable UI from hell" issues
that're well known, but nothing to do with mat views.

> Since TRAININGPLAN is in a
> state of flux until the transaction is committed, it's apparently
> impossible to select those rows and get the minimum date.  I don't
> know if Postgres has this problem, but it seems to me the engine
> should be able to tell me what the minimum date will be since the
> trigger is running AFTER the rows are updated, and then commit
> everything at once.

Correct. An AFTER trigger in PostgreSQL will see the transaction's local
view of the uncommitted data. If the transaction rolls back, the effects
of the trigger will too so there's no risk of a dirty read leaking out
and becoming visible outside the transaction. If the transaction
commits, the data is valid.

This can get complicated when you have triggers acting recursively on a
table and it isn't always that easy to understand exactly what a trigger
will see.

--
Craig Ringer

В списке pgsql-general по дате отправления:

Предыдущее
От: Mike Christensen
Дата:
Сообщение: Re: Materialized views in Oracle
Следующее
От: Thomas Kellerer
Дата:
Сообщение: Re: Materialized views in Oracle