Re: record identical operator

From	Kevin Grittner
Subject	Re: record identical operator
Date	24 September 2013, 12:38:36
Msg-id	1380026307.10770.YahooMailNeo@web162901.mail.bf1.yahoo.com
In reply to	Re: record identical operator (Stephen Frost <sfrost@snowman.net>)
Responses	Re: record identical operator
List	pgsql-hackers

Stephen Frost <sfrost@snowman.net> wrote:

> Skipping out on much of the side-discussion to try and drive at
> the heart of this..
>
> Robert Haas (robertmhaas@gmail.com) wrote:
>> I would suggest that you read the referenced papers for details
>> as to how the algorithm works.  To make a long story short, they
>> do need to keep track of what's changed, and how.
>
> I suppose it's good to know that I wasn't completely
> misunderstanding the discussion in Ottawa.
>
>> However, that still seems largely orthogonal to the present
>> discussion.
>
> It *solves* this issue, from where I'm sitting, without any
> binary operators at all.

> [ argument that the only way we should ever do REFRESH is by
> using captured deltas, through incremental maintenance techniques
> ]

That would ensure that a query could not be used to define a
matview until we had implemented incremental maintenance for
queries of that type and complexity.  I expect that coming *close*
to covering all useful queries that way will take five to ten
years, if all goes well.  The approach I'm taking makes all queries
available *now*, with the number that can be maintained
incrementally growing over time.  This is the route every other
database I've looked at has taken (for good reason, I think).

Ultimately, even when we have incremental maintenance supported for
all queries that can be used to define a matview, I think there
will be demand for a REFRESH command that re-runs the base query.
Not only does that fit the workload for some matviews, but consider
this, from the paper I cited[1]:

| Recomputing the view from scratch is too wasteful in most cases.
| Using the heuristic of inertia (only a part of the view changes
| in response to changes in the base relations), it is often
| cheaper to compute only the changes in the view.  We stress that
| the above is only a heuristic.  For example, if an entire base
| relation is deleted, it may be cheaper to recompute a view that
| depends on the deleted relation (if the new view will quickly
| evaluate to an empty relation) than to compute the changes to the
| view.

What we're talking about is a performance heuristic -- not
something more profound than that.  The route I'm taking is to get
it *working correctly now* using the simple technique, and then
embark on the long journey of optimizing progressively more
cases.
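
As a rough illustration of the trade-off described in the quoted
passage, here is a minimal sketch that compares re-running a view's
defining query with applying captured deltas.  The "view" is just a
per-customer row count, and the names recompute and apply_delta are
hypothetical; nothing here reflects how PostgreSQL implements
REFRESH.

```python
from collections import Counter


def recompute(base_rows):
    """Full refresh: re-run the defining query (a GROUP BY count) over the base rows."""
    return Counter(customer for customer, _amount in base_rows)


def apply_delta(view, inserted, deleted):
    """Incremental maintenance: adjust stored counts using only the captured changes."""
    updated = Counter(view)
    for customer, _amount in inserted:
        updated[customer] += 1
    for customer, _amount in deleted:
        updated[customer] -= 1
        if updated[customer] == 0:
            del updated[customer]
    return updated


# Hypothetical base relation of (customer, amount) rows.
base = [("alice", 10), ("bob", 5), ("alice", 7)]
view = recompute(base)

# Small change: one row inserted, one deleted.  The delta is short,
# so applying it touches far fewer rows than a full recompute would.
inserted = [("carol", 3)]
deleted = [("bob", 5)]
new_base = [row for row in base if row not in deleted] + inserted
incremental = apply_delta(view, inserted, deleted)
assert incremental == recompute(new_base)

# Whole base relation deleted: the delta is as large as the old table,
# while the full recompute scans nothing at all.
emptied = apply_delta(view, [], base)
assert emptied == recompute([])
```

Both routes arrive at the same view contents; the only difference is
how much work each does for a given change, which is why the choice
between them is a cost question rather than a correctness question.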

What your argument boils down to IMV is essentially a case of
premature optimization.  You have yet to show any case where the
existing patch does not yield correct results, or show that there
is a better way to get to the point this patch takes us.

> [ later post shows a query that does not produce deterministic
> results ]

Sure, two direct runs of that same query, or two runs through a
regular view, could show different results (considering
synchronized scans, among other things).  I don't see what that
proves.  Of course a refresh of a matview is only going to produce
one of those and then will not produce a different result until it
is refreshed or (once we add incremental maintenance) something
changes in the underlying data.

Nobody ever claimed that a query which does not produce consistent
results would somehow produce them with this patch.  There are
queries using citext, numeric, and other types which *do* provide
consistent results, and those results are consistently produced by
a straight query, a simple view, or a non-concurrent refresh of a
materialized view; this patch will cause a concurrent refresh to
produce the same results as those, rather than something different.
Period.
That's the point, and the whole point.  You have not shown that it
doesn't.  You have not shown why adding a 12th non-default opclass
is a particular problem here (although we have a consensus to use
different operators, to reserve this operator namespace for other
things).  You have not made any case at all for why people should
wait for incremental maintenance to be mature (a project which will
take years) before being able to use materialized views with
concurrent refreshes.
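
To make the equal-versus-identical distinction concrete, here is a
minimal sketch of a diff-based refresh, using Python's Decimal as a
stand-in for numeric (Decimal("1.0") and Decimal("1.00") compare
equal but print differently).  The functions equal, identical, and
diff_refresh are hypothetical illustrations, not the patch's code,
and the sketch assumes no duplicate rows.

```python
from decimal import Decimal


def equal(a, b):
    """Ordinary equality: 1.0 and 1.00 count as the same value."""
    return a == b


def identical(a, b):
    """Stand-in for a 'record identical' test: same value and same representation."""
    return len(a) == len(b) and all(
        x == y and str(x) == str(y) for x, y in zip(a, b)
    )


def diff_refresh(stored, fresh, same):
    """Return (rows to delete, rows to insert) so that stored matches fresh under `same`."""
    to_delete = [r for r in stored if not any(same(r, f) for f in fresh)]
    to_insert = [f for f in fresh if not any(same(f, r) for r in stored)]
    return to_delete, to_insert


# What the matview currently holds versus what the base query now returns.
stored = [(1, Decimal("1.0"))]
fresh = [(1, Decimal("1.00"))]

# Under plain equality the diff is empty, so the matview would keep
# showing 1.0 while a straight query shows 1.00.
print(diff_refresh(stored, fresh, equal))

# Under the identical test the stale row is replaced, so the matview
# ends up showing exactly what the straight query shows.
print(diff_refresh(stored, fresh, identical))
```

With the equality comparison, the refreshed contents would differ
visibly from what a non-concurrent refresh stores; the stricter
comparison is what keeps the two in agreement.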

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] A. Gupta, I. S. Mumick, and V. S. Subrahmanian.  Maintaining
Views Incrementally.  In SIGMOD 1993, pages 157-167.
http://dl.acm.org/citation.cfm?id=170066
