Re: record identical operator

Поиск

Список

Период

Сортировка

От	Kevin Grittner
Тема	Re: record identical operator
Дата	23 сентября 2013 г. 18:44:36
Msg-id	1379961865.87839.YahooMailNeo@web162901.mail.bf1.yahoo.com обсуждение исходный текст
Ответ на	Re: record identical operator (Stephen Frost <sfrost@snowman.net>)
Ответы	Re: record identical operator
Список	pgsql-hackers

Дерево обсуждения

Stephen Frost <sfrost@snowman.net> wrote:
> Robert Haas (robertmhaas@gmail.com) wrote:

>> Now, the next project Kevin's going to work on, and that he was
>> working on when he discovered this problem, is incremental
>> maintenance: that is, allowing us to update the view *without*
>> needing to rerun the entire query.  This record comparison
>> operator will be just as important in that context.
>
> You state this but I don't see where you justify this claim..

Unless we can tell whether there are any differences between two
versions of a row, we can't accurately generate the delta to drive
the incremental maintenance.  The initial thread discussing how
incremental maintenance would be done is here:

http://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com#1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com

The thread on the initial patch for REFRESH MATERIALIZED VIEW
CONCURRENTLY started with:

| Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY
| for 9.4 CF1.  The goal of this patch is to allow a refresh
| without interfering with concurrent reads, using transactional
| semantics.
|
| It is my hope to get this committed during this CF to allow me to
| focus on incremental maintenance for the rest of the release cycle.

There was much discussion, testing, and revision then, which is
here:

http://www.postgresql.org/message-id/flat/1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com#1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com

I think pretty much every concern raised was addressed except for a
lingering doubt expressed by Noah over whether IS NOT DISTINCT FROM
semantics were really the right basis for matching rows.  Based on
that feedback, I spent a lot of time looking at why that might or
might not be correct, and decided that I had been wrong to base the
behavior on that, for the reasons Robert expressed so clearly a
couple messages back.  Hence this patch.

I had made the mistake of using an operator which used a
column-by-column comparison based on the equality operator for the
default opclass for comparing two values of each respective column
type.  I had been challenged on that in the review process, and was
responding to it with the fix contained in this patch.

The first post on this thread starts with:

| Attached is a patch for a bit of infrastructure I believe to be
| necessary for correct behavior of REFRESH MATERIALIZED VIEW
| CONCURRENTLY as well as incremental maintenance of matviews.

I think it is fairly obvious that REFRESH should REgenerate a FRESH
copy of the data, versus incremental maintenance -- which attempts
to keep the matview up-to-date without regenerating the full set of
data.  Whenever there is logical replication (and materialized
views are, conceptually, one form of that -- within the database) I
feel it is important to be able to correct any possible "drift".
With matviews, I see the way to do that as the REFRESH command, and
I feel that it is important to be able to do that in a way that can
run concurrently with readers of the matview -- without blocking
them or being blocked by them.

Discussion of incremental maintenance really belongs on a different
thread.  Since I have gone to the trouble to read a lot of papers
on the topic, and select one that I think is a good basis for our
implementation, I hope everyone will frame discussion in terms of
either:
  -  how best to implement the techniques from that paper, or
  -  why some other paper presents a better technique.

I think it would be madness to approach implementing incremental
maintenance based on an ad hoc set of thoughts rather than a
peer-reviewed paper.  I know how tempting it is it start from zero
and think, "if we just do X we can cover this sort of query."  I
had a few thoughts like that before I read all those papers and
discovered that there were many subtle issues to cover.  We will
have plenty of time to get creative with alternatives when we get
past the query types specifically addressed in the paper and begin
to move into, for example, CTEs and window functions.  I certainly
expect robust discussion around those areas, or even some of the
infrastructure for capturing deltas and triggering the incremental
maintenance.  I really didn't expect to have to burn so much time
and energy arguing over whether a REFRESH should leave the matview
accurately containing the results of the matview's query.

>> We can argue about how it should be named

After looking at the existing suggestions and thinking about it I'm
leaning toward these operators (based on a star in front of the
usual default comparison operators):

  *=  *<>  *>  *>=  *<  *<=

>> and whether it should be documented

I thought we had a consensus to document both the existing record
comparison operators and these new ones, and I'm fine with that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: record identical operator