Re: record identical operator

From Stephen Frost
Subject Re: record identical operator
Date
Msg-id 20130924143921.GS2706@tamriel.snowman.net
In response to Re: record identical operator  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: record identical operator  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > I worry that adding these will come back to bite us later
>
> How?

User misuse is certainly one consideration, but I wonder what's going to
happen if we change our internal representation of data (e.g., numerics
get changed again), or when incremental matview maintenance happens and
we start looking at subsets of rows instead of the entire query.  Will
the first update of a matview after a change to numeric's internal data
structure cause the entire thing to be rewritten?

> > and that we're making promises we won't be able to keep.
>
> The promise that a concurrent refresh will produce the same set of
> rows as a non-concurrent one?

The promise that we'll always return the binary representation of the
data that we saw last.  When greatest(x,y) comes back 'false' for a
MAX(), we then have to go check "well, does the type consider them
equal?", because, if the type considers them equal, we then have to
decide if we should replace x with y anyway, because it's different
at a binary level.  That's what we're saying we'll always do now.
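
To make the equal-versus-identical distinction concrete, here is a minimal
Python sketch. It uses Python's Decimal as a stand-in for numeric, which is
an assumption for illustration only; the helper names (type_equal,
record_identical, should_replace) are hypothetical and are not PostgreSQL
functions.

```python
from decimal import Decimal


def type_equal(x: Decimal, y: Decimal) -> bool:
    """Equality as the type defines it: 1.0 and 1.00 compare equal."""
    return x == y


def record_identical(x: Decimal, y: Decimal) -> bool:
    """Stand-in for a binary comparison: compare the stored
    sign, digits, and exponent rather than the numeric value."""
    return x.as_tuple() == y.as_tuple()


def should_replace(old: Decimal, new: Decimal) -> bool:
    """Policy under discussion: replace the stored value whenever the
    representation differs, even if the type says the values are equal."""
    return not record_identical(old, new)


old = Decimal("1.0")
new = Decimal("1.00")

print(type_equal(old, new))        # value-level comparison
print(record_identical(old, new))  # representation-level comparison
print(should_replace(old, new))    # what an "identical"-based refresh would do
```

Under this sketch the two values are equal by type but not identical by
representation, so a refresh keyed on identity would rewrite the row.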

We're also saying that we'll replace things based on plan differences
rather than based on if the rows underneath actually changed at all.
We could end up with material differences in the result of matviews
updated through incremental REFRESH and matviews updated through
actual incremental maintenance, and people may *care* about those
because we've told them (or they discover) they can depend on these
types of changes to be reflected in the result.

> > Trying to do this incremental-but-not-really maintenance where
> > the whole query is run but we try to skimp on what's actually
> > getting updated in the matview is a premature optimization, imv,
> > and one which may be less performant and more painful, with more
> > gotchas and challenges for our users, to deal with in the long
> > run.
>
> I have the evidence of a ten-fold performance improvement plus
> minimized WAL and replication work on my side.  What evidence do
> you have to back your assertions?  (Don't forget to work in bloat
> and vacuum truncation issues to the costs of your proposal.)

I don't doubt that there are cases in both directions and I'm not trying
to argue that it'd always be faster, but I doubt it's always slower.
I'm surprised that you had a case where the query was apparently quite
fast yet the data set hardly changed and resulted in a very large result
but I don't doubt that it happened.  What I was trying to get at is
really that the delete/insert approach would be good enough in very many
cases and it wouldn't have what look, to me anyway, like some pretty ugly
warts around these cases.
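
The two refresh strategies being compared can be sketched roughly as
follows. This is a toy model in Python under stated assumptions: a matview
is a plain list of row tuples, "rows written" stands in for WAL and
replication work, and the function names are hypothetical. It is not how
PostgreSQL implements either approach.

```python
from collections import Counter


def refresh_delete_insert(matview: list, fresh: list) -> int:
    """Throw away the old contents and load the new result.
    Returns the number of rows written."""
    matview.clear()
    matview.extend(fresh)
    return len(fresh)


def refresh_diff(matview: list, fresh: list) -> int:
    """Run the whole query, then touch only rows that differ.
    Returns the number of rows deleted plus rows inserted."""
    old, new = Counter(matview), Counter(fresh)
    to_delete = old - new   # rows present before but not in the new result
    to_insert = new - old   # rows in the new result but not present before
    for row, count in to_delete.items():
        for _ in range(count):
            matview.remove(row)
    for row, count in to_insert.items():
        matview.extend([row] * count)
    return sum(to_delete.values()) + sum(to_insert.values())


mv = [(1, "a"), (2, "b"), (3, "c")]
fresh = [(1, "a"), (2, "b"), (3, "d")]
print(refresh_diff(mv, fresh))           # rows touched by the diff approach
print(refresh_delete_insert(mv, fresh))  # rows written by delete/insert
```

Note that the diff here compares rows with ordinary equality. Whether that
comparison should instead be "identical at a binary level" is exactly the
question this thread is about.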

Thanks,

        Stephen
