Re: matview niceties: pick any two of these three

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: matview niceties: pick any two of these three
Дата
Msg-id 1367611294.85081.YahooMailNeo@web162902.mail.bf1.yahoo.com
обсуждение исходный текст
Ответ на Re: matview niceties: pick any two of these three  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: matview niceties: pick any two of these three
Список pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The current matview design gets around this problem by
>>> requiring that transition between scannable and unscannable
>>> states involve a complete table rewrite, and thus the
>>> transactionality issue can be hidden behind a transactional
>>> update of the matview's pg_class.relfilenode field.  IMO, that
>>> is obviously a dead-end design, because we are going to want
>>> scannability status updates associated with partial updates of
>>> the matview's contents.
>
>> I don't think the change between populated and non-populated
>> needs to change without a new heap being swapped in.
>
> I think you are using the distinction between "populated" and
> "scannable" to dodge the issue.  My point is that whenever we add
> some better control over whether matviews are considered
> too-stale-to-use, it is going to be important that the staleness
> state can be changed, in either direction, without a full rewrite
> of the matview's contents.  Whether you call that "populated", or
> "scannable", or "stale", or "warm fuzzy" doesn't matter to me.
> If we can't do that then the feature is not actually going to be
> of any great use.

I suspect that we will never support scanning of a non-populated
matview, but that other considerations for what is scannable will
be added.  That's not a word game, it's an important semantic
difference.

> And I want such updates to be transactional, because they'll very
> likely be connected to transactional updates of the matview's
> contents.

Yep, that is definitely my intent.

> So I continue to maintain that the current design is a dead end.

Not if you don't muddle multiple concepts as one thing.

>> We'll get into all kinds of design trouble if we conflate these
>> two separate concepts.
>
> Perhaps I still don't understand what useful distinction is
> there.  If there is a critical distinction, why is the current
> patch exposing only one state value to users?

We don't have any other criteria for what is scannable implemented
yet in this release, but perhaps you're right that we should have a
pg_relation_is_populated() function available to user in addition
to the pg_relation_is_scannable() function.  For the moment, the
latter would just return the result of calling the former.  I
hadn't thought the former was useful at the SQL level at this
point, but thinking about it now, I wonder whether "populated"
wouldn't be the more appropriate test for pg_dump to use.  I'm not
sure just yet, but it bears some thinking.  More on that tomorrow
after I sleep on it.

>> I could probably add a hundred other desirable properties for a
>> materialized view implementation; I was trying to list the ones
>> which were still up-for-grabs for the initial 9.3
>> implementation.  If we want to make progress on catching up to
>> other major databases on this, it will take progress in many
>> releases; and every one of those will lack something that
>> someone wants.  OK, a lot of things that a lot of people want.
>> But if we can't commit something that is less than complete or
>> less than perfect to make incremental change, we're not going to
>> be getting anywhere very fast.
>
> Yes, exactly.  I am not sure why you are so desperate to have
> unlogged matviews in 9.3 that you are willing to risk finding
> yourself boxed in and unable to make future improvements that
> everybody agrees are useful.  ISTM that plain matviews are a
> perfectly good first-cut feature and we can add unlogged ones
> whenever we can think of a better way to do it.

I think matviews are a useful feature without an unlogged option;
but I think they will be significantly more useful with the option.
 I agree it's a judgment call whether the hacks needed to support
it for now are worth the benefit of the option.  Based on my
reading of the user needs and the code, I think it is worth the
trade-off, but I can understand arguments to the contrary.  The
lack of specifics on where problems would be has left me
unconvinced.  The most specific arguments so far have been with the
"extend from zero page length during incremental maintenance"
(which I hope I have made clear why I don't see a problem there,
but maybe I need to say more...), and the argument that we might
want to stuff metadata at the front of the heap to track this,
which is out of the question for 9.3 and would require the same
response later whether we use the zero-length hack for now or not.

>> If I thought that the current hack for tracking the populated
>> state would be permanent, or that it would be so hard to replace
>> in a later release that it would impede further development, I
>> would be the first one to rip it out.  I haven't been able to
>> see such a risk, and nobody has adequately explained where they
>> think the risk lies.
>
> You can't update that state transactionally, and you can't update
> it without a complete rewrite.

Right, the *populated* state, indicating whether the query has been
run to initially fill the matview with data, blocks other access.

> The only solutions you've offered to that are devices Rube
> Goldberg would be proud of (not that the code wouldn't fill his
> heart with joy as it is).  This is not a good place to be for
> a first-cut implementation, at least not if we don't want to find
> ourselves needing to make a representation change that pg_upgrade
> can't readily cope with.

I've already explained why I don't think pg_upgrade is an issue,
short of the metadata in the first heap page change which was
suggested, but I'll give it another go.  pg_dump uses CREATE
MATERIALIZED VIEW ... WITH NO DATA in the same phase as views are
created, because each can reference the other.  If we come up with
a new mechanism for indicating non-populated matviews in a later
release, the server from such a release which is digesting those
commands will use its new mechanism.  pg_upgrade need not know or
care what the server is doing, or even that these commands are in
the schema dump.  So no problem there.

For the pg_dump upgrade path, the REFRESH commands which may follow
for some or all matviews will likewise do the right thing for the
version of the server processing them.  Still no problem.

At the point where we move information about whether a matview is
populate to the catalogs, we will probably, like at least one other
database product, add an ALTER MATERIALIZED VIEW option to flag a
matview as scannable, and the pg_dump of that new release could be
modified to emit those where appropriate in the schema dump phase
when the option used by pg_upgrade is set.  That's some coding, but
hardly a big deal.

So where is the Rube Goldberg solution in that?

> The long and the short of it is this: having unlogged matviews in
> 9.3 is not worth taking that risk for.  IMO anyway.

We differ on that, but maybe you see a risk I'm missing, so I'm
trying to understand what you see.

>> I will certainly go along with the consensus, but from my
>> perspective, living with the current hack for determining
>> whether a matview has been populated is worth the benefit of
>> having unlogged matviews in 9.3.  I don't expect that hack to be
>> viable for more that one or two releases; I just think that it's
>> a practical matter of working with the tools we have *now* for
>> unlogged relations to try to create a new type of unlogged
>> relation.
>
> If you yourself admit that this hack has got a very short life
> expectancy, why are you so willing to buy into the assumption
> that you can get rid of it readily?  I do not grant that
> assumption, and I'm not sure how come you think that hacking
> pg_upgrade to the point of being able to get rid of it will be
> zero-cost, or even acceptable to the community at all.

I have not heard a single suggestion of why pg_upgrade would need a
single line changed.  I'm believe that pg_dump will need some ALTER
statements added when we add the catalog-based flagging of this,
but that's true whether or not we use this technique in the
meantime.

The reason I think it can be readily abandoned is that the heap
remains a valid heap according to the historical rules, and the
rules which by all appearances would follow.  So it is a question
of whether the populated state can be transferred from a version
using this technique to a version which is not with little or no
pain (which I've explained above seems quite feasible to me) or
whether we have to go so far as to tell people to REFRESH their
unlogged materialized views in the release notes for the new
release.  Unlikely as that seems ot me, I think people would accept
that.

What do you see that I'm missing?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: matview niceties: pick any two of these three
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: corrupt pages detected by enabling checksums