Re: Remaining beta blockers

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Remaining beta blockers
Дата
Msg-id 1367332385.33261.YahooMailNeo@web162905.mail.bf1.yahoo.com
обсуждение исходный текст
Ответ на Re: Remaining beta blockers  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: Remaining beta blockers  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-30 04:29:26 -0700, Kevin Grittner wrote:

>> Let's look at the "corner" this supposedly paints us into.  If a
>> later major release creates a better mechanism, current pg_dump and
>> load will already use it, based on the way matviews are created
>> empty and REFRESHed by pg_dump.  Worst case, we need to modify the
>> behavior of pg_dump running with the switch used by pg_upgrade to
>> use a new ALTER MATERIALIZED VIEW SET (populated); (or whatever
>> syntax is chosen) -- a command we would probably want at that point
>> anyway.  I'm not seeing cause for panic here.
>
> I don't think panic is appropriate either, but I think there are some
> valid concerns around this.
>
> 1) vacuum can truncate the table to an empty relation already if there is
>   no data returned by the view's query which then leads to the wrong
>   scannability state.
>
>   S1: CREATE MATERIALIZED VIEW matview_empty AS SELECT false WHERE random() < -1;
>   S2: S2: SELECT * FROM matview_empty; -- works
>   S1: VACUUM matview_empty;
>   S2: SELECT * FROM matview_empty; -- still works
>   S3: SELECT * FROM matview_empty; -- errors out
>
>   So we need to make vacuum skip cleaning out the last page. Once we
>   get incrementally updated matviews there are more situations to get
>   into this than just a query not returning anything.
>   I remember this being discussed somewhere already, but couldn't find
>   it quickly enough.

Yeah, I posted a short patch earlier on this thread that fixes
that.  Nobody has commented on it, and so far I have not committed
anything related to this without posting details and giving ample
opportunity for anyone to comment.  If nobody objects, I can push
that, and this issue is gone.

>   Imo this is quite an indicator that the idea of using the filesize is
>   just plain wrong. Adding logic to vacuum not to truncate data because
>   its a flag for matview scannability is quite the layer violation and
>   a sign for a bad design decision. Such a hack has already been added
>   to copy_heap_data(), while not as bad, shows that it is hard to find
>   all the places where we need to add it.

I agree, but if you review the thread I started with a flag in
pg_class, I tried using the "special" area of the first page of the
heap, and finally wound up with the current technique based on
several people reviewing the patch and offering opinions and
reaching an apparent consensus that this was the least evil way to
do it given current infrastructure for unlogged tables.  This
really is a result of trying to preserver *correct* behavior while
catering to those emphasizing how important the performance of
unlogged matviews is.

> 2) Since we don't have a metapage to represent scannability in 9.3 we
>   cannot easily use one in 9.4+ without pg_upgrade emptying all
>   matviews unless we only rely on the catalogs which we currently
>   cannot.

I am not following this argument at all.  Doesn't pg_upgrade use
pg_dump to create the tables and matviews WITH NO DATA and take
later action for ones which are populated in the source?  It would
not be hard at all to move to a new release which used a different
technique for tracking populated tables by changing what pg_dump
does for populated tables with the switch pg_upgrade uses.

>   This will either slow down development or make users
>   unhappy. Alternatively we can add yet another fork, but that has its
>   price (quite a bit more open files during normal operation, slower
>   CREATE DATABASE).
>
>   This is actually an argument for not releasing matviews without such
>   an external state. Going from disk-based state to catalog is easy,
>   the other way round: not so much.

I am not seeing this at all.  Given my comment above about how this
works for pg_upgrade and pg_dump, can you explain how this is a
problem?

> 3) Using the filesize as a flag will make other stuff like preallocating
>   on-disk data in bigger chunks and related things more complicated.

Why?  You don't need to preallocate for a non-populated matview,
since its heap will be replaced on REFRESH.  You would not *want*
to preallocate a lot of space for something guaranteed to remain at
zero length until deleted.

> 4) doing the check for scannability in the executor imo is a rather bad
>   idea. Note that we e.g. don't see an error about a matview which
>   won't be scanned because of constraint exclusion or one-time
>   filters.
>
>   S1: CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false
> WITH NO DATA;
>   S1: SELECT * FROM matview_unit_false;
>
> You can get there in far more reasonable cases than WHERE false.

I am a little concerned about that, but the case you show doesn't demonstrate a problem:

test=# CREATE MATERIALIZED VIEW matview_unit_false AS SELECT false WHERE false WITH NO DATA;
SELECT 0
test=# SELECT * FROM matview_unit_false;
ERROR:  materialized view "matview_unit_false" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

The view was defined WITH NO DATA and is behaving correctly for
that case.  When populated (with zero rows) it behaves correctly:

test=# REFRESH materialized view matview_unit_false ;
REFRESH MATERIALIZED VIEW
test=# SELECT * FROM matview_unit_false;
 bool
------
(0 rows)

I would be interested to see whether anyone can come up with an
actual mis-behavior related to this either before or after the
recent refactoring.

> 5) I have to agree with Kevin that the scannability is an important thing
>   to track though.
>
>   a) we cannot remove support for WITH NO DATA because of pg_dump order
>       and unlogged relations. So even without unlogged relations the
>       problem exists although its easier to represent.
>   b) Just giving wrong responses is bad [tm]. Outdated data is something
>       completely different (it has existed in that state in the (near)
>       past) than giving an empty response (might never have been a visible
>       state, and more likely not so in any reasonably near
>       past). Consider an application behind a pooler suddently getting
>       an empty response from a SELECT * FROM unlogged_matview; It won't
>       notice anything without a unscannable state since it probably
>       won't even notice the database restart.
>
> Not sure what the consequence of this is. The most reasonable solution
> seems to be to introduce a metapage somewhere *now* which sucks, but ...

That seems far riskier to me than using the current
lame-but-functional approach now and improving it in a subsequent
release.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Remaining beta blockers
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: Remaining beta blockers