Обсуждение: does refreshing materialized view make the database bloat?

Поиск
Список
Период
Сортировка

does refreshing materialized view make the database bloat?

От
jian he
Дата:
Hi,

src_backend_commands_matview.c
547: /*
548:  * refresh_by_match_merge
549:  *
550:  * Refresh a materialized view with transactional semantics, while allowing
551:  * concurrent reads.
552:  *
553:  * This is called after a new version of the data has been created in a
554:  * temporary table.  It performs a full outer join against the old version of
555:  * the data, producing "diff" results.  This join cannot work if there are any
556:  * duplicated rows in either the old or new versions, in the sense that every
557:  * column would compare as equal between the two rows.  It does work correctly
558:  * in the face of rows which have at least one NULL value, with all non-NULL
559:  * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
560:  * indexes turns out to be quite convenient here; the tests we need to make
561:  * are consistent with default behavior.  If there is at least one UNIQUE
562:  * index on the materialized view, we have exactly the guarantee we need.
563:  *
564:  * The temporary table used to hold the diff results contains just the TID of
565:  * the old record (if matched) and the ROW from the new table as a single
566:  * column of complex record type (if matched).
567:  *
568:  * Once we have the diff table, we perform set-based DELETE and INSERT
569:  * operations against the materialized view, and discard both temporary
570:  * tables.
571:  *
572:  * Everything from the generation of the new data to applying the differences
573:  * takes place under cover of an ExclusiveLock, since it seems as though we
574:  * would want to prohibit not only concurrent REFRESH operations, but also
575:  * incremental maintenance.  It also doesn't seem reasonable or safe to allow
576:  * SELECT FOR UPDATE or SELECT FOR SHARE on rows being updated or deleted by
577:  * this command.
578:  */
579:
Once we have the diff table, we perform set-based DELETE and INSERT
operations against the materialized view, and discard both temporary tables.

Here the temporary tables are "discard" meaning the temporary tables are deleted and the temporary tables' spaces are reclaimed immediately?
Or the temporary tables are deleted and the spaces will be reclaimed by another mechanism? 

simplify:does refreshing materialized view make the database bloat.



Re: does refreshing materialized view make the database bloat?

От
"David G. Johnston"
Дата:
On Friday, January 13, 2023, jian he <jian.universality@gmail.com> wrote:

Once we have the diff table, we perform set-based DELETE and INSERT
operations against the materialized view, and discard both temporary tables.

Here the temporary tables are "discard" meaning the temporary tables are deleted and the temporary tables' spaces are reclaimed immediately?
Or the temporary tables are deleted and the spaces will be reclaimed by another mechanism? 

simplify:does refreshing materialized view make the database bloat.


The materialized view itself bloats.  The temp tables are removed immediately (the “drop table” docs don’t say this explicitly though it the most logical behavior and implied by the fact it takes an access exclusive lock).

David J.

Re: does refreshing materialized view make the database bloat?

От
"Peter J. Holzer"
Дата:
On 2023-01-14 11:34:13 +0530, jian he wrote:
>     Once we have the diff table, we perform set-based DELETE and INSERT
>     operations against the materialized view, and discard both temporary
>     tables.
>
>
> Here the temporary tables are "discard" meaning the temporary tables are
> deleted and the temporary tables' spaces are reclaimed immediately?
> Or the temporary tables are deleted and the spaces will be reclaimed by another
> mechanism? 

Tables are implemented as files by PostgreSQL. When a table is dropped,
the file is deleted[1]. Whether that means that the space is
"immediately" available again is up to the operating system.

        hp

[1] Possibly delayed until commit.

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: does refreshing materialized view make the database bloat?

От
jian he
Дата:


On Sat, Jan 14, 2023 at 11:49 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Friday, January 13, 2023, jian he <jian.universality@gmail.com> wrote:

Once we have the diff table, we perform set-based DELETE and INSERT
operations against the materialized view, and discard both temporary tables.

Here the temporary tables are "discard" meaning the temporary tables are deleted and the temporary tables' spaces are reclaimed immediately?
Or the temporary tables are deleted and the spaces will be reclaimed by another mechanism? 

simplify:does refreshing materialized view make the database bloat.


The materialized view itself bloats.  The temp tables are removed immediately (the “drop table” docs don’t say this explicitly though it the most logical behavior and implied by the fact it takes an access exclusive lock).

David J.


Hi,
why the materialized view itself bloats. If no refresh then no bloat right? If fresh then set based delete operation will make materialized view bloat?
I also found the same question online. https://dba.stackexchange.com/questions/219079/bloat-on-materialized-views Unfortunately nobody answered...




Re: does refreshing materialized view make the database bloat?

От
"David G. Johnston"
Дата:
On Sunday, January 15, 2023, jian he <jian.universality@gmail.com> wrote:


Hi,
why the materialized view itself bloats. If no refresh then no bloat right? If fresh then set based delete operation will make materialized view bloat?
I also found the same question online. https://dba.stackexchange.com/questions/219079/bloat-on-materialized-views Unfortunately nobody answered...


The definition of bloat is a deleted row.  Bloat can be reduced by subsequent row insertions.

David J.

Re: does refreshing materialized view make the database bloat?

От
jian he
Дата:


On Mon, Jan 16, 2023 at 10:28 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sunday, January 15, 2023, jian he <jian.universality@gmail.com> wrote:


Hi,
why the materialized view itself bloats. If no refresh then no bloat right? If fresh then set based delete operation will make materialized view bloat?
I also found the same question online. https://dba.stackexchange.com/questions/219079/bloat-on-materialized-views Unfortunately nobody answered...


The definition of bloat is a deleted row.  Bloat can be reduced by subsequent row insertions.

David J.


Hi.
In the following example, I cannot see bloat (via extension pgstattuple dead_tuple_count>0). Wondering where the bloat is.

BEGIN;create table tbt( a int) with(fillfactor=40, autovacuum_enabled=off);
insert into tbt  select g from generate_series(1,2000) g;
create materialized view tbtmv as select * from tbt;
commit;

--------do the update.
update tbt set a = 10 + a  where a < 20;
REFRESH MATERIALIZED view tbtmv;
SELECT * FROM pgstattuple('tbtmv'); -----no dead tuples count.

------try delete.
delete  from tbt  where a < 50;
REFRESH MATERIALIZED view tbtmv;
SELECT * FROM pgstattuple('tbtmv');-------still no dead tuples.

Re: does refreshing materialized view make the database bloat?

От
"David G. Johnston"
Дата:
On Sunday, January 15, 2023, jian he <jian.universality@gmail.com> wrote:


In the following example, I cannot see bloat (via extension pgstattuple dead_tuple_count>0). Wondering where the bloat is.

--------do the update.
update tbt set a = 10 + a  where a < 20;
REFRESH MATERIALIZED view tbtmv;
SELECT * FROM pgstattuple('tbtmv'); -----no dead tuples count.

You didn’t specify concurrently so the merge method you quoted is not being used.

David J.