Обсуждение: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

Поиск
Список
Период
Сортировка

9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Guillaume Cottenceau
Дата:
Hello,

I am toying around with 9.2.1, trying to measure/determine how
index-only scans can improve our performance.

A small script which is attached to this mail, shows that as long
as the table has been VACUUM FULL'd, there is a unusual high
amount of heap fetches. It is strange that the visibilitymap_test
predicate fails in these situations, is the visibility map
somehow trashed in this situation? It should not, or at least the
documentation[1] should state it (my understanding is that vacuum
full does *more* than vacuum, but nothing less) (note to usual
anti vacuum full trollers: I know you hate vacuum full).

Using pg 9.2.1 compiled from sources, almost standard
configuration except shared_buffers at 512M, effective_cache_size
at 1536M, random_page_cost at 2, and vacuum delays increased.

Please find complete logs attached, and selected logs below:

After table creation + analyze:

 Index Only Scan using i on ta  (cost=0.00..156991.10 rows=2018667 width=4) (actual time=0.034..336.443 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000

After vacuum:

 Index Only Scan using i on ta  (cost=0.00..50882.62 rows=2018667 width=4) (actual time=0.014..193.120 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0

After vacuum analyze:

 Index Only Scan using i on ta  (cost=0.00..50167.13 rows=1990353 width=4) (actual time=0.015..193.035 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0

After vacuum full:

 Index Only Scan using i on ta  (cost=0.00..155991.44 rows=1990333 width=4) (actual time=0.042..364.412 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000
                 ^^^^^^^ uh uh, looking bad

After vacuum full analyze:

 Index Only Scan using i on ta  (cost=0.00..157011.85 rows=2030984 width=4) (actual time=0.025..365.657 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000

After vacuum:

 Index Only Scan using i on ta  (cost=0.00..51192.45 rows=2031000 width=4) (actual time=0.015..192.520 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0

Thanks for any comments/hints,

Ref:
[1] http://www.postgresql.org/docs/9.1/static/sql-vacuum.html

DROP TABLE ta;
psql:/tmp/vacfull.sql:3: ERROR:  table "ta" does not exist
CREATE TABLE ta (ca int, cb int, cc int);
CREATE TABLE
INSERT INTO ta VALUES (generate_series(1, 5), generate_series(1, 10000000), generate_series(1, 10000000));
INSERT 0 10000000
ANALYZE ta;
ANALYZE
CREATE INDEX i ON ta (ca, cb, cc);
CREATE INDEX
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                          QUERY PLAN
      

------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..156991.10 rows=2018667 width=4) (actual time=0.034..336.443 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000
 Total runtime: 385.023 ms
(4 rows)

VACUUM ta;
VACUUM
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                         QUERY PLAN
     

-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..50882.62 rows=2018667 width=4) (actual time=0.014..193.120 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0
 Total runtime: 241.079 ms
(4 rows)

VACUUM ANALYZE ta;
VACUUM
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                         QUERY PLAN
     

-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..50167.13 rows=1990353 width=4) (actual time=0.015..193.035 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0
 Total runtime: 241.101 ms
(4 rows)

VACUUM FULL ta;
VACUUM
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                          QUERY PLAN
      

------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..155991.44 rows=1990333 width=4) (actual time=0.042..364.412 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000
 Total runtime: 412.715 ms
(4 rows)

VACUUM FULL ANALYZE ta;
VACUUM
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                          QUERY PLAN
      

------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..157011.85 rows=2030984 width=4) (actual time=0.025..365.657 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 2000000
 Total runtime: 414.223 ms
(4 rows)

VACUUM ta;
VACUUM
EXPLAIN ANALYZE SELECT cb FROM ta WHERE ca = 1 ORDER BY cb;
                                                         QUERY PLAN
     

-----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using i on ta  (cost=0.00..51192.45 rows=2031000 width=4) (actual time=0.015..192.520 rows=2000000
loops=1)
   Index Cond: (ca = 1)
   Heap Fetches: 0
 Total runtime: 240.918 ms
(4 rows)

DROP TABLE ta;
DROP TABLE

--
Guillaume Cottenceau

Вложения

Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Pavan Deolasee
Дата:

On Thu, Nov 29, 2012 at 5:03 PM, Guillaume Cottenceau <gc@mnc.ch> wrote:
Hello,

I am toying around with 9.2.1, trying to measure/determine how
index-only scans can improve our performance.

A small script which is attached to this mail, shows that as long
as the table has been VACUUM FULL'd, there is a unusual high
amount of heap fetches. It is strange that the visibilitymap_test
predicate fails in these situations, is the visibility map
somehow trashed in this situation? It should not, or at least the
documentation[1] should state it (my understanding is that vacuum
full does *more* than vacuum, but nothing less) (note to usual
anti vacuum full trollers: I know you hate vacuum full).


I don't find it very surprising given that VACUUM FULL is now implemented as a CLUSTER command which rewrites the entire heap, thus invalidating all the visibility map info whatsoever. The code paths that VACUUM FULL and LAZY VACUUM takes are now completely different.

Even with the old VACUUM FULL we would have seen some impact on heap fetches because it used to move tuples around and thus potentially resetting visibility map bits. But its definitely going to be worse with the new implementation.

Now can CLUSTER or VACUUM FULL recreate the visibility map with all bits set to visible, thats an entirely different question. I don't think it can, but then I haven't thought through this completely. 

Thanks,
Pavan

Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Andres Freund
Дата:
On 2012-11-29 17:20:01 +0530, Pavan Deolasee wrote:
> On Thu, Nov 29, 2012 at 5:03 PM, Guillaume Cottenceau <gc@mnc.ch> wrote:
>
> > Hello,
> >
> > I am toying around with 9.2.1, trying to measure/determine how
> > index-only scans can improve our performance.
> >
> > A small script which is attached to this mail, shows that as long
> > as the table has been VACUUM FULL'd, there is a unusual high
> > amount of heap fetches. It is strange that the visibilitymap_test
> > predicate fails in these situations, is the visibility map
> > somehow trashed in this situation? It should not, or at least the
> > documentation[1] should state it (my understanding is that vacuum
> > full does *more* than vacuum, but nothing less) (note to usual
> > anti vacuum full trollers: I know you hate vacuum full).
> >
> >
> I don't find it very surprising given that VACUUM FULL is now implemented
> as a CLUSTER command which rewrites the entire heap, thus invalidating all
> the visibility map info whatsoever.

Me neither.

> Now can CLUSTER or VACUUM FULL recreate the visibility map with all bits
> set to visible, thats an entirely different question. I don't think it can,
> but then I haven't thought through this completely.

It can't set everything to visible as it also copies RECENTLY_DEAD
tuples and tuples which are not yet visible to other transactions, but
it should be relatively easy to keep enough information about whether it
can set the current page to all visible. At least for the data in the
main relation, the toast tables are a different matter.
Just tracking whether the page in rewriteheap.c's state->rs_buffer
contains only tuples that are clearly visible according to the xmin
horizon seems to be enough.

The current effect of resetting the VM has the disadvantage of making
the next autovacuum basically a full table vacuum without any
benefits...

Greetings,

Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Pavan Deolasee
Дата:


On Thu, Nov 29, 2012 at 5:42 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-29 17:20:01 +0530, Pavan Deolasee wrote:

> Now can CLUSTER or VACUUM FULL recreate the visibility map with all bits
> set to visible, thats an entirely different question. I don't think it can,
> but then I haven't thought through this completely.

It can't set everything to visible as it also copies RECENTLY_DEAD
tuples and tuples which are not yet visible to other transactions, but
it should be relatively easy to keep enough information about whether it
can set the current page to all visible. 

Yeah, that looks fairly easy to have. Thinking about it more, now that we have ability to skip WAL for the case when a table is created and populated in the same transaction, we could also set the visibility map bits for such a table (if we are not doing that already). That should be fairly safe too.

Thanks,
Pavan

Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Andres Freund
Дата:
On 2012-11-29 17:59:39 +0530, Pavan Deolasee wrote:
> On Thu, Nov 29, 2012 at 5:42 PM, Andres Freund <andres@2ndquadrant.com>wrote:
>
> > On 2012-11-29 17:20:01 +0530, Pavan Deolasee wrote:
> >
> > > Now can CLUSTER or VACUUM FULL recreate the visibility map with all bits
> > > set to visible, thats an entirely different question. I don't think it
> > can,
> > > but then I haven't thought through this completely.
> >
> > It can't set everything to visible as it also copies RECENTLY_DEAD
> > tuples and tuples which are not yet visible to other transactions, but
> > it should be relatively easy to keep enough information about whether it
> > can set the current page to all visible.
>
>
> Yeah, that looks fairly easy to have. Thinking about it more, now that we
> have ability to skip WAL for the case when a table is created and populated
> in the same transaction, we could also set the visibility map bits for such
> a table (if we are not doing that already). That should be fairly safe too.

I don't think the latter would be safe at all. Every repeatable read
transaction that started before the table creation would see that tables
content based on the visibilitymap instead of seeing it as empty.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

От
Pavan Deolasee
Дата:


On Thu, Nov 29, 2012 at 6:06 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-29 17:59:39 +0530, Pavan Deolasee wrote:

>
>
> Yeah, that looks fairly easy to have. Thinking about it more, now that we
> have ability to skip WAL for the case when a table is created and populated
> in the same transaction, we could also set the visibility map bits for such
> a table (if we are not doing that already). That should be fairly safe too.

I don't think the latter would be safe at all. Every repeatable read
transaction that started before the table creation would see that tables
content based on the visibilitymap instead of seeing it as empty.

Yeah, but that should be easy to fix, no ? We know the transaction that created the table and we can check if that transaction is visible to our snapshot or not. If the creating transaction itself is not visible, the data in the table is not visible either. OTOH if the creating transaction is visible and is committed, we can trust the visibility map as well. Thats probably better than scanning the entire table just to find that we can/can't see all/any rows.

Its getting slightly off-topic, so my apologies anyways.

Thanks,
Pavan