Обсуждение: Bug? relpages, reltuples resets to zero
Hi,
there seems to be a problem with the relation statistics in
pg_class. Could someone explain why this happens?
doc=> vacuum;
VACUUM
doc=> select relname, relpages, reltuples from pg_class
doc-> where relname = 'doc_wordref';
relname |relpages|reltuples
-----------+--------+---------
doc_wordref| 1099| 136027
(1 row)
-- ******** That's right
doc=> explain select distinct refpage from doc_wordref
doc-> where refword ~ '^a';
NOTICE: QUERY PLAN:
Unique
-> Sort
-> Index Scan using doc_wordref_word_idx on doc_wordref
EXPLAIN
-- ******** As expected
doc=> select distinct refpage from doc_wordref
doc-> where refword ~ '^a';
refpage
-------
2
3
...
(164 rows)
doc=> select relname, relpages, reltuples from pg_class
doc-> where relname = 'doc_wordref';
relname |relpages|reltuples
-----------+--------+---------
doc_wordref| 0| 0
(1 row)
-- ******** Ooops - where are they gone?
doc=> explain select distinct refpage from doc_wordref
dos-> where refword ~ '^a';
NOTICE: QUERY PLAN:
Unique
-> Sort
-> Index Scan using doc_wordref_word_idx on doc_wordref
-- ******** Doesn't matter in the same connection, so reconnect
EXPLAIN
doc=> \c -
connecting to new database: doc
doc=> explain select distinct refpage from doc_wordref
dos-> where refword ~ '^a';
NOTICE: QUERY PLAN:
Unique
-> Sort
-> Seq Scan on doc_wordref
-- ******** Boom
EXPLAIN
doc=>
Why does the SELECT throw away the information about the
number of pages and tuples from pg_class? Is this a bug or a
feature? If it's a feature, how can I disable it? It is
reproduceable.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
> Hi,
>
> there seems to be a problem with the relation statistics in
> pg_class. Could someone explain why this happens?
>
> doc=> vacuum;
> VACUUM
> doc=> select relname, relpages, reltuples from pg_class
> doc-> where relname = 'doc_wordref';
> doc=> select relname, relpages, reltuples from pg_class
> doc-> where relname = 'doc_wordref';
> relname |relpages|reltuples
> -----------+--------+---------
> doc_wordref| 0| 0
> (1 row)
>
> -- ******** Ooops - where are they gone?
>
> doc=> explain select distinct refpage from doc_wordref
> dos-> where refword ~ '^a';
> NOTICE: QUERY PLAN:
I have seen the optimizer stop using indexes, but could never reproduce
it, and hoped my mega-patch would have fix it.
My only guess is that vacuum has changed the buffer cache copy of the
pg_class tuple, but did not mark it as dirty, so it was not written back
out when removed from the buffer cache. When reloaded after the query,
the buffer cache is loaded from the disk copy, and the disk copy has
zeros, because the vacuum copy was not written to disk.
The active code is in vacuum.c::vc_updstats:
/* XXX -- after write, should invalidate relcache in other backends */
WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid));
RelationInvalidateHeapTuple(rd, rtup);
This should be marking the buffer as dirty and written out the buffer to
disk, so when it gets reloaded, it has the new vacuum statatistics. It
is also invalidating the catalog cache, so that doesn't get used for
stats.
The code looks fine to me. I can't figure out why that would happen.
Can you try replacing WriteNoReleaseBuffer() with WriteBuffer() and see
if that fixes it.
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610)
853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill,
Pennsylvania19026
> Hi, > > there seems to be a problem with the relation statistics in > pg_class. Could someone explain why this happens? > > doc=> vacuum; > VACUUM > doc=> select relname, relpages, reltuples from pg_class > doc-> where relname = 'doc_wordref'; > relname |relpages|reltuples > -----------+--------+--------- > doc_wordref| 1099| 136027 > (1 row) > > -- ******** That's right > > doc=> explain select distinct refpage from doc_wordref > doc-> where refword ~ '^a'; > NOTICE: QUERY PLAN: > Attached is a patch to vacuum.c I would like you to try. I wonder if dirty pages are somehow never getting written out because of vacuum's use of transactions. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 *** ./backend/commands/vacuum.c.orig Thu Oct 22 17:14:37 1998 --- ./backend/commands/vacuum.c Thu Oct 22 17:15:35 1998 *************** *** 1875,1886 **** heap_close(sd); } - /* XXX -- after write, should invalidate relcache in other backends */ - WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); - RelationInvalidateHeapTuple(rd, rtup); ! ReleaseBuffer(buffer); heap_close(rd); } --- 1875,1885 ---- heap_close(sd); } RelationInvalidateHeapTuple(rd, rtup); ! /* XXX -- after write, should invalidate relcache in other backends */ ! WriteBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid)); ! heap_close(rd); }
Bruce Momjian wrote:
> I have seen the optimizer stop using indexes, but could never reproduce
> it, and hoped my mega-patch would have fix it.
>
> My only guess is that vacuum has changed the buffer cache copy of the
> pg_class tuple, but did not mark it as dirty, so it was not written back
> out when removed from the buffer cache. When reloaded after the query,
> the buffer cache is loaded from the disk copy, and the disk copy has
> zeros, because the vacuum copy was not written to disk.
>
> The active code is in vacuum.c::vc_updstats:
Your guess was right, thanks. But your solution does not work
:-(
I found a way to easily reproduce the error.
Run VACUUM
Restart postmaster
-> relpages and reltuples gone
I think the simple way of modifying the tuple in the page
does not work. I found that catalog/index.c does the same for
relpages and reltuples in UpdateStats(). Calling
UpdateStats() after vc_updstats() as a quick hack solved the
problem.
I'm now cvsup'ing, then I'll modify vacuum.c to do it the
same way as index.c does it. I don't know if calling
UpdateStats() instead is really a good idea, because vacuum
potentially truncates files and UpdateStats() does
RelationGetNumberOfBlocks() instead of getting it by
argument. This might be wrong at that time.
Let's see what happens when vacuum does it harder.
Later, Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
> I think the simple way of modifying the tuple in the page
> does not work. I found that catalog/index.c does the same for
> relpages and reltuples in UpdateStats(). Calling
> UpdateStats() after vc_updstats() as a quick hack solved the
> problem.
And now I know the way of modifying them in the buffer is the
only one to succeed. Doing it like index.c with
heap_replace() irritates vacuum itself.
Sometimes it's good to check returncodes :-).
WriteNoReleaseBuffer() (as it's name says) takes a buffer
number as argument, not a disk block number. Thus, it
returned FALSE sometimes when called to write buffer # 0.
The attdisbursion is also permanently saved on disk now. It
had the same problem.
Anyway, the fix is below.
Patch is regression tested.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
*** vacuum.c.orig Fri Oct 23 16:47:21 1998
--- vacuum.c Fri Oct 23 16:41:56 1998
***************
*** 1792,1799 ****
--- 1792,1807 ----
/* overwrite the existing statistics in the tuple */
if (VacAttrStatsEqValid(stats))
{
+ Buffer abuffer;
+ /*
+ * We manipulate the heap tuple in the
+ * buffer, so we fetch it to get the
+ * buffer number
+ */
+ atup = heap_fetch(ad, SnapshotNow, &atup->t_ctid, &abuffer);
vc_setpagelock(ad, ItemPointerGetBlockNumber(&atup->t_ctid));
+ attp = (Form_pg_attribute) GETSTRUCT(atup);
if (stats->nonnull_cnt + stats->null_cnt == 0 ||
(stats->null_cnt <= 1 && stats->best_cnt == 1))
***************
*** 1822,1828 ****
if (selratio > 1.0)
selratio = 1.0;
attp->attdisbursion = selratio;
! WriteNoReleaseBuffer(ItemPointerGetBlockNumber(&atup->t_ctid));
/* DO PG_STATISTIC INSERTS */
--- 1830,1843 ----
if (selratio > 1.0)
selratio = 1.0;
attp->attdisbursion = selratio;
!
! /*
! * Invalidate the cache for the tuple
! * and write the buffer
! */
! RelationInvalidateHeapTuple(ad, atup);
! WriteNoReleaseBuffer(abuffer);
! ReleaseBuffer(abuffer);
/* DO PG_STATISTIC INSERTS */
***************
*** 1875,1884 ****
heap_close(sd);
}
RelationInvalidateHeapTuple(rd, rtup);
! /* XXX -- after write, should invalidate relcache in other backends */
! WriteBuffer(ItemPointerGetBlockNumber(&rtup->t_ctid));
heap_close(rd);
}
--- 1890,1904 ----
heap_close(sd);
}
+ /*
+ * Invalidate the cached pg_class tuple and
+ * write the buffer
+ */
RelationInvalidateHeapTuple(rd, rtup);
! WriteNoReleaseBuffer(buffer);
!
! ReleaseBuffer(buffer);
heap_close(rd);
}
> > I think the simple way of modifying the tuple in the page > > does not work. I found that catalog/index.c does the same for > > relpages and reltuples in UpdateStats(). Calling > > UpdateStats() after vc_updstats() as a quick hack solved the > > problem. > > And now I know the way of modifying them in the buffer is the > only one to succeed. Doing it like index.c with > heap_replace() irritates vacuum itself. > > Sometimes it's good to check returncodes :-). > WriteNoReleaseBuffer() (as it's name says) takes a buffer > number as argument, not a disk block number. Thus, it > returned FALSE sometimes when called to write buffer # 0. > > The attdisbursion is also permanently saved on disk now. It > had the same problem. > > Anyway, the fix is below. > > Patch is regression tested. Applied. Thanks. I am glad you were able to find the actual cause, rather than calling that UpdateStats function. Vacuum is too strange in the way it modifies things, and calling a general-purpose function that calls all sorts of other functions could be a problem. Of course, you are right. I was passing in a block number, rather than a buffer number. I checked 6.3.2, and I had it right there, so somehow I got confused in the megapatch on that item. I just checked the mega-patch, and that is the only place I used ItemPointerGetBlockNumber as a replacement for Buffer. Glad to see you were able to make sense of the new heap_fetch() syntax to get the buffer. The old code was very unclear in these areas. Now that it is in the developers FAQ, it should be better. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Sorry to be nit-picky, but is pg_indexes proper English. Isn't it pg_indices? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Sorry to be nit-picky, but is pg_indexes proper English. Isn't it > pg_indices? In proper English, "indices" is plural of the singular noun "index". In American, "indexes" is also allowed. (In proper English, "indexes" is a conjugation of the transitive verb "index", of course.) -tih -- Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
Bruce Momjian wrote:
>
> Sorry to be nit-picky, but is pg_indexes proper English. Isn't it
> pg_indices?
I thought that too. I named it just pg_indexes because Oracle
has ALL_INDEXES, DBA_INDEXES and USER_INDEXES. I wonder why
I'm allways copying those typos when implementing something
similar to what they have.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #