Обсуждение: Re: [GENERAL] select statement against pg_stats returns inconsistent data
Shelby Cain <alyandon@yahoo.com> writes:
> The select statements return different data for
> most_commons_vals depending on whether n_distinct is
> included in the select clause or not.
> I only seem to get the behavior below against int8
> columns - but I haven't interated through every
> conceivable data type either.
Hoo, I'm surprised no one noticed this during 7.4 development/testing.
The problem applies for any datatype that requires double alignment,
which includes int8, float8, and timestamp as well as most of the
geometric types. pg_statistic is declared as using type "anyarray",
and this type really needs to be marked as requiring double alignment
so that arrays of double-aligned datatypes will come out correctly.
The correct source fix is a one-line change in pg_type.h, but this will
not propagate into existing databases without an initdb. It looks like
what you'd need to do to fix an existing database is
-- clear out broken data in pg_statistic
DELETE FROM pg_statistic;
-- this should update 1 row:
UPDATE pg_type SET typalign = 'd' WHERE oid = 2277;
-- this should update 6 rows:
UPDATE pg_attribute SET attalign = 'd' WHERE atttypid = 2277;
-- might be a good idea to start a fresh backend at this point
-- repopulate pg_statistic
ANALYZE;
Ugh :-(
regards, tom lane
Tom Lane wrote: > Hoo, I'm surprised no one noticed this during 7.4 development/testing. > The problem applies for any datatype that requires double alignment, > which includes int8, float8, and timestamp as well as most of the > geometric types. pg_statistic is declared as using type "anyarray", > and this type really needs to be marked as requiring double alignment > so that arrays of double-aligned datatypes will come out correctly. anyarray has been defined this way since 7.3 -- any concerns there? I see that back then pg_statistic used text[] instead of anyarray, so perhaps not. Joe
Joe Conway <mail@joeconway.com> writes:
> anyarray has been defined this way since 7.3 -- any concerns there?
I don't think so --- we weren't trying to use it as an actual column
datatype back then.
7.4 has a problem though :-( ... this is one of the "damn I wish we'd
caught that before release" ones, since it can't easily be fixed without
initdb. Reminds me that I need to get to work on making pg_upgrade
viable again.
regards, tom lane
> I don't think so --- we weren't trying to use it as an actual column > datatype back then. > > 7.4 has a problem though :-( ... this is one of the "damn I wish we'd > caught that before release" ones, since it can't easily be fixed without > initdb. Reminds me that I need to get to work on making pg_upgrade > viable again. Has anyone given any thought as to whether dumping and restoring pg_statistic is worthwhile? eg. some sort of ALTER TABLE..SET STATISTICS (1.0, 3.3, 'asdf',....) command? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Has anyone given any thought as to whether dumping and restoring
> pg_statistic is worthwhile?
Why? You can reconstruct it with a simple "ANALYZE" command. Dumping
and restoring would mean nailing down cross-version assumptions about
what it contains, which doesn't seem real forward-looking...
regards, tom lane
> Why? You can reconstruct it with a simple "ANALYZE" command. Dumping > and restoring would mean nailing down cross-version assumptions about > what it contains, which doesn't seem real forward-looking... I seem to recall that people like that kind of thing so that the dump is really the current state of the database. Also, I believe big db's like DB2 and Oracle do such a thing. I just recall it being discussed some time ago... Chris
Christopher Kings-Lynne wrote: > > Why? You can reconstruct it with a simple "ANALYZE" command. Dumping > > and restoring would mean nailing down cross-version assumptions about > > what it contains, which doesn't seem real forward-looking... > > I seem to recall that people like that kind of thing so that the dump is > really the current state of the database. > > Also, I believe big db's like DB2 and Oracle do such a thing. > > I just recall it being discussed some time ago... I have heard of dumping stats so you are sure your production db has the same stats as your test database, but with ANALYZE so fast, and our optimizer so good, I don't see a use case for us. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073