Обсуждение: select statement against pg_stats returns inconsistent data
The select statements return different data for most_commons_vals depending on whether n_distinct is included in the select clause or not. I only seem to get the behavior below against int8 columns - but I haven't interated through every conceivable data type either. Is this expected behavior or perhaps a bug? Regards, Shelby Cain ========================================================= c1scain=# select version(); version --------------------------------------------------------------------------------------- PostgreSQL 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1 (cygming special) (1 row) c1scain=# create table test_table (lastname varchar(20), firstname varchar(20), userid int8, testid int8); CREATE TABLE c1scain=# insert into test_table values ('cain', 'shelby', random()*10000, random()*10000); INSERT 13015123 1 c1scain=# insert into test_table values ('cain', 'shelby', random()*10000, random()*10000); INSERT 13015124 1 c1scain=# insert into test_table values ('cain', 'shelby', random()*10000, random()*10000); INSERT 13015125 1 c1scain=# insert into test_table values ('cain', 'shelby', random()*10000, random()*10000); INSERT 13015126 1 c1scain=# insert into test_table values ('cain', 'shelby', random()*10000, random()*10000); INSERT 13015127 1 c1scain=# insert into test_table (select * from test_table); INSERT 0 5 c1scain=# insert into test_table (select * from test_table); INSERT 0 10 c1scain=# insert into test_table (select * from test_table); INSERT 0 20 c1scain=# insert into test_table (select * from test_table); INSERT 0 40 c1scain=# analyze test_table; ANALYZE c1scain=# select distinct userid from test_table; userid -------- 211 2641 4333 7642 8053 (5 rows) c1scain=# select distinct testid from test_table; testid -------- 73 834 1399 2315 4511 (5 rows) c1scain=# select tablename, attname, most_common_vals from pg_stats where tablename = 'test_table'; tablename | attname | most_common_vals ------------+-----------+--------------------------- test_table | lastname | {cain} test_table | firstname | {shelby} test_table | userid | {211,2641,4333,7642,8053} test_table | testid | {73,834,1399,2315,4511} (4 rows) c1scain=# select tablename, attname, n_distinct, most_common_vals from pg_stats where tablename = 'test_table'; tablename | attname | n_distinct | most_common_vals ------------+-----------+------------+------------------------------------------------------ test_table | lastname | 1 | {cain} test_table | firstname | 1 | {shelby} test_table | userid | 5 | {211,18610093293568,32822140076032,34587371634688,0} test_table | testid | 5 | {73,6008659247104,9942849290240,19374597472256,0} __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools
Shelby Cain <alyandon@yahoo.com> writes: > The select statements return different data for > most_commons_vals depending on whether n_distinct is > included in the select clause or not. > I only seem to get the behavior below against int8 > columns - but I haven't interated through every > conceivable data type either. Hoo, I'm surprised no one noticed this during 7.4 development/testing. The problem applies for any datatype that requires double alignment, which includes int8, float8, and timestamp as well as most of the geometric types. pg_statistic is declared as using type "anyarray", and this type really needs to be marked as requiring double alignment so that arrays of double-aligned datatypes will come out correctly. The correct source fix is a one-line change in pg_type.h, but this will not propagate into existing databases without an initdb. It looks like what you'd need to do to fix an existing database is -- clear out broken data in pg_statistic DELETE FROM pg_statistic; -- this should update 1 row: UPDATE pg_type SET typalign = 'd' WHERE oid = 2277; -- this should update 6 rows: UPDATE pg_attribute SET attalign = 'd' WHERE atttypid = 2277; -- might be a good idea to start a fresh backend at this point -- repopulate pg_statistic ANALYZE; Ugh :-( regards, tom lane
Tom Lane wrote: > Hoo, I'm surprised no one noticed this during 7.4 development/testing. > The problem applies for any datatype that requires double alignment, > which includes int8, float8, and timestamp as well as most of the > geometric types. pg_statistic is declared as using type "anyarray", > and this type really needs to be marked as requiring double alignment > so that arrays of double-aligned datatypes will come out correctly. anyarray has been defined this way since 7.3 -- any concerns there? I see that back then pg_statistic used text[] instead of anyarray, so perhaps not. Joe
Joe Conway <mail@joeconway.com> writes: > anyarray has been defined this way since 7.3 -- any concerns there? I don't think so --- we weren't trying to use it as an actual column datatype back then. 7.4 has a problem though :-( ... this is one of the "damn I wish we'd caught that before release" ones, since it can't easily be fixed without initdb. Reminds me that I need to get to work on making pg_upgrade viable again. regards, tom lane
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > Has anyone given any thought as to whether dumping and restoring > pg_statistic is worthwhile? Why? You can reconstruct it with a simple "ANALYZE" command. Dumping and restoring would mean nailing down cross-version assumptions about what it contains, which doesn't seem real forward-looking... regards, tom lane
Re: select statement against pg_stats returns inconsistent
От
"V i s h a l Kashyap @ [Sai Hertz And Control Systems]"
Дата:
Dear Shelby Cain , >Is this expected behavior or perhaps a bug? > > For a novice like me can anyone please tell me 1. Will this effect my application developed on PostgreSQL 2. Will my Application break at some point I heavly use the type of queries defined in the post. Would be greatfull for any kinda answers. -- Best Regards, Vishal Kashyap Director / Lead Developer, Sai Hertz And Control Systems Pvt Ltd, http://saihertz.rediffblogs.com Jabber IM: vishalkashyap@jabber.org ICQ : 264360076 ----------------------------------------------- You yourself, as much as anybody in the entire universe, deserve your love and affection. - Buddha --------------- I am usually called as Vishal Kashyap and my Girlfriend calls me Vishal CASH UP. Because everyone loves me as Vishal Kashyap and my Girlfriend loves me as CASH. ___ //\\\ ( 0_0 ) ----------------o0o-----o0o---------------------
--- Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hoo, I'm surprised no one noticed this during 7.4 > development/testing. > The problem applies for any datatype that requires > double alignment, > which includes int8, float8, and timestamp as well > as most of the > geometric types. pg_statistic is declared as using > type "anyarray", > and this type really needs to be marked as requiring > double alignment > so that arrays of double-aligned datatypes will come > out correctly. > > The correct source fix is a one-line change in > pg_type.h, but this will > not propagate into existing databases without an > initdb. It looks like > what you'd need to do to fix an existing database is > > -- clear out broken data in pg_statistic > DELETE FROM pg_statistic; > -- this should update 1 row: > UPDATE pg_type SET typalign = 'd' WHERE oid = 2277; > -- this should update 6 rows: > UPDATE pg_attribute SET attalign = 'd' WHERE > atttypid = 2277; > -- might be a good idea to start a fresh backend at > this point > -- repopulate pg_statistic > ANALYZE; > > Ugh :-( > > regards, tom lane Works for me. Thanks! __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools
> I don't think so --- we weren't trying to use it as an actual column > datatype back then. > > 7.4 has a problem though :-( ... this is one of the "damn I wish we'd > caught that before release" ones, since it can't easily be fixed without > initdb. Reminds me that I need to get to work on making pg_upgrade > viable again. Has anyone given any thought as to whether dumping and restoring pg_statistic is worthwhile? eg. some sort of ALTER TABLE..SET STATISTICS (1.0, 3.3, 'asdf',....) command? Chris