Обсуждение: pg_class -> reltuples?
Hi all, Is there a reason why the reltuples column of pg_class is stored as a "real", rather than one of the integer data types? Are there any situations in which there will be a non-integer value stored in this column? Cheers, Neil P.S. I tried to search the archives, but archives.postgresql.org is so slow, it's basically unusable. So my apologies if this has already been discussed... -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway wrote: > Hi all, > > Is there a reason why the reltuples column of pg_class is stored as a > "real", rather than one of the integer data types? Are there any > situations in which there will be a non-integer value stored in this > column? That is an excellent question. I assume it is related to having > 4 billion rows, but we have int8 for that. The value is used mostly by the optimizer, which does most of its calcultions using float8 (real), so that may be why. > P.S. I tried to search the archives, but archives.postgresql.org is so > slow, it's basically unusable. So my apologies if this has already been > discussed... Yes, it is hampering me from researching some of these patches too, and fts is completely down. If I could just get a web page of all the threads (forget searching), I would be happy. The archives site contents hasn't been updated since Feb 28. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Neil Conway wrote:
>> Is there a reason why the reltuples column of pg_class is stored as a
>> "real", rather than one of the integer data types?
> That is an excellent question.  I assume it is related to having > 4
> billion rows, but we have int8 for that.
1. We support tables > 4G rows.
2. int8 is not available on all platforms.
3. The only use for reltuples is in the optimizer, which is perfectly  content with approximate values.
        regards, tom lane
			
		On Thursday 07 March 2002 23:11, Bruce Momjian wrote: > Neil Conway wrote: > > > P.S. I tried to search the archives, but archives.postgresql.org is so > > slow, it's basically unusable. So my apologies if this has already been > > discussed... > > Yes, it is hampering me from researching some of these patches too, and > fts is completely down. If I could just get a web page of all the > threads (forget searching), I would be happy. The archives site contents > hasn't been updated since Feb 28. maybe google? http://groups.google.com/groups?hl=en&group=comp.databases.postgresql.hackers Though a cursory glance shows some mails which went over the list aren't there, particularly the most recent threads are pretty patchy, pun unintended.. Ian Barwick
> maybe google? > > http://groups.google.com/groups?hl=en&group=comp.databases.postgresql.hackers > > Though a cursory glance shows some mails which went over the > list aren't there, particularly the most recent threads are pretty patchy, pun > unintended.. Thanks. That is a huge help. In fact, this lists all the groups: http://groups.google.com/groups?hl=en&group=comp.databases.postgresql -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Thu, 2002-03-07 at 17:51, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Neil Conway wrote: > >> Is there a reason why the reltuples column of pg_class is stored as a > >> "real", rather than one of the integer data types? > > > That is an excellent question. I assume it is related to having > 4 > > billion rows, but we have int8 for that. > > 1. We support tables > 4G rows. I agree we should try to support very large tables -- so why waste space on storing floating point? And am I missing something, or is a "real" only 4 bytes? > 2. int8 is not available on all platforms. I have no problem making restrictions on data types for portability, but at least we should be consistent: % grep -rI 'long long' * | wc -l 37 % grep -rI 'int64' * | wc -l 191 On all the platforms I tested (x86, SPARC, PPC, PA-RISC, Alpha), a 'long long' is supported, and is 8 bytes. Which platforms don't have this, and are we actively supporting them? Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes:
> I have no problem making restrictions on data types for portability, but
> at least we should be consistent:
We *are* consistent.  int8 is not used in the system catalogs, and where
it is used, the system will continue to function if it's implemented as
a 32-bit datatype.  (At least, things still worked the last time I tried
turning off HAVE_LONG_LONG_INT.  If someone broke it since then, it
needs to be fixed.)
        regards, tom lane
			
		On Thu, 2002-03-07 at 19:54, Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > I have no problem making restrictions on data types for portability, but > > at least we should be consistent: > > We *are* consistent. int8 is not used in the system catalogs, and where > it is used, the system will continue to function if it's implemented as > a 32-bit datatype. (At least, things still worked the last time I tried > turning off HAVE_LONG_LONG_INT. If someone broke it since then, it > needs to be fixed.) 9 regression tests fail without HAVE_LONG_LONG_INT on a 32-bit machine (int8, constraints, select_implicit, select_having, subselect, union, aggregates, misc, rules). It's pretty obvious that int8 should fail, but the others look like bugs. As for the original question, maybe I'm missing something obvious, but is there a reason why reltuples can't be an int8? (which is already typedef'ed to a int4 on broken machines/compilers) This would mean that on machines without a 64-bit int type, tables greater than 2^32 rows can't be stored (or at least, reltuples breaks). But I'm inclined to dismiss those platforms as broken, anyway... In any case, I think the current situation is the wrong way around: we're using a workaround on _all_ platforms, just to avoid breaking a few old systems. Wouldn't it make more sense to use an int8 by default, and fall back to a floating-point workaround if the default, optimal solution isn't available? Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes:
> 9 regression tests fail without HAVE_LONG_LONG_INT on a 32-bit machine
> (int8, constraints, select_implicit, select_having, subselect, union,
> aggregates, misc, rules). It's pretty obvious that int8 should fail, but
> the others look like bugs.
I think int8_tbl may be used in some of the other tests, so diffs there
are not necessarily a big deal.  Did you examine the diffs closely?
> As for the original question, maybe I'm missing something obvious, but
> is there a reason why reltuples can't be an int8? (which is already
> typedef'ed to a int4 on broken machines/compilers)
Yes: it won't work.  If reltuples is construed to be 8 bytes by some
compilers and 4 bytes by others, then the struct definition will fail to
overlay onto the storage as seen by the general-purpose tuple access
routines.  (We could maybe fix that by having pg_type.h and some other
places conditionally compile the declared size of type int8, but it
ain't worth the trouble.)
> This would mean that
> on machines without a 64-bit int type, tables greater than 2^32 rows
> can't be stored (or at least, reltuples breaks). But I'm inclined to
> dismiss those platforms as broken, anyway...
Sorry, but I have very little patience for arguments that "if it works
on all the machines I use, it's good enough".  Especially for a case
like this, where there is zero advantage to using int8 anyway.
Using a float here is not a "workaround", it's the right thing to do.
(The optimizer would only have to convert it to float anyway for its
internal calculations.)
> Wouldn't it make more sense to use an int8 by default,
> and fall back to a floating-point workaround if the default, optimal
> solution isn't available?
So the user-visible column types of pg_class would vary depending on
this implementation detail?  Not a good idea IMHO.
        regards, tom lane