Обсуждение: stawidth inconsistency with all NULL columns
Consider:
CREATE TABLE testwid
(
txtnotnull text,
txtnull text,
int8notnull int8,
int8null int8
);
INSERT INTO testwid
SELECT 'a' || g.i,
NULL,
g.i,
NULL
FROM generate_series(1,10000) AS g(i);
ANALYZE testwid;
SELECT attname, avg_width FROM pg_stats WHERE tablename = 'testwid';
attname | avg_width
-------------+-----------
txtnotnull | 5
txtnull | 0
int8notnull | 8
int8null | 8
(4 rows)
I see in analyze.c
8<-----------------
/* We can only compute average width if we found some non-null values.*/
if (nonnull_cnt > 0)
[snip]
else if (null_cnt > 0)
{
/* We found only nulls; assume the column is entirely null */
stats->stats_valid = true;
stats->stanullfrac = 1.0;
if (is_varwidth)
stats->stawidth = 0; /* "unknown" */
else
stats->stawidth = stats->attrtype->typlen;
stats->stadistinct = 0.0; /* "unknown" */
}
8<-----------------
So apparently intentional, but seems gratuitously inconsistent. Could
this cause any actual inconsistent behaviors? In any case that first
comment does not reflect the code.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Вложения
Joe Conway <mail@joeconway.com> writes:
> else if (null_cnt > 0)
> {
> /* We found only nulls; assume the column is entirely null */
> stats->stats_valid = true;
> stats->stanullfrac = 1.0;
> if (is_varwidth)
> stats->stawidth = 0; /* "unknown" */
> else
> stats->stawidth = stats->attrtype->typlen;
> stats->stadistinct = 0.0; /* "unknown" */
> }
> 8<-----------------
> So apparently intentional, but seems gratuitously inconsistent. Could
> this cause any actual inconsistent behaviors? In any case that first
> comment does not reflect the code.
Are you suggesting that we should set stawidth to zero even for a
fixed-width datatype? That seems pretty silly. We know exactly what
the value should be, and would be if we'd chanced to find even one
non-null entry.
regards, tom lane
On 5/21/19 3:55 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> else if (null_cnt > 0)
>> {
>> /* We found only nulls; assume the column is entirely null */
>> stats->stats_valid = true;
>> stats->stanullfrac = 1.0;
>> if (is_varwidth)
>> stats->stawidth = 0; /* "unknown" */
>> else
>> stats->stawidth = stats->attrtype->typlen;
>> stats->stadistinct = 0.0; /* "unknown" */
>> }
>> 8<-----------------
>
>> So apparently intentional, but seems gratuitously inconsistent. Could
>> this cause any actual inconsistent behaviors? In any case that first
>> comment does not reflect the code.
>
> Are you suggesting that we should set stawidth to zero even for a
> fixed-width datatype? That seems pretty silly. We know exactly what
> the value should be, and would be if we'd chanced to find even one
> non-null entry.
Well you could argue in similar fashion for variable width values -- if
we find even one of those, it will be at least 4 bytes. So why set those
to zero?
Not a big deal, but it struck me as odd when I was looking at the
current state of affairs.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Вложения
Joe Conway <mail@joeconway.com> writes:
> On 5/21/19 3:55 PM, Tom Lane wrote:
>> Are you suggesting that we should set stawidth to zero even for a
>> fixed-width datatype? That seems pretty silly. We know exactly what
>> the value should be, and would be if we'd chanced to find even one
>> non-null entry.
> Well you could argue in similar fashion for variable width values -- if
> we find even one of those, it will be at least 4 bytes. So why set those
> to zero?
Um, really the minimum width is 1 byte, given short headers. But as
the code notes, zero means we don't know what a sane estimate would
be, which is certainly not the case for fixed-width types.
regards, tom lane