Обсуждение: CLUSTER can change t_len

Поиск
Список
Период
Сортировка

CLUSTER can change t_len

От
Jeff Davis
Дата:
I don't think that this is a bug exactly, but it seems inconsistent.

See case below. After the item length gets changed, then when reading
the tuple later you get a t_len that includes padding.

We should document in a comment that t_len can mean multiple things. Or,
we should fix raw_heap_insert() to be consistent with the rest of the
code, which doesn't MAXALIGN the t_len.

Regards,Jeff Davis


create table foo(i int4 unique);
insert into foo values(1);
insert into foo values(2);
checkpoint;
select relfilenode from pg_class where relname = 'foo';
    relfilenode    -------------          16413   (1 row)

--
-- Look at on-disk item lengths, which are 28 (0x38 >> 1)
-- on my machine
--

cluster foo using foo_i_key;
select relfilenode from pg_class where relname = 'foo';
    relfilenode    -------------          16418   (1 row)

checkpoint;

--
-- Now look again. They are 32 (0x40 >> 1) on my machine.
--




Re: CLUSTER can change t_len

От
Itagaki Takahiro
Дата:
On Tue, Nov 9, 2010 at 12:44 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> See case below. After the item length gets changed, then when reading
> the tuple later you get a t_len that includes padding.

We can easily find it with pageinspect:

\i pageinspect.sql
create table foo(i int4);
insert into foo values(1);
SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));lp | lp_len
----+-------- 1 |     28
VACUUM FULL foo;
SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));lp | lp_len
----+-------- 1 |     32

> We should document in a comment that t_len can mean multiple things. Or,
> we should fix raw_heap_insert() to be consistent with the rest of the
> code, which doesn't MAXALIGN the t_len.

We have a comment /* be conservative */ in the function, but I'm not sure
we actually need the MAXALIGN. However, there would be almost no benefits
to keep t_len in small value because we often treat memory in MAXALIGN unit.

diff --git a/src/backend/access/heap/rewriteheap.c
b/src/backend/access/heap/rewriteheap.c
index 0bd1865..0ed94ef 100644
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
*************** raw_heap_insert(RewriteState state, Heap
*** 586,592 ****   else       heaptup = tup;

!   len = MAXALIGN(heaptup->t_len);     /* be conservative */
   /*    * If we're gonna fail for oversize tuple, do it right away
--- 586,592 ----   else       heaptup = tup;

!   len = heaptup->t_len;
   /*    * If we're gonna fail for oversize tuple, do it right away


-- 
Itagaki Takahiro


Re: CLUSTER can change t_len

От
Heikki Linnakangas
Дата:
On 09.11.2010 11:11, Itagaki Takahiro wrote:
> On Tue, Nov 9, 2010 at 12:44 PM, Jeff Davis<pgsql@j-davis.com>  wrote:
>> See case below. After the item length gets changed, then when reading
>> the tuple later you get a t_len that includes padding.
>
> We can easily find it with pageinspect:
>
> \i pageinspect.sql
> create table foo(i int4);
> insert into foo values(1);
> SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
>   lp | lp_len
> ----+--------
>    1 |     28
> VACUUM FULL foo;
> SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
>   lp | lp_len
> ----+--------
>    1 |     32
>
>> We should document in a comment that t_len can mean multiple things. Or,
>> we should fix raw_heap_insert() to be consistent with the rest of the
>> code, which doesn't MAXALIGN the t_len.
>
> We have a comment /* be conservative */ in the function, but I'm not sure
> we actually need the MAXALIGN. However, there would be almost no benefits
> to keep t_len in small value because we often treat memory in MAXALIGN unit.

Hmm, the conservatism at that point affects the free space calculations. 
I'm not sure if it makes any difference in practice, but I'm also not 
sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.

This would be more in line with what the main heap_insert code does:

--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -641,7 +641,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)     }
     /* And now we can insert the tuple into the page */
-    newoff = PageAddItem(page, (Item) heaptup->t_data, len,
+    newoff = PageAddItem(page, (Item) heaptup->t_data, heaptup->t_len,                          InvalidOffsetNumber,
false,true);     if (newoff == InvalidOffsetNumber)         elog(ERROR, "failed to add tuple");
 

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: CLUSTER can change t_len

От
Greg Stark
Дата:
On Tue, Nov 9, 2010 at 10:20 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>
>> We have a comment /* be conservative */ in the function, but I'm not sure
>> we actually need the MAXALIGN. However, there would be almost no benefits
>> to keep t_len in small value because we often treat memory in MAXALIGN
>> unit.
>
> Hmm, the conservatism at that point affects the free space calculations. I'm
> not sure if it makes any difference in practice, but I'm also not sure it
> doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>
> This would be more in line with what the main heap_insert code does:
>


Doesn't this cause assertion failures in heap_fill_tuple when the data
size isn't what's expected? I guess we never actually use the t_len
for later tuple reconstructions, we just recompute the needed size?
-- 
greg


Re: CLUSTER can change t_len

От
Heikki Linnakangas
Дата:
On 09.11.2010 15:57, Greg Stark wrote:
> On Tue, Nov 9, 2010 at 10:20 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>>>
>>> We have a comment /* be conservative */ in the function, but I'm not sure
>>> we actually need the MAXALIGN. However, there would be almost no benefits
>>> to keep t_len in small value because we often treat memory in MAXALIGN
>>> unit.
>>
>> Hmm, the conservatism at that point affects the free space calculations. I'm
>> not sure if it makes any difference in practice, but I'm also not sure it
>> doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>>
>> This would be more in line with what the main heap_insert code does:
>
> Doesn't this cause assertion failures in heap_fill_tuple when the data
> size isn't what's expected? I guess we never actually use the t_len
> for later tuple reconstructions, we just recompute the needed size?

Right, the length from t_len or the item pointer is never passed to 
heap_fill_tuple.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: CLUSTER can change t_len

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 09.11.2010 11:11, Itagaki Takahiro wrote:
>> We have a comment /* be conservative */ in the function, but I'm not sure
>> we actually need the MAXALIGN. However, there would be almost no benefits
>> to keep t_len in small value because we often treat memory in MAXALIGN unit.

> Hmm, the conservatism at that point affects the free space calculations. 
> I'm not sure if it makes any difference in practice, but I'm also not 
> sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.

I tend to agree with Jeff's original point that the behavior should
match regular tuple insertion exactly.  This isn't about saving space,
because it won't; it's about not confusing readers by doing the same
thing in randomly different ways.  I will also note that the regular
path is FAR better tested than raw_heap_insert.  If there are any bugs
here, it's 1000:1 they're in raw_heap_insert not the regular path.
        regards, tom lane


Re: CLUSTER can change t_len

От
Heikki Linnakangas
Дата:
On 09.11.2010 17:14, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> On 09.11.2010 11:11, Itagaki Takahiro wrote:
>>> We have a comment /* be conservative */ in the function, but I'm not sure
>>> we actually need the MAXALIGN. However, there would be almost no benefits
>>> to keep t_len in small value because we often treat memory in MAXALIGN unit.
>
>> Hmm, the conservatism at that point affects the free space calculations.
>> I'm not sure if it makes any difference in practice, but I'm also not
>> sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>
> I tend to agree with Jeff's original point that the behavior should
> match regular tuple insertion exactly.  This isn't about saving space,
> because it won't; it's about not confusing readers by doing the same
> thing in randomly different ways.  I will also note that the regular
> path is FAR better tested than raw_heap_insert.  If there are any bugs
> here, it's 1000:1 they're in raw_heap_insert not the regular path.

Agreed. I've committed my patch to make it behave like heap_insert. 
Thank you, Itagaki, for the easy test case using pageinspect.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com