Thread: Getting the length of varlength data using PG_DETOAST_DATUM_SLICE or similar?

Getting the length of varlength data using PG_DETOAST_DATUM_SLICE or similar?

From: Mark Dilger
Date:
Hello, could anyone tell me, for a user contributed variable length data type, 
how can you access the length of the data without pulling the entire thing from 
disk?  Is there a function or macro for this?

As a first cut, I tried using the PG_DETOAST_DATUM_SLICE macro, but to no avail.
grep'ing through the release source for version 8.1.2, I find very little
usage of the PG_GETARG_*_SLICE and PG_DETOAST_DATUM_SLICE macros (and hence
little clue how they are intended to be used.)  The only files where I find them
referenced are:

    doc/src/sgml/xfunc.sgml
    src/backend/utils/adt/varlena.c
    src/include/fmgr.h


I am writing a variable length data type and trying to optimize the disk usage 
in certain functions.  There are cases where the return value of the function 
can be determined from the length of the data and a prefix of the data without 
fetching the whole data from disk.  (The prefix alone is insufficient -- I need 
to also know the length for the optimization to work.)

The first field of the data type is the length, as follows:

    typedef struct datatype_foo {
        int32 length;
        char data[];
    } datatype_foo;

But when I fetch the function arguments using

    datatype_foo * a = (datatype_foo *)
        PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0),0,BLCKSZ);

the length field is set to the length of the fetched slice, not the length of 
the data as it exists on disk. Is there some other function that gets the length 
without pulling more than the first block?

Thanks for any insight,

--Mark
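
A toy model of the behavior described in the message above may help. It is only a sketch and not PostgreSQL code: the names toy_stored, toy_value, toy_fetch_slice and toy_full_size are hypothetical, and the assumption is that the full size of an out-of-line value is recorded next to the data rather than inside it. Under that assumption a slice fetch builds a new value whose length header describes only the bytes copied, while the full size has to come from the stored metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a value stored out of line: the total payload
 * size is kept beside the data, so it can be read without the data. */
typedef struct toy_stored {
    int32_t     rawsize;   /* payload bytes in the full value */
    const char *payload;   /* pretend this lives in separate storage */
} toy_stored;

/* Hypothetical in-memory value, shaped like datatype_foo in the post. */
typedef struct toy_value {
    int32_t length;        /* header bytes plus payload bytes held here */
    char    data[];
} toy_value;

/* Fetch at most `count` payload bytes starting at `offset`. The result is a
 * fresh value whose header describes only the bytes that were copied. */
static toy_value *toy_fetch_slice(const toy_stored *s, int32_t offset, int32_t count)
{
    assert(s != NULL && offset >= 0 && count >= 0);

    /* Clamp the request to what is actually stored. */
    if (offset > s->rawsize)
        offset = s->rawsize;
    if (count > s->rawsize - offset)
        count = s->rawsize - offset;

    toy_value *v = malloc(sizeof(toy_value) + (size_t) count);
    assert(v != NULL);
    v->length = (int32_t) sizeof(toy_value) + count;   /* slice size, not full size */
    memcpy(v->data, s->payload + offset, (size_t) count);
    return v;
}

/* Read the full payload size from the stored metadata alone. */
static int32_t toy_full_size(const toy_stored *s)
{
    assert(s != NULL);
    return s->rawsize;
}
```

In this model, asking for the first 8 bytes of a 26-byte value yields a toy_value whose length covers 8 payload bytes, while toy_full_size still reports 26 without touching the payload.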


Re: Getting the length of varlength data using PG_DETOAST_DATUM_SLICE

From: Bruce Momjian
Date:
Have you looked at the 8.1.X built-in function pg_column_size()?




Re: Getting the length of varlength data using

From: Jeremy Drake
Date:
It looks like pg_column_size gives you the actual size on disk, i.e. after
compression.

What looks interesting for you would be byteaoctetlen or the function it
wraps, toast_raw_datum_size.  See src/backend/access/heap/tuptoaster.c.
pg_column_size calls toast_datum_size, while byteaoctetlen and textoctetlen
call toast_raw_datum_size.
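
The distinction between the two sizes can be pictured with a small stand-alone sketch. Everything in it is hypothetical (toy_meta, toy_stored_size, toy_raw_size) and it only models the idea, assuming that both the stored size and the original size are recorded when a value is saved; it does not reproduce the code in tuptoaster.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata for a stored value. Both sizes are assumed to be
 * recorded at store time, so neither accessor reads the value itself. */
typedef struct toy_meta {
    int32_t rawsize;      /* size of the value once decompressed */
    int32_t storedsize;   /* size as saved, after any compression */
} toy_meta;

/* Plays the role of the "size on disk" answer (possibly compressed). */
static int32_t toy_stored_size(const toy_meta *m)
{
    assert(m != NULL);
    return m->storedsize;
}

/* Plays the role of the "true length" answer (uncompressed). */
static int32_t toy_raw_size(const toy_meta *m)
{
    assert(m != NULL);
    return m->rawsize;
}
```

For a value that compresses well the two answers differ widely, which is why a function that needs the logical length of the data wants the second one.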




-- 
"Contrary to popular belief, penguins are not the salvation of modern
technology.  Neither do they throw parties for the urban proletariat."


Re: Getting the length of varlength data using PG_DETOAST_DATUM_SLICE

From: Mark Dilger
Date:
Bruce Momjian wrote:
> Have you looked at the 8.1.X built-in function pg_column_size()?

Thanks Bruce for the lead.  I didn't know what to grep for; this helps.

The header comment for that function says
"Return the size of a datum, possibly compressed"

I take it the uncompressed length is not available -- that this is as close as 
I'm going to get.  I haven't traced through the function yet; maybe it does what 
I need.  I'll look at this some more now that I have a starting point.

Thanks again!

mark


Re: Getting the length of varlength data using PG_DETOAST_DATUM_SLICE

From: Mark Dilger
Date:
Jeremy Drake wrote:
> It looks like pg_column_size gives you the actual size on disk, i.e. after
> compression.
> 
> What looks interesting for you would be byteaoctetlen or the function it
> wraps, toast_raw_datum_size.  See src/backend/access/heap/tuptoaster.c.
> pg_column_size calls toast_datum_size, while byteaoctetlen and textoctetlen
> call toast_raw_datum_size.

Ok, for anyone following the thread, this code works for me:

    int true_size_arg_zero = toast_raw_datum_size(PG_GETARG_DATUM(0));
    int true_size_arg_one  = toast_raw_datum_size(PG_GETARG_DATUM(1));

Be sure to #include "access/tuptoaster.h"

Thanks Jeremy!
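
For readers wondering how the true sizes enable the optimization described at the start of the thread, here is a minimal stand-alone sketch of one possible use, an equality test. It is an assumption about the kind of function involved, not the poster's actual code; toy_verdict and toy_quick_compare are hypothetical names, and the sizes are assumed to count payload bytes only. In a real function the sizes would come from calls like the ones above and the prefixes from a slice fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the cheap check. */
typedef enum toy_verdict {
    TOY_NOT_EQUAL,        /* decided: the values differ */
    TOY_EQUAL,            /* decided: the prefixes covered the whole values */
    TOY_NEED_FULL_FETCH   /* undecided: the rest of the data is needed */
} toy_verdict;

/* Compare two values given only their full lengths and a prefix of each.
 * `prefix_len` is the number of bytes available in each prefix buffer,
 * or the full length if the value is shorter than that. */
static toy_verdict toy_quick_compare(size_t full_len_a, const char *prefix_a,
                                     size_t full_len_b, const char *prefix_b,
                                     size_t prefix_len)
{
    assert(prefix_a != NULL && prefix_b != NULL);

    /* Different total lengths: unequal, and no data had to be read. */
    if (full_len_a != full_len_b)
        return TOY_NOT_EQUAL;

    /* Never compare past the end of a short value. */
    size_t n = prefix_len < full_len_a ? prefix_len : full_len_a;

    if (memcmp(prefix_a, prefix_b, n) != 0)
        return TOY_NOT_EQUAL;

    /* Same length and same prefix: decided only if the prefix was everything. */
    return (n == full_len_a) ? TOY_EQUAL : TOY_NEED_FULL_FETCH;
}
```

The length check comes first because it is the cheapest: it needs no data at all, and only when it is inconclusive does the prefix, and then possibly the full value, have to be fetched.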