Thread: PG_RETURN_?

PG_RETURN_?

From
Don Y
Date:
Hi,

I have a set of functions for a data type that return
small integers (i.e. [0..12]).  I can, of course, represent
it as a char, short or long (CHAR, INT16 or INT32).
Are there any advantages/drawbacks to choosing one particular
PG_RETURN_ type over another (realizing that they are
effectively just casts)?

Thanks!
--don

Re: PG_RETURN_?

From
Richard Huxton
Date:
Don Y wrote:
> Hi,
>
> I have a set of functions for a data type that return
> small integers (i.e. [0..12]).  I can, of course, represent
> it as a char, short or long (CHAR, INT16 or INT32).
> Are there any advantages/drawbacks to choosing one particular
> PG_RETURN_ type over another (realizing that they are
> effectively just casts)?

If they are integers then an int would be the obvious choice. If you are
going to treat them as int2 outside the function then int2, otherwise
just integer. Oh, it's int2/int4 not int16/int32.
--
   Richard Huxton
   Archonet Ltd

Re: PG_RETURN_?

From
Don Y
Date:
Richard Huxton wrote:
> Don Y wrote:
>> Hi,
>>
>> I have a set of functions for a data type that return
>> small integers (i.e. [0..12]).  I can, of course, represent
>> it as a char, short or long (CHAR, INT16 or INT32).
>> Are there any advantages/drawbacks to choosing one particular
>> PG_RETURN_ type over another (realizing that they are
>> effectively just casts)?
>
> If they are integers then an int would be the obvious choice. If you are
> going to treat them as int2 outside the function then int2, otherwise
> just integer.

Yes, I was more interested in what might be going on "behind the
scenes" inside the server that could bias my choice of WHICH
integer type to use.  E.g., if arguments are marshalled as
byte arrays vs. as Datum arrays, etc.  (I would suspect the
latter).  Since I could use something as small as a char to
represent the values, the choice depends more on how
OTHER things would be affected...

> Oh, it's int2/int4 not int16/int32.

The *data type* is int2/int4 but the PG_RETURN_? macro is
PG_RETURN_INT16 or PG_RETURN_INT32 -- hence the reason
I referred to them as "CHAR, INT16 or INT32" instead of
"char, int2 or int4"  :>

--don


Re: PG_RETURN_?

From
Richard Huxton
Date:
Don Y wrote:
> Richard Huxton wrote:
>> Don Y wrote:
>>> Hi,
>>>
>>> I have a set of functions for a data type that return
>>> small integers (i.e. [0..12]).  I can, of course, represent
>>> it as a char, short or long (CHAR, INT16 or INT32).
>>> Are there any advantages/drawbacks to choosing one particular
>>> PG_RETURN_ type over another (realizing that they are
>>> effectively just casts)?
>>
>> If they are integers then an int would be the obvious choice. If you
>> are going to treat them as int2 outside the function then int2,
>> otherwise just integer.
>
> Yes, I was more interested in what might be going on "behind the
> scenes" inside the server that could bias my choice of WHICH
> integer type to use.  E.g., if arguments are marshalled as
> byte arrays vs. as Datum arrays, etc.  (I would suspect the
> latter).  Since I could use something as small as a char to
> represent the values, the choice depends more on how
> OTHER things would be affected...

I must admit I've never tested, but I strongly suspect any differences
will be below the level you can accurately measure. Certainly from the
point of view of 8/16/32 bit integers I'd guess they'd all time the same
(they should all end up as a Datum). With a 64-bit CPU I'd guess that
would extend to 64 bits too. Hmm - looking at comments it seems int64 is
a reference type regardless of CPU (include/postgres.h).
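
To illustrate the "they should all end up as a Datum" point, here is a
minimal, self-contained sketch. It does not use the real PostgreSQL
headers: Datum is modelled as a uintptr_t, and the model_* helpers are
hypothetical stand-ins for the CharGetDatum/Int16GetDatum/Int32GetDatum
family, so treat it as an approximation of the idea rather than the
server's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: one pointer-sized machine word. */
typedef uintptr_t Datum;

/* Every small pass-by-value type has to fit in a single Datum. */
static_assert(sizeof(Datum) >= sizeof(int32_t), "Datum too small for int32");

/* Hypothetical helpers: widen a small integer into a Datum. */
static inline Datum model_char_get_datum(char v)     { return (Datum) (unsigned char) v; }
static inline Datum model_int16_get_datum(int16_t v) { return (Datum) (uint16_t) v; }
static inline Datum model_int32_get_datum(int32_t v) { return (Datum) (uint32_t) v; }

/* Hypothetical helpers: narrow a Datum back to the small integer. */
static inline char    model_datum_get_char(Datum d)  { return (char) (d & 0xFF); }
static inline int16_t model_datum_get_int16(Datum d) { return (int16_t) (d & 0xFFFF); }
static inline int32_t model_datum_get_int32(Datum d) { return (int32_t) (d & 0xFFFFFFFFu); }
```

In this model the value 2 produces the same Datum whether it starts out
as a char, an int16 or an int32, which is consistent with the guess that
the choice makes no measurable difference to how the value is passed
around.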

>> Oh, it's int2/int4 not int16/int32.
>
> The *data type* is int2/int4 but the PG_RETURN_? macro is
> PG_RETURN_INT16 or PG_RETURN_INT32 -- hence the reason
> I referred to them as "CHAR, INT16 or INT32" instead of
> "char, int2 or int4"  :>

You're quite right. I was thinking from the other side.

--
   Richard Huxton
   Archonet Ltd

Re: PG_RETURN_?

From
Martijn van Oosterhout
Date:
On Tue, May 02, 2006 at 08:43:03AM -0700, Don Y wrote:
> Richard Huxton wrote:
> >Don Y wrote:
> >>Hi,
> >>
> >>I have a set of functions for a data type that return
> >>small integers (i.e. [0..12]).  I can, of course, represent
> >>it as a char, short or long (CHAR, INT16 or INT32).
> >>Are there any advantages/drawbacks to choosing one particular
> >>PG_RETURN_ type over another (realizing that they are
> >>effectively just casts)?
> >
> >If they are integers then an int would be the obvious choice. If you are
> >going to treat them as int2 outside the function then int2, otherwise
> >just integer.
>
> Yes, I was more interested in what might be going on "behind the
> scenes" inside the server that could bias my choice of WHICH
> integer type to use.  E.g., if arguments are marshalled as
> byte arrays vs. as Datum arrays, etc.  (I would suspect the
> latter).  Since I could use something as small as a char to
> represent the values, the choice depends more on how
> OTHER things would be affected...

You should always *always* match the PG_RETURN_* to the declared type
you are returning. Anything else will cause problems. PG_RETURN_INT16
means "return in a format consistent with a type declared as
pass-by-value two byte width". PostgreSQL does not check that what
you're returning actually matches what you declared.
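
A rough sketch of why a mismatch matters, assuming that a caller which
expects int2 only looks at the low 16 bits of the Datum it receives.
Datum is again modelled as a uintptr_t, and model_return_int32 and
model_read_as_int2 are hypothetical names for illustration, not
PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: one pointer-sized machine word. */
typedef uintptr_t Datum;

static_assert(sizeof(Datum) >= sizeof(int32_t), "Datum too small for int32");

/* Models a C function body that ends with a 32-bit return. */
static inline Datum model_return_int32(int32_t v)
{
    return (Datum) (uint32_t) v;
}

/* Models a caller that trusts an int2 declaration: it keeps only the
 * low 16 bits and silently drops the rest. */
static inline int16_t model_read_as_int2(Datum d)
{
    return (int16_t) (uint16_t) (d & 0xFFFF);
}
```

With values in [0..12] the mismatch goes unnoticed in this model (12
comes back as 12), but 70000 would come back as 4464, and nothing flags
the loss.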

The type as declared determines the storage required to store it. That
might be a far more useful factor to consider than what is copied
internally which, as has been pointed out, is probably below what you
can measure.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: PG_RETURN_?

From
Don Y
Date:
Martijn van Oosterhout wrote:
> On Tue, May 02, 2006 at 08:43:03AM -0700, Don Y wrote:
>> Richard Huxton wrote:
>>> Don Y wrote:
>>>> Hi,
>>>>
>>>> I have a set of functions for a data type that return
>>>> small integers (i.e. [0..12]).  I can, of course, represent
>>>> it as a char, short or long (CHAR, INT16 or INT32).
>>>> Are there any advantages/drawbacks to choosing one particular
>>>> PG_RETURN_ type over another (realizing that they are
>>>> effectively just casts)?
>>> If they are integers then an int would be the obvious choice. If you are
>>> going to treat them as int2 outside the function then int2, otherwise
>>> just integer.
>> Yes, I was more interested in what might be going on "behind the
>> scenes" inside the server that could bias my choice of WHICH
>> integer type to use.  E.g., if arguments are marshalled as
>> byte arrays vs. as Datum arrays, etc.  (I would suspect the
>> latter).  Since I could use something as small as a char to
>> represent the values, the choice depends more on how
>> OTHER things would be affected...
>
> You should always *always* match the PG_RETURN_* to the declared type
> you are returning. Anything else will cause problems. PG_RETURN_INT16
> means "return in a format consistent with a type declared as
> pass-by-value two byte width". PostgreSQL does not check that what
> you're returning actually matches what you declared.

Yes, but that wasn't the question.

I can PG_RETURN_CHAR(2), PG_RETURN_INT16(2) or PG_RETURN_INT32(2)
and end up with the same result (assuming the function is defined
to return char, int2 or int4, respectively in the SQL interface).

> The type as declared determines the storage required to store it. That

Yes, but for a function returning a value that does not exceed
sizeof(Datum), there is no *space* consequence.  I would assume
most modern architectures use 32 bit (and larger) registers.

OTOH, some machines incur a (tiny) penalty for casting char to long.
Returning INT32 *may* be better from that standpoint -- assuming
there is no added offsetting cost marshalling.

> might be a far more useful factor to consider than what is copied
> internally which, as has been pointed out, is probably below what you
> can measure.

Sure.  But, given that the difference ONLY amounts to whether
I type "INT32" or "INT16" or "CHAR" in the PG_RETURN_ macro,
an understanding of what is going on "inside" can contribute
epsilon for or against performance.  I'd be annoyed to have
built dozens of functions ASSUMING "INT32" when a *better*
assumption might have been "CHAR"...  (I'm working in an
embedded environment where "spare CPU cycles" mean you've
wasted $$$ on hardware that you don't need  :-/ )

--don

Re: PG_RETURN_?

From
Martijn van Oosterhout
Date:
On Tue, May 02, 2006 at 10:06:19AM -0700, Don Y wrote:
> >The type as declared determines the storage required to store it. That
>
> Yes, but for a function returning a value that does not exceed
> sizeof(Datum), there is no *space* consequence.  I would assume
> most modern architectures use 32 bit (and larger) registers.

When you return a Datum, it's always the same size. When you're
returning a string, you're still returning a Datum, which may be 4 or 8
bytes depending on the platform.

But what I was referring to was the space to store the data in a tuple
on disk, or to send the data to a client. These are affected by the
choice of representation.
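
As a rough illustration of the on-disk side, the sketch below adds up
the data bytes of a row, assuming lengths and alignments of 1/1 for
"char", 2/2 for int2 and 4/4 for int4, and ignoring the tuple header and
null bitmap. align_up and append_column are hypothetical helper names,
not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align (align must be non-zero). */
static inline size_t align_up(size_t off, size_t align)
{
    assert(align > 0);
    return (off + align - 1) / align * align;
}

/* Offset just past a column of the given length, appended at off
 * after padding off to the column's alignment. */
static inline size_t append_column(size_t off, size_t len, size_t align)
{
    return align_up(off, align) + len;
}
```

Under these assumptions two "char" columns take 2 bytes, a "char"
followed by an int2 takes 4, and a "char" followed by an int4 takes 8,
so the saving from a narrow type depends on what sits next to it in the
row.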

> OTOH, some machines incur a (tiny) penalty for casting char to long.
> Returning INT32 *may* be better from that standpoint -- assuming
> there is no added offsetting cost marshalling.

Within the backend the only representations used are Datum and tuples.
I don't think either of them would have a noticeable difference between
various pass-by-value formats.

> ... I'd be annoyed to have
> built dozens of functions ASSUMING "INT32" when a *better*
> assumption might have been "CHAR"...  (I'm working in an
> embedded environment where "spare CPU cycles" mean you've
> wasted $$$ on hardware that you don't need  :-/ )

Hmm, postgres doesn't try to save on cycles. The philosophy is to get
it right first, then make it fast. The entire fmgr interface is slower
than the original design (old-style functions), but this design works
on all platforms whereas the old one didn't.

I'd go for INT32, it's most likely to be an "int" which should be "the
most natural size for the machine".

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: PG_RETURN_?

From
Don Y
Date:
Martijn van Oosterhout wrote:
> On Tue, May 02, 2006 at 10:06:19AM -0700, Don Y wrote:
>>> The type as declared determines the storage required to store it. That
>> Yes, but for a function returning a value that does not exceed
>> sizeof(Datum), there is no *space* consequence.  I would assume
>> most modern architectures use 32 bit (and larger) registers.
>
> When you return a Datum, it's always the same size. When you're
> returning a string, you're still returning a Datum, which may be 4 or 8
> bytes depending on the platform.

Yes.

> But what I was referring to was the space to store the data in a tuple
> on disk, or to send the data to a client. These are affected by the
> choice of representation.

So, as I had mentioned before, you marshal as a *byte* stream
and not a *Datum* stream?

>> OTOH, some machines incur a (tiny) penalty for casting char to long.
>> Returning INT32 *may* be better from that standpoint -- assuming
>> there is no added offsetting cost marshalling.
>
> Within the backend the only representations used are Datum and tuples.
> I don't think either of them would have a noticeable difference between
> various pass-by-value formats.
>
>> ... I'd be annoyed to have
>> built dozens of functions ASSUMING "INT32" when a *better*
>> assumption might have been "CHAR"...  (I'm working in an
>> embedded environment where "spare CPU cycles" mean you've
>> wasted $$$ on hardware that you don't need  :-/ )
>
> Hmm, postgres doesn't try to save on cycles.

<grin> Yes, I noticed.  :>  But it's hard for me to get this
"attitude" out of the way I approach a problem.  :-(
(e.g., I wouldn't count people at a rally using a *float*!  :>)

> the philosophy is to get
> it right first, then make it fast. The entire fmgr interface is slower
> than the original design (old-style functions), but this design works
> on all platforms whereas the old one didn't.

Exactly.  I could more "efficiently" replace postgres with
dedicated structures to do what I want.  But that ties my
implementation down to one that is less portable (and less maintainable).

> I'd go for INT32, it's most likely to be an "int" which should be "the
> most natural size for the machine".

(sigh)  Yes, I suppose so.  Though it can have a big impact
on transport delays (server to client) if things really
are marshalled as byte streams, etc.

<shrug>  I suppose I should just "do it" and let technology
catch up with my inefficiencies later!

Thanks!
--don