Обсуждение: "long" type is not appropriate for counting tuples

Поиск
Список
Период
Сортировка

"long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
Commit ab0dfc961b6 used a "long" variable within _bt_load() to count
the number of tuples entered into a B-Tree index as it is built. This
will not work as expected on Windows, even on 64-bit Windows, because
"long" is only 32-bits wide. It's far from impossible that you'd have
~2 billion index tuples when building a new index.

Programmers use "long" because it is assumed to be wider than "int"
(even though that isn't required by the standard, and isn't true
across all of the platforms we support). The use of "long" seems
inherently suspect given our constraints, except perhaps in the
context of sizing work_mem-based allocations, where it is used as part
of a semi-standard idiom...albeit one that only works because of the
restrictions on work_mem size on Windows.

ISTM that we should try to come up with a way of making code like this
work, rather than placing the burden on new code to get it right. This
exact issue has bitten users on a number of occasions that I can
recall. There is also a hidden landmine that we know about but haven't
fixed: logtape.c, which will break on Windows with very very large
index builds.

Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE
INDEX (issued fixed by commit aa551830). I suppose that "off_t" is
really a variant of the same problem.

--
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> Commit ab0dfc961b6 used a "long" variable within _bt_load() to count
> the number of tuples entered into a B-Tree index as it is built. This
> will not work as expected on Windows, even on 64-bit Windows, because
> "long" is only 32-bits wide.

Right.  "long" used to be our convention years ago, but these days
tuple counts should be int64 or perhaps uint64.  See e.g. 23a27b039.

> ISTM that we should try to come up with a way of making code like this
> work, rather than placing the burden on new code to get it right.

Other than "use the right datatype", I'm not sure what we can do?
In the meantime, somebody should fix ab0dfc961b6 ...

> Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE
> INDEX (issued fixed by commit aa551830). I suppose that "off_t" is
> really a variant of the same problem.

Hmm, why is this a problem?  We should only use off_t for actual file
access operations, and we don't use files greater than 1GB.  (There's a
reason for that.)

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Sun, Apr 28, 2019 at 4:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > ISTM that we should try to come up with a way of making code like this
> > work, rather than placing the burden on new code to get it right.
>
> Other than "use the right datatype", I'm not sure what we can do?

Ambiguity seems like the real problem here. If we could impose a
general rule that you cannot use "long" (perhaps with some limited
wiggle-room), then a lint-like tool could catch bugs like this. This
may not be that difficult. Nobody is particularly concerned about
performance on 32-bit platforms these days.

> In the meantime, somebody should fix ab0dfc961b6 ...

I'll leave this to Alvaro.

> Hmm, why is this a problem?  We should only use off_t for actual file
> access operations, and we don't use files greater than 1GB.  (There's a
> reason for that.)

The issue that was fixed by commit aa551830 showed this assumption to
be kind of brittle. Admittedly this is not as clear-cut as the "long"
issue, and might not be worth worrying about. I don't want to go as
far as requiring explicit width integer types in all situations, since
that seems totally impractical, and without any real upside. But it
would be nice to identify integer types where there is a real risk of
making an incorrect assumption, and then eliminate that risk once and
for all.

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On 2019-04-28 19:24:59 -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > ISTM that we should try to come up with a way of making code like this
> > work, rather than placing the burden on new code to get it right.
> 
> Other than "use the right datatype", I'm not sure what we can do?
> In the meantime, somebody should fix ab0dfc961b6 ...

I think we should start by just removing all uses of long. There's
really no excuse for them today, and a lot of them are bugs waiting to
happen.  And then either just add a comment to the coding style, or even
better a small script, to prevent them from being re-used.


> > Also, "off_t" is only 32-bits on Windows, which broke parallel CREATE
> > INDEX (issued fixed by commit aa551830). I suppose that "off_t" is
> > really a variant of the same problem.
> 
> Hmm, why is this a problem?  We should only use off_t for actual file
> access operations, and we don't use files greater than 1GB.  (There's a
> reason for that.)

We read from larger files in a few places though. E.g. pg_dump. Most
places really just should use pgoff_t...

Greetings,

Andres Freund



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
> I think we should start by just removing all uses of long. There's
> really no excuse for them today, and a lot of them are bugs waiting to
> happen.

I like the idea of banning "long" altogether. It will probably be hard
to keep it out of third party code that we vendor-in, or even code
that interfaces with libraries in some way, but it should be removed
from everything else. It actually doesn't seem particularly hard to do
so, based on a quick grep of src/backend/. Most uses of "long" is code
that sizes something in local memory, where "long" works for the same
reason as it works when calculating the size of a work_mem allocation
-- ugly, but correct. A few uses of "long" seem to be real live bugs,
albeit bugs that are very unlikely to ever hit.

_h_indexbuild() has the same bug as _bt_load(), also due to commit
ab0dfc961b6 -- I spotted that in passing when I used grep.

> We read from larger files in a few places though. E.g. pg_dump. Most
> places really just should use pgoff_t...

I wasn't even aware of pgoff_t. It is only used in frontend utilities
that I don't know that much about, whereas off_t is used all over the
backend code.

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On 2019-04-29 10:16:39 -0700, Peter Geoghegan wrote:
> On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
> > I think we should start by just removing all uses of long. There's
> > really no excuse for them today, and a lot of them are bugs waiting to
> > happen.
> 
> I like the idea of banning "long" altogether. It will probably be hard
> to keep it out of third party code that we vendor-in, or even code
> that interfaces with libraries in some way, but it should be removed
> from everything else.

I don't think any of the code we've vendored in where we also track
upstream, actually uses long in a meaningful amount.  And putside of
backward compatibility, I don't think there's many libraries that still
use it.


> > We read from larger files in a few places though. E.g. pg_dump. Most
> > places really just should use pgoff_t...
> 
> I wasn't even aware of pgoff_t. It is only used in frontend utilities
> that I don't know that much about, whereas off_t is used all over the
> backend code.

Yea, we've some delightful hackery to make that work:

 * WIN32 does not provide 64-bit off_t, but does provide the functions operating
 * with 64-bit offsets.
 */
#define pgoff_t __int64
#ifdef _MSC_VER
#define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
#define ftello(stream) _ftelli64(stream)
#else
#ifndef fseeko
#define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
#endif
#ifndef ftello
#define ftello(stream) ftello64(stream)
#endif
#endif

which also shows that the compatibility hackery is fairly limited.


Thomas, ISTM, that pg_pread() etc should rather take the offset as a
uint64 or such. And then actually initialize OFFSET.offsetHigh.

Greetings,

Andres Freund



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Apr 29, 2019 at 8:11 AM Andres Freund <andres@anarazel.de> wrote:
>> I think we should start by just removing all uses of long. There's
>> really no excuse for them today, and a lot of them are bugs waiting to
>> happen.

> I like the idea of banning "long" altogether. It will probably be hard
> to keep it out of third party code that we vendor-in, or even code
> that interfaces with libraries in some way, but it should be removed
> from everything else. It actually doesn't seem particularly hard to do
> so, based on a quick grep of src/backend/. Most uses of "long" is code
> that sizes something in local memory, where "long" works for the same
> reason as it works when calculating the size of a work_mem allocation
> -- ugly, but correct.

There's more to that than you might realize.  For example, guc.c
enforces a limit on work_mem that's designed to ensure that
expressions like "work_mem * 1024L" won't overflow, and there are
similar choices elsewhere.  I'm not sure if we want to go to the
effort of rethinking that; it's not really a bug, though it does
result in 64-bit Windows being more restricted than it has to be.

Trying to get rid of type-L constants along with more explicit
uses of "long" would be a PITA I'm afraid.

Another problem is that while "%lu" format specifiers are portable,
INT64_FORMAT is a *big* pain, not least because you can't put it into
translatable strings without causing problems.  To the extent that
we could go over to "%zu" instead, maybe this could be finessed,
but blind "s/long/int64/g" isn't going to be any fun.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On 2019-04-29 13:32:13 -0400, Tom Lane wrote:
> There's more to that than you might realize.  For example, guc.c
> enforces a limit on work_mem that's designed to ensure that
> expressions like "work_mem * 1024L" won't overflow, and there are
> similar choices elsewhere.  I'm not sure if we want to go to the
> effort of rethinking that; it's not really a bug, though it does
> result in 64-bit Windows being more restricted than it has to be.

Hm, but why does that require the use of long? We could fairly trivially
define a type that's guaranteed to be 32 bit on 32 bit platforms, and 64
bit on 64 bit platforms.  Even a dirty hack like using intptr_t instead
of long would be better than using long.


> Another problem is that while "%lu" format specifiers are portable,
> INT64_FORMAT is a *big* pain, not least because you can't put it into
> translatable strings without causing problems.  To the extent that
> we could go over to "%zu" instead, maybe this could be finessed,
> but blind "s/long/int64/g" isn't going to be any fun.

Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
translated strings. Perhaps we should implement them in our printf, and
then replace all use of INT64_FORMAT with that?

I've not tested the gettext code, but it's there:

/* Expand a system dependent string segment.  Return NULL if unsupported.  */
static const char *
get_sysdep_segment_value (const char *name)
{
  /* Test for an ISO C 99 section 7.8.1 format string directive.
     Syntax:
     P R I { d | i | o | u | x | X }
     { { | LEAST | FAST } { 8 | 16 | 32 | 64 } | MAX | PTR }  */
  /* We don't use a table of 14 times 6 'const char *' strings here, because
     data relocations cost startup time.  */
  if (name[0] == 'P' && name[1] == 'R' && name[2] == 'I')
...

Greetings,

Andres Freund



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Mon, Apr 29, 2019 at 10:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There's more to that than you might realize.  For example, guc.c
> enforces a limit on work_mem that's designed to ensure that
> expressions like "work_mem * 1024L" won't overflow, and there are
> similar choices elsewhere.

I was aware of that, but I wasn't aware of how many places that bleeds
into until I checked just now.

It would be nice if we could figure out how to make it obvious that
the idioms around the use of long for work_mem stuff are idioms that
have a specific rationale. It's pretty confusing as things stand.

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-29 13:32:13 -0400, Tom Lane wrote:
>> There's more to that than you might realize.  For example, guc.c
>> enforces a limit on work_mem that's designed to ensure that
>> expressions like "work_mem * 1024L" won't overflow, and there are
>> similar choices elsewhere.  I'm not sure if we want to go to the
>> effort of rethinking that; it's not really a bug, though it does
>> result in 64-bit Windows being more restricted than it has to be.

> Hm, but why does that require the use of long? We could fairly trivially
> define a type that's guaranteed to be 32 bit on 32 bit platforms, and 64
> bit on 64 bit platforms.  Even a dirty hack like using intptr_t instead
> of long would be better than using long.

The point is that

(a) work_mem is an int; adding support to GUC for some other integer
width would be an unreasonable amount of overhead.

(b) "1024L" is a nice simple non-carpal-tunnel-inducing way to get
the right width of the product, for some value of "right".

If we don't want to rely on "L" constants then we'll have to write these
cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes,
and prone to weird precedence problems unless you throw even more
keystrokes (parentheses) at it.  I'm not excited about doing that just
to allow larger work_mem settings on Win64.

(But if we do go in this direction, maybe some notation like
#define KILOBYTE ((size_t) 1024)
would help.)

I'm not suggesting that we don't need to fix uses of "long" for tuple
counts, and perhaps other things.  But I think getting rid of it in memory
size calculations might be a lot of work for not a lot of reward.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Mon, Apr 29, 2019 at 11:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If we don't want to rely on "L" constants then we'll have to write these
> cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes,
> and prone to weird precedence problems unless you throw even more
> keystrokes (parentheses) at it.  I'm not excited about doing that just
> to allow larger work_mem settings on Win64.

I don't think that anybody cares about Win64 very much. Simplifying
the code might lead to larger work_mem settings on that platform, but
that's not the end goal I have in mind. For me, the end goal is
simpler code.

> I'm not suggesting that we don't need to fix uses of "long" for tuple
> counts, and perhaps other things.  But I think getting rid of it in memory
> size calculations might be a lot of work for not a lot of reward.

Whether or not *fully* banning the use of "long" is something that
will simplify the code is debatable. However, we could substantially
reduce the use of "long" across the backend without any real downside.
The work_mem question can be considered later. Does that seem
reasonable?

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Alvaro Herrera
Дата:
On 2019-Apr-28, Peter Geoghegan wrote:

> Commit ab0dfc961b6 used a "long" variable within _bt_load() to count
> the number of tuples entered into a B-Tree index as it is built. This
> will not work as expected on Windows, even on 64-bit Windows, because
> "long" is only 32-bits wide. It's far from impossible that you'd have
> ~2 billion index tuples when building a new index.

Agreed.  Here's a patch.  I see downthread that you also discovered the
same mistake in _h_indexbuild by grepping for "long"; I got to it by
examining callers of pgstat_progress_update_param and
pgstat_progress_update_multi_param.  I didn't find any other mistakes of
the same ilk.  Some codesites use "double" instead of "int64", but those
are not broken.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On 2019-04-29 11:18:49 -0700, Peter Geoghegan wrote:
> On Mon, Apr 29, 2019 at 11:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > If we don't want to rely on "L" constants then we'll have to write these
> > cases like "work_mem * (size_t) 1024" which is ugly, lots more keystrokes,
> > and prone to weird precedence problems unless you throw even more
> > keystrokes (parentheses) at it.  I'm not excited about doing that just
> > to allow larger work_mem settings on Win64.
> 
> I don't think that anybody cares about Win64 very much.

I seriously doubt this assertion. Note that the postgres packages on
https://www.postgresql.org/download/windows/ do not support 32bit
windows anymore (edb from 11 onwards, bigsql apparently always). And I
think there's a pretty substantial number of windows users out there.

Greetings,

Andres Freund



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Agreed.  Here's a patch.  I see downthread that you also discovered the
> same mistake in _h_indexbuild by grepping for "long"; I got to it by
> examining callers of pgstat_progress_update_param and
> pgstat_progress_update_multi_param.  I didn't find any other mistakes of
> the same ilk.  Some codesites use "double" instead of "int64", but those
> are not broken.

This seems fine, though FWIW I probably would have gone with int64
instead of uint64. There is generally no downside to using int64, and
being to support negative integers can be useful in some contexts
(though not this context).

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
Peter Geoghegan
Дата:
On Mon, Apr 29, 2019 at 11:24 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't think that anybody cares about Win64 very much.
>
> I seriously doubt this assertion. Note that the postgres packages on
> https://www.postgresql.org/download/windows/ do not support 32bit
> windows anymore (edb from 11 onwards, bigsql apparently always). And I
> think there's a pretty substantial number of windows users out there.

I was talking about the motivation behind this thread, and I suppose
that I included you in that based on things you've said about Windows
in the past (apparently I shouldn't have done so).

I am interested in making the code less complicated. If we can remove
the work_mem kludge for Windows as a consequence of that, then so much
the better.

-- 
Peter Geoghegan



Re: "long" type is not appropriate for counting tuples

От
David Rowley
Дата:
On Tue, 30 Apr 2019 at 06:28, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Agreed.  Here's a patch.  I see downthread that you also discovered the
> > same mistake in _h_indexbuild by grepping for "long"; I got to it by
> > examining callers of pgstat_progress_update_param and
> > pgstat_progress_update_multi_param.  I didn't find any other mistakes of
> > the same ilk.  Some codesites use "double" instead of "int64", but those
> > are not broken.
>
> This seems fine, though FWIW I probably would have gone with int64
> instead of uint64. There is generally no downside to using int64, and
> being to support negative integers can be useful in some contexts
> (though not this context).

CopyFrom() returns uint64. I think it's better to be consistent in the
types we use to count tuples in commands.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Alvaro Herrera
Дата:
On 2019-Apr-30, David Rowley wrote:

> On Tue, 30 Apr 2019 at 06:28, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Mon, Apr 29, 2019 at 11:20 AM Alvaro Herrera
> > <alvherre@2ndquadrant.com> wrote:
> > > Agreed.  Here's a patch.  I see downthread that you also discovered the
> > > same mistake in _h_indexbuild by grepping for "long"; I got to it by
> > > examining callers of pgstat_progress_update_param and
> > > pgstat_progress_update_multi_param.  I didn't find any other mistakes of
> > > the same ilk.  Some codesites use "double" instead of "int64", but those
> > > are not broken.
> >
> > This seems fine, though FWIW I probably would have gone with int64
> > instead of uint64. There is generally no downside to using int64, and
> > being to support negative integers can be useful in some contexts
> > (though not this context).
> 
> CopyFrom() returns uint64. I think it's better to be consistent in the
> types we use to count tuples in commands.

That's not a bad argument ... but I still committed it as int64, mostly
because that's what pgstat_progress_update_param takes.  Anyway, these
are just local variables, not return values, so it's easily changeable
if we determine (??) that unsigned is better.

I don't know if anybody plans to do progress report for COPY, but I hope
we don't find ourselves in a problem when some user claims that they are
inserting more than 2^63 but less than 2^64 tuples.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I don't know if anybody plans to do progress report for COPY, but I hope
> we don't find ourselves in a problem when some user claims that they are
> inserting more than 2^63 but less than 2^64 tuples.

At one tuple per nanosecond, it'd take a shade under 300 years to
reach 2^63.  Seems like a problem for our descendants to worry about.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-04-29 19:32, Tom Lane wrote:
> Another problem is that while "%lu" format specifiers are portable,
> INT64_FORMAT is a *big* pain, not least because you can't put it into
> translatable strings without causing problems.  To the extent that
> we could go over to "%zu" instead, maybe this could be finessed,
> but blind "s/long/int64/g" isn't going to be any fun.

Since we control our own snprintf now, this could probably be addressed
somehow, right?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-04-29 19:52, Andres Freund wrote:
> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
> translated strings.

That won't work in non-GNU gettext.

> Perhaps we should implement them in our printf, and
> then replace all use of INT64_FORMAT with that?

But this might.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On May 22, 2019 7:39:41 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>On 2019-04-29 19:32, Tom Lane wrote:
>> Another problem is that while "%lu" format specifiers are portable,
>> INT64_FORMAT is a *big* pain, not least because you can't put it into
>> translatable strings without causing problems.  To the extent that
>> we could go over to "%zu" instead, maybe this could be finessed,
>> but blind "s/long/int64/g" isn't going to be any fun.
>
>Since we control our own snprintf now, this could probably be addressed
>somehow, right?

z is for size_t though? Not immediately first how It'd help us?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On May 22, 2019 7:39:41 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> On 2019-04-29 19:32, Tom Lane wrote:
>>> Another problem is that while "%lu" format specifiers are portable,
>>> INT64_FORMAT is a *big* pain, not least because you can't put it into
>>> translatable strings without causing problems.  To the extent that
>>> we could go over to "%zu" instead, maybe this could be finessed,
>>> but blind "s/long/int64/g" isn't going to be any fun.

>> Since we control our own snprintf now, this could probably be addressed
>> somehow, right?

> z is for size_t though? Not immediately first how It'd help us?

Yeah, z doesn't reliably translate to int64 either, so it's only useful
when the number you're trying to print is a memory object size.

I don't really see how controlling snprintf is enough to get somewhere
on this.  Sure we could invent some new always-64-bit length modifier
and teach snprintf.c about it, but none of the other tools we use
would know about it.  I don't want to give up compiler cross-checking
of printf formats, do you?

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On May 22, 2019 7:40:28 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>On 2019-04-29 19:52, Andres Freund wrote:
>> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
>> translated strings.
>
>That won't work in non-GNU gettext.

Which of those do currently work with postgres?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-05-22 18:59, Andres Freund wrote:
> On May 22, 2019 7:40:28 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> On 2019-04-29 19:52, Andres Freund wrote:
>>> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
>>> translated strings.
>>
>> That won't work in non-GNU gettext.
> 
> Which of those do currently work with postgres?

I don't know what the current situation is, but in the past we've been
getting complaints when using GNU-specific features, mostly from Solaris
I think.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-05-22 17:52, Tom Lane wrote:
> I don't really see how controlling snprintf is enough to get somewhere
> on this.  Sure we could invent some new always-64-bit length modifier
> and teach snprintf.c about it, but none of the other tools we use
> would know about it.  I don't want to give up compiler cross-checking
> of printf formats, do you?

Could we define int64 to be long long int on all platforms and just
always use %lld?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Could we define int64 to be long long int on all platforms and just
> always use %lld?

Hmmm ... maybe.  Once upon a time we had to cope with compilers
that didn't have "long long", but perhaps that time is past.

Another conceivable hazard is somebody deciding that the world
needs a platform where "long long" is 128 bits.  I don't know
how likely that is to happen.

As a first step, we could try asking configure to compute
sizeof(long long) and seeing what the buildfarm makes of that.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-05-22 21:21, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Could we define int64 to be long long int on all platforms and just
>> always use %lld?
> 
> Hmmm ... maybe.  Once upon a time we had to cope with compilers
> that didn't have "long long", but perhaps that time is past.

It's required by C99, and the configure test for C99 checks it.

> Another conceivable hazard is somebody deciding that the world
> needs a platform where "long long" is 128 bits.  I don't know
> how likely that is to happen.

Another option is that in cases where it doesn't affect storage layouts,
like the counting tuples case that started this thread, code could just
use long long int directly instead of int64.  Then if someone wants to
make it 128 bits or 96 bits or whatever it would not be a problem.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Robert Haas
Дата:
On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Another option is that in cases where it doesn't affect storage layouts,
> like the counting tuples case that started this thread, code could just
> use long long int directly instead of int64.  Then if someone wants to
> make it 128 bits or 96 bits or whatever it would not be a problem.

I think that sort of thing tends not to work out well, because at some
point it's likely to be sent out via the wire protocol; at that point
we'll need a value of a certain width.  Better to use that width right
from the beginning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-05-23 15:52, Robert Haas wrote:
> On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> Another option is that in cases where it doesn't affect storage layouts,
>> like the counting tuples case that started this thread, code could just
>> use long long int directly instead of int64.  Then if someone wants to
>> make it 128 bits or 96 bits or whatever it would not be a problem.
> 
> I think that sort of thing tends not to work out well, because at some
> point it's likely to be sent out via the wire protocol; at that point
> we'll need a value of a certain width.  Better to use that width right
> from the beginning.

Hmm, by that argument, we shouldn't ever use any integer type other than
int16, int32, and int64.

I'm thinking for example that pgbench makes a lot of use of int64 and
printing that out makes quite messy code.  Replacing that by long long
int would make this much nicer and should be pretty harmless relative to
your concern.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-05-23 15:52, Robert Haas wrote:
>> On Thu, May 23, 2019 at 5:31 AM Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> Another option is that in cases where it doesn't affect storage layouts,
>>> like the counting tuples case that started this thread, code could just
>>> use long long int directly instead of int64.  Then if someone wants to
>>> make it 128 bits or 96 bits or whatever it would not be a problem.

>> I think that sort of thing tends not to work out well, because at some
>> point it's likely to be sent out via the wire protocol; at that point
>> we'll need a value of a certain width.  Better to use that width right
>> from the beginning.

> Hmm, by that argument, we shouldn't ever use any integer type other than
> int16, int32, and int64.
> I'm thinking for example that pgbench makes a lot of use of int64 and
> printing that out makes quite messy code.  Replacing that by long long
> int would make this much nicer and should be pretty harmless relative to
> your concern.

It does seem attractive to use long long in cases where we're not too
fussed about the exact width.  OTOH, that reasoning was exactly why we
used "long" in a lot of places back in the day, and sure enough it came
back to bite us.

On the whole I think I could live with a policy that says "tuple counts
shall be 'long long' when being passed around in code, but for persistent
storage or wire-protocol transmission, use 'int64'".

An alternative and much narrower policy is to say it's okay to do this
with an int64 value:

    printf("processed %lld tuples", (long long) count);

In such code, all we're assuming is long long >= 64 bits, which
is completely safe per C99, and we dodge the need for a
platform-varying format string.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-05-23 16:34, Tom Lane wrote:
> On the whole I think I could live with a policy that says "tuple counts
> shall be 'long long' when being passed around in code, but for persistent
> storage or wire-protocol transmission, use 'int64'".
> 
> An alternative and much narrower policy is to say it's okay to do this
> with an int64 value:
> 
>     printf("processed %lld tuples", (long long) count);
> 
> In such code, all we're assuming is long long >= 64 bits, which
> is completely safe per C99, and we dodge the need for a
> platform-varying format string.

Some combination of this seems quite reasonable.

Attached is a patch to implement this in a handful of cases that are
particularly verbose right now.  I think those are easy wins.

(Also a second patch that makes use of %zu for size_t where this was not
yet done.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: "long" type is not appropriate for counting tuples

От
Robert Haas
Дата:
On Thu, May 23, 2019 at 10:21 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Hmm, by that argument, we shouldn't ever use any integer type other than
> int16, int32, and int64.

I think we basically shouldn't.  I mean it's fine to use 'int' as a
flags argument as part of an internal API, or as a loop counter
private to a function or something.  But if you are passing around
values that involve on-disk compatibility or wire protocol
compatibility, it's just a recipe for bugs.  If the code has to
sometimes cast a value to some other type, somebody may do it wrong.
If there's a uniform rule that tuple counts are always int64, that's
pretty easy to understand.

In short, when a certain kind of value is widely-used, it should have
a clearly-declared width.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "long" type is not appropriate for counting tuples

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Attached is a patch to implement this in a handful of cases that are
> particularly verbose right now.  I think those are easy wins.
> (Also a second patch that makes use of %zu for size_t where this was not
> yet done.)

I took a look through these and see nothing objectionable.  There are
probably more places that can be improved, but we need not insist on
getting every such place in one go.

Per Robert's position that variables ought to have well-defined widths,
there might be something to be said for not touching the variable
declarations that you changed from int64 to long long, and instead
casting them to long long in the sprintf calls.  But I'm not really
convinced that that's better than what you've done.

Marked CF entry as ready-for-committer.

            regards, tom lane



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-07-02 22:56, Tom Lane wrote:
> I took a look through these and see nothing objectionable.  There are
> probably more places that can be improved, but we need not insist on
> getting every such place in one go.
> 
> Per Robert's position that variables ought to have well-defined widths,
> there might be something to be said for not touching the variable
> declarations that you changed from int64 to long long, and instead
> casting them to long long in the sprintf calls.  But I'm not really
> convinced that that's better than what you've done.
> 
> Marked CF entry as ready-for-committer.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "long" type is not appropriate for counting tuples

От
Andres Freund
Дата:
Hi,

On 2019-05-22 16:40:28 +0200, Peter Eisentraut wrote:
> On 2019-04-29 19:52, Andres Freund wrote:
> > Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
> > translated strings.
> 
> That won't work in non-GNU gettext.

Which of those do we actually support? We already depend on
bind_textdomain_codeset, which afaict wasn't present in older gettext
implementations.

- Andres



Re: "long" type is not appropriate for counting tuples

От
Peter Eisentraut
Дата:
On 2019-08-14 20:17, Andres Freund wrote:
> On 2019-05-22 16:40:28 +0200, Peter Eisentraut wrote:
>> On 2019-04-29 19:52, Andres Freund wrote:
>>> Hm. It appears that gettext supports expanding PRId64 PRIu64 etc in
>>> translated strings.
>>
>> That won't work in non-GNU gettext.
> 
> Which of those do we actually support? We already depend on
> bind_textdomain_codeset, which afaict wasn't present in older gettext
> implementations.

At least in theory we support Solaris gettext.  In the past we
occasionally got complaints when we broke something there.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services