Обсуждение: Why is time with timezone 12 bytes?

Поиск
Список
Период
Сортировка

Why is time with timezone 12 bytes?

От
Josh Berkus
Дата:
All,

I was just checking on our year-2027 compliance, and happened to notice
that time with time zone takes up 12 bytes.  This seems peculiar, given
that timestamp with time zone is only 8 bytes, and at my count we only
need 5 for the time with microsecond precision.  What's up with that?

Also, what is the real range of our 8-byte *integer* timestamp?

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Why is time with timezone 12 bytes?

От
Thom Brown
Дата:
On 22 September 2010 22:01, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> I was just checking on our year-2027 compliance, and happened to notice
> that time with time zone takes up 12 bytes.  This seems peculiar, given
> that timestamp with time zone is only 8 bytes, and at my count we only
> need 5 for the time with microsecond precision.  What's up with that?
>
> Also, what is the real range of our 8-byte *integer* timestamp?

The time is 8 bytes, (1,000,000 microseconds * 60 minutes, * 24 hours
= 1,440,000,000 microseconds = 31 bits = 8 bytes).

The timezone displacement takes up to 12 bits, meaning 3 bytes.
(1460+1459 = 2919 = 12 bits = 3 bytes).  So that's 11 bytes.  Not sure
where the extra 1 byte comes from.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


Re: Why is time with timezone 12 bytes?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> I was just checking on our year-2027 compliance, and happened to notice
> that time with time zone takes up 12 bytes.  This seems peculiar, given
> that timestamp with time zone is only 8 bytes, and at my count we only
> need 5 for the time with microsecond precision.  What's up with that?

I think it's an 8-byte seconds count plus 4 bytes to indicate the
timezone.  If this datatype had any actual real-world use then it might
be worth worrying about how big it is, but AFAICS its only excuse for
existence is to satisfy the SQL standard.

> Also, what is the real range of our 8-byte *integer* timestamp?

See the fine manual.  I believe the limits have more to do with
calendar arithmetic than with the nominal range of 2^64 microseconds.
        regards, tom lane


Re: Why is time with timezone 12 bytes?

От
Kenneth Marshall
Дата:
On Wed, Sep 22, 2010 at 10:54:53PM +0100, Thom Brown wrote:
> On 22 September 2010 22:01, Josh Berkus <josh@agliodbs.com> wrote:
> > All,
> >
> > I was just checking on our year-2027 compliance, and happened to notice
> > that time with time zone takes up 12 bytes. ?This seems peculiar, given
> > that timestamp with time zone is only 8 bytes, and at my count we only
> > need 5 for the time with microsecond precision. ?What's up with that?
> >
> > Also, what is the real range of our 8-byte *integer* timestamp?
> 
> The time is 8 bytes, (1,000,000 microseconds * 60 minutes, * 24 hours
> = 1,440,000,000 microseconds = 31 bits = 8 bytes).
> 

31 bits = approx. 4 bytes at 8 bits/byte, not 8 bytes.

> The timezone displacement takes up to 12 bits, meaning 3 bytes.
> (1460+1459 = 2919 = 12 bits = 3 bytes).  So that's 11 bytes.  Not sure
> where the extra 1 byte comes from.
> 
This would yield 7 bytes.

Ken


Re: Why is time with timezone 12 bytes?

От
Josh Berkus
Дата:
>> Also, what is the real range of our 8-byte *integer* timestamp?
> 
> See the fine manual.  I believe the limits have more to do with
> calendar arithmetic than with the nominal range of 2^64 microseconds.

I'm asking based on that.  The docs only give the limits for a *float*
timestamp.  I'd like to fix the docs, but I can only do it if I have the
data ...

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Why is time with timezone 12 bytes?

От
Thom Brown
Дата:
On 22 September 2010 22:58, Kenneth Marshall <ktm@rice.edu> wrote:
> On Wed, Sep 22, 2010 at 10:54:53PM +0100, Thom Brown wrote:
>> On 22 September 2010 22:01, Josh Berkus <josh@agliodbs.com> wrote:
>> > All,
>> >
>> > I was just checking on our year-2027 compliance, and happened to notice
>> > that time with time zone takes up 12 bytes. ?This seems peculiar, given
>> > that timestamp with time zone is only 8 bytes, and at my count we only
>> > need 5 for the time with microsecond precision. ?What's up with that?
>> >
>> > Also, what is the real range of our 8-byte *integer* timestamp?
>>
>> The time is 8 bytes, (1,000,000 microseconds * 60 minutes, * 24 hours
>> = 1,440,000,000 microseconds = 31 bits = 8 bytes).
>>
>
> 31 bits = approx. 4 bytes at 8 bits/byte, not 8 bytes.
>
>> The timezone displacement takes up to 12 bits, meaning 3 bytes.
>> (1460+1459 = 2919 = 12 bits = 3 bytes).  So that's 11 bytes.  Not sure
>> where the extra 1 byte comes from.
>>
> This would yield 7 bytes.

I think it's clear I should get some sleep. :S

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


Re: Why is time with timezone 12 bytes?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> Also, what is the real range of our 8-byte *integer* timestamp?
>> 
>> See the fine manual.  I believe the limits have more to do with
>> calendar arithmetic than with the nominal range of 2^64 microseconds.

> I'm asking based on that.  The docs only give the limits for a *float*
> timestamp.  I'd like to fix the docs, but I can only do it if I have the
> data ...

It's the same, because the limits are calendar based (particularly,
the Julian-date functions) and not dependent on the representation.
        regards, tom lane


Re: Why is time with timezone 12 bytes?

От
Josh Berkus
Дата:
> It's the same, because the limits are calendar based (particularly,
> the Julian-date functions) and not dependent on the representation.

Hmmm?  Just storing dates for the range described (until the year
294,000) takes 8bytes by my calculations.  And that's without the 3
bytes for the time zone.  Is my math off?

And, of course, this doesn't answer at all why time with time zone is so
huge.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Why is time with timezone 12 bytes?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
>> It's the same, because the limits are calendar based (particularly,
>> the Julian-date functions) and not dependent on the representation.

> Hmmm?  Just storing dates for the range described (until the year
> 294,000) takes 8bytes by my calculations.  And that's without the 3
> bytes for the time zone.  Is my math off?

timestamptz stores GMT; it doesn't store timezone separately.
(If it did, we'd need more than 8 bytes...)

> And, of course, this doesn't answer at all why time with time zone is so
> huge.

Because we haven't lifted a finger to optimize it.
        regards, tom lane


Re: Why is time with timezone 12 bytes?

От
Josh Berkus
Дата:
> timestamptz stores GMT; it doesn't store timezone separately.
> (If it did, we'd need more than 8 bytes...)

Oh, yeah.  Duh.

> Because we haven't lifted a finger to optimize it.

Well, that's a direct answer.  Ok, will put it in the list of "TODO next
time we change the on-disk format".

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Why is time with timezone 12 bytes?

От
Robert Haas
Дата:
On Wed, Sep 22, 2010 at 7:20 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> timestamptz stores GMT; it doesn't store timezone separately.
>> (If it did, we'd need more than 8 bytes...)
>
> Oh, yeah.  Duh.
>
>> Because we haven't lifted a finger to optimize it.
>
> Well, that's a direct answer.  Ok, will put it in the list of "TODO next
> time we change the on-disk format".

Technically, there's no reason why we can't do this for 9.1.  What we
can do is change the name of the "time with timezone" type to
something like "oldtimetz", keeping the current OID.  And then we can
add a new type called "time with timezone".  Then, with some suitable
hacking of pg_dump --binary-upgrade, I think it should be possible to
make timetz columns from pre-9.1 databases turn into oldtimetz columns
when pg_upgrade is used.  New applications will of course get the new
data type.  Then, in a future release (let's call it 10.0), we could
remove the oldtimetz types.  pg_upgrade would then forbid upgrades to
the release that used the old type, by complaining about (1) upgrades
from 8.4/9.0 with timetz columns, or (2) upgrades from 9.1+ with
oldtimetz columns.  Users would instead be instructed to upgrade to a
9.1+ release, use ALTER TABLE ... TYPE timetz, and then upgrade again.Obviously this could still be inconvenient for
someusers, but it's a 
lot better than breaking everything all at once: you have 5 or 10
years to find time to rewrite the table.

If someone decides to attack this, it would also be good to see about
reducing the typalign to something less than "d".  I am concerned that
all of that alignment is wasting a great deal of space on disk, which
is becoming ever more of a problem as people start to use PostgreSQL
with larger and larger databases.  It seems like the performance
benefit is likely pretty limited, too.  Copying a 64-bit value that is
only 4-byte aligned rather than 8-byte aligned should only be very
slightly slower if you do it as two 4-byte fetches rather than a
single 8-byte fetch, I would think, and it seems like a small price to
pay to avoid inserting as many as 7 padding bytes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Why is time with timezone 12 bytes?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Technically, there's no reason why we can't do this for 9.1.  What we
> can do is change the name of the "time with timezone" type to
> something like "oldtimetz", keeping the current OID.  And then we can
> add a new type called "time with timezone".  [ with large amounts of
> consequent work ]

I think you missed the point of my response, which is that there are
easily 10^6 more-pressing things to work on than the size of timetz.
Do you know of any actual use cases for it?
        regards, tom lane


Re: Why is time with timezone 12 bytes?

От
Robert Haas
Дата:
On Wed, Sep 22, 2010 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Technically, there's no reason why we can't do this for 9.1.  What we
>> can do is change the name of the "time with timezone" type to
>> something like "oldtimetz", keeping the current OID.  And then we can
>> add a new type called "time with timezone".  [ with large amounts of
>> consequent work ]
>
> I think you missed the point of my response, which is that there are
> easily 10^6 more-pressing things to work on than the size of timetz.
> Do you know of any actual use cases for it?

Well, I wasn't responding to you - I was responding to Josh.
Regardless of the merits of redesigning this particular data type, I
think it's important for us to cultivate a mindset of figuring out how
we can make gradual improvements to the on-disk format without
earth-shattering consequences for pg_upgrade.  Mind you, I don't
currently have the time to hack on this for, uh, more or less the
reason you state.  But if my boss told me he double my pay if I got it
done, I wouldn't tell him it's technically impossible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Why is time with timezone 12 bytes?

От
Josh Berkus
Дата:
On 9/22/10 6:00 PM, Tom Lane wrote:
> I think you missed the point of my response, which is that there are
> easily 106 more-pressing things to work on than the size of timetz.
> Do you know of any actual use cases for it?

It would be a good project to add to the list of "easy TODOs to get
started with."

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Why is time with timezone 12 bytes?

От
Bruce Momjian
Дата:
Josh Berkus wrote:
> On 9/22/10 6:00 PM, Tom Lane wrote:
> > I think you missed the point of my response, which is that there are
> > easily 106 more-pressing things to work on than the size of timetz.
> > Do you know of any actual use cases for it?
> 
> It would be a good project to add to the list of "easy TODOs to get
> started with."

Except for the pg_upgrade issue.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Why is time with timezone 12 bytes?

От
Robert Haas
Дата:
On Thu, Sep 23, 2010 at 1:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Josh Berkus wrote:
>> On 9/22/10 6:00 PM, Tom Lane wrote:
>> > I think you missed the point of my response, which is that there are
>> > easily 106 more-pressing things to work on than the size of timetz.
>> > Do you know of any actual use cases for it?
>>
>> It would be a good project to add to the list of "easy TODOs to get
>> started with."
>
> Except for the pg_upgrade issue.

Which is a big "except".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Why is time with timezone 12 bytes?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 23, 2010 at 1:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Josh Berkus wrote:
>>> It would be a good project to add to the list of "easy TODOs to get
>>> started with."

>> Except for the pg_upgrade issue.

> Which is a big "except".

Yeah.  That constraint is what leads me to think that the return on
effort is just not worth it.  Maybe we should file this in the category
of "things to look at next time we break on-disk compatibility".
        regards, tom lane


Re: Why is time with timezone 12 bytes?

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Sep 23, 2010 at 1:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> Josh Berkus wrote:
> >>> It would be a good project to add to the list of "easy TODOs to get
> >>> started with."
> 
> >> Except for the pg_upgrade issue.
> 
> > Which is a big "except".
> 
> Yeah.  That constraint is what leads me to think that the return on
> effort is just not worth it.  Maybe we should file this in the category
> of "things to look at next time we break on-disk compatibility".

Yes, I would like to see such a category on the TODO list.  Should I do
it?

FYI, I am please at the lack of serious problems with pg_upgrade. 
People are obviously using it because the are filing bug reports, but
none of them are serious, and relate to pilot error or odd
configurations.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Why is time with timezone 12 bytes?

От
Robert Haas
Дата:
On Thu, Sep 23, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 23, 2010 at 1:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> Josh Berkus wrote:
>>>> It would be a good project to add to the list of "easy TODOs to get
>>>> started with."
>
>>> Except for the pg_upgrade issue.
>
>> Which is a big "except".
>
> Yeah.  That constraint is what leads me to think that the return on
> effort is just not worth it.  Maybe we should file this in the category
> of "things to look at next time we break on-disk compatibility".

I'm worried about how we're going to manage that.  First, as
pg_upgrade becomes more mature, the penalty for breaking on-disk
compatibility gets a LOT bigger.  I'd like to think that "the next
time we break on-disk compatibility" means approximately "never", or
at least "not for a very long time".  Second, if we do decide to break
it, how and when will we make that decision?  Are we just going to
decide to break it when we run into a feature that we really want that
can't be had any other way?  If we want to make breaking on-disk
compatibility something that only happens every 5 years or so, we had
better give people - I don't know, a year's notice - so that we can
really knock out everything people have any interest in fixing in one
release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Why is time with timezone 12 bytes?

От
Alvaro Herrera
Дата:
Excerpts from Robert Haas's message of jue sep 23 14:33:06 -0400 2010:

> I'm worried about how we're going to manage that.  First, as
> pg_upgrade becomes more mature, the penalty for breaking on-disk
> compatibility gets a LOT bigger.  I'd like to think that "the next
> time we break on-disk compatibility" means approximately "never", or
> at least "not for a very long time".  Second, if we do decide to break
> it, how and when will we make that decision?

I liked your earlier suggestion: if somebody wants to pg_upgrade, he
needs to go to the latest minor release of their branch, run some
command to upgrade the on-disk format (say ALTER TABLE / SET TYPE), and
*then* upgrade.

Now if it was workable to handle floating-point datetimes to integer
datetimes this way, it would be excellent.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


pg_upgrade pain; was Re: Why is time with timezone 12 bytes?

От
Bruce Momjian
Дата:
Robert Haas wrote:
> On Thu, Sep 23, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Thu, Sep 23, 2010 at 1:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >>> Josh Berkus wrote:
> >>>> It would be a good project to add to the list of "easy TODOs to get
> >>>> started with."
> >
> >>> Except for the pg_upgrade issue.
> >
> >> Which is a big "except".
> >
> > Yeah. ?That constraint is what leads me to think that the return on
> > effort is just not worth it. ?Maybe we should file this in the category
> > of "things to look at next time we break on-disk compatibility".
> 
> I'm worried about how we're going to manage that.  First, as
> pg_upgrade becomes more mature, the penalty for breaking on-disk
> compatibility gets a LOT bigger.  I'd like to think that "the next
> time we break on-disk compatibility" means approximately "never", or
> at least "not for a very long time".  Second, if we do decide to break
> it, how and when will we make that decision?  Are we just going to
> decide to break it when we run into a feature that we really want that
> can't be had any other way?  If we want to make breaking on-disk
> compatibility something that only happens every 5 years or so, we had
> better give people - I don't know, a year's notice - so that we can
> really knock out everything people have any interest in fixing in one
> release.

Let me come clean and explain that I am worried pg_upgrade has limited
our ability to make data format changes.  

pg_upgrade is much more accepted now than I think anyone expected a year
ago.  Our users are now going to complain if pg_upgrade upgrades are not
supported in future releases, which eventually is going to cause us
problems.

I think having binary upgrades for 9.0 was a big features, and got
mentioned in the press release, but let's not kid ourselves that we
aren't going down a road that might be paved with pain.

We have explored all sorts of ideas to mitigate the pain, like new data
type oids and reading (writing?) old data format pages, but that is all
untested territory.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade pain; was Re: Why is time with timezone 12 bytes?

От
"Joshua D. Drake"
Дата:
On Thu, 2010-09-23 at 15:20 -0400, Bruce Momjian wrote:

> > decide to break it when we run into a feature that we really want that
> > can't be had any other way?  If we want to make breaking on-disk
> > compatibility something that only happens every 5 years or so, we had
> > better give people - I don't know, a year's notice - so that we can
> > really knock out everything people have any interest in fixing in one
> > release.
>
> Let me come clean and explain that I am worried pg_upgrade has limited
> our ability to make data format changes.

It is nice to see hackers finally realizing that this is true (and
required).

>
> pg_upgrade is much more accepted now than I think anyone expected a year
> ago.  Our users are now going to complain if pg_upgrade upgrades are not
> supported in future releases, which eventually is going to cause us
> problems.

"us" being -hackers yes, but it will only help the community.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: pg_upgrade pain; was Re: Why is time with timezone 12 bytes?

От
Bruce Momjian
Дата:
Joshua D. Drake wrote:
> On Thu, 2010-09-23 at 15:20 -0400, Bruce Momjian wrote:
> 
> > > decide to break it when we run into a feature that we really want that
> > > can't be had any other way?  If we want to make breaking on-disk
> > > compatibility something that only happens every 5 years or so, we had
> > > better give people - I don't know, a year's notice - so that we can
> > > really knock out everything people have any interest in fixing in one
> > > release.
> > 
> > Let me come clean and explain that I am worried pg_upgrade has limited
> > our ability to make data format changes.  
> 
> It is nice to see hackers finally realizing that this is true (and
> required).

It is like credit card companies offering customer perks to encourage
vendors to accept credit cards.  It is not something vendors set out to
do, but it becomes a customer disappointment if they don't comply.

> > pg_upgrade is much more accepted now than I think anyone expected a year
> > ago.  Our users are now going to complain if pg_upgrade upgrades are not
> > supported in future releases, which eventually is going to cause us
> > problems.
> 
> "us" being -hackers yes, but it will only help the community.

Right.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: pg_upgrade pain; was Re: Why is time with timezone12 bytes?

От
"Joshua D. Drake"
Дата:
On Thu, 2010-09-23 at 15:20 -0400, Bruce Momjian wrote:

> > decide to break it when we run into a feature that we really want that
> > can't be had any other way?  If we want to make breaking on-disk
> > compatibility something that only happens every 5 years or so, we had
> > better give people - I don't know, a year's notice - so that we can
> > really knock out everything people have any interest in fixing in one
> > release.
> 
> Let me come clean and explain that I am worried pg_upgrade has limited
> our ability to make data format changes.  

It is nice to see hackers finally realizing that this is true (and
required).

> 
> pg_upgrade is much more accepted now than I think anyone expected a year
> ago.  Our users are now going to complain if pg_upgrade upgrades are not
> supported in future releases, which eventually is going to cause us
> problems.

"us" being -hackers yes, but it will only help the community.

Sincerely,

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



Re: Why is time with timezone 12 bytes?

От
Martijn van Oosterhout
Дата:
On Thu, Sep 23, 2010 at 02:33:06PM -0400, Robert Haas wrote:
> I'm worried about how we're going to manage that.  First, as
> pg_upgrade becomes more mature, the penalty for breaking on-disk
> compatibility gets a LOT bigger.  I'd like to think that "the next
> time we break on-disk compatibility" means approximately "never", or
> at least "not for a very long time".  Second, if we do decide to break
> it, how and when will we make that decision?  Are we just going to
> decide to break it when we run into a feature that we really want that
> can't be had any other way?

I don't think we should be discouraging people from trying to solve the
problems though. Given that you know that eventually there will come a
time that you need to change the format, it seems it would be better to
have one release that changes a datatype, one that changes the page
layout, etc, rather than doing one release that breaks everything at
once.

From a pure testing point of view, if someone coded up a multi-OID
approach to data type upgrades and it was actually tested in a release
that would give a lot more confidence than changing everything at once.

So, regular small changes rather than one big change that nobody wants
to try.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle