Обсуждение: nulls

Поиск

Список

Период

Сортировка

nulls

От

"James B. Byrne"

Дата:

12 марта 2009 г., 15:42:56

I am confronting a design decision involving null values and I
cannot seem to discern which way to go.  Therefore, I would like
some commentary based on real world experience.

The basic issue is episodic duration, expressed as columns named
dt_effective_from and dt_superseded_after.  Both are datetime types
containing values normalized to utc.  You see where this is going.

The issue is what to enter when the value is known to be unknown, as
in some indeterminate future date, which may be never.  I read that
relational set values should never be null, as null is indeterminate
for WHERE clauses and may result in unexpected results.  On the
other hand, setting some artificially excessive future date seems in
its place seems, to me, to have its own problems.

Since this issue must have been dealt with time and time again in
the past I would like to know what, if any, consensus has been
reached on the matter.  What is the best way to proceed?

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Re: nulls

От

Richard Huxton

Дата:

12 марта 2009 г., 16:00:40

James B. Byrne wrote:
> The basic issue is episodic duration, expressed as columns named
> dt_effective_from and dt_superseded_after.  Both are datetime types
> containing values normalized to utc.  You see where this is going.
>
> The issue is what to enter when the value is known to be unknown, as

If it's unknown use null.

> in some indeterminate future date, which may be never.

That's not unknown that's "in the future".

> I read that
> relational set values should never be null, as null is indeterminate
> for WHERE clauses and may result in unexpected results.

Only if you use it to mean something other than unknown. If you have an
event that starts '2001-01-01 01:01:01+01' and ends "null" then you can
confidently say "don't know" as to how long that event is.

>  On the
> other hand, setting some artificially excessive future date seems in
> its place seems, to me, to have its own problems.

Which is where you reach for the handy datetime literals as described below:
 select 'infinity'::timestamp without time zone;
 select '-infinity'::timestamp without time zone;

--
  Richard Huxton
  Archonet Ltd

Re: nulls

От

justin

Дата:

12 марта 2009 г., 16:04:48

Both ways will work.

Setting the superseded_after to a far off future will work but will have
to set to a real date when it actual does become superseded.

The same is true for nulls  It boils down to how you and the users want
to look at the data.

To me to get the all the records that has not been superseded yet is
simple either way.

Select * from some_table where superseded_after IS NULL

or
Select * from some_table where superseded_after > current_date

James B. Byrne wrote:
> I am confronting a design decision involving null values and I
> cannot seem to discern which way to go.  Therefore, I would like
> some commentary based on real world experience.
>
> The basic issue is episodic duration, expressed as columns named
> dt_effective_from and dt_superseded_after.  Both are datetime types
> containing values normalized to utc.  You see where this is going.
>
> The issue is what to enter when the value is known to be unknown, as
> in some indeterminate future date, which may be never.  I read that
> relational set values should never be null, as null is indeterminate
> for WHERE clauses and may result in unexpected results.  On the
> other hand, setting some artificially excessive future date seems in
> its place seems, to me, to have its own problems.
>
> Since this issue must have been dealt with time and time again in
> the past I would like to know what, if any, consensus has been
> reached on the matter.  What is the best way to proceed?
>
>

Re: nulls

От

Bill Moran

Дата:

12 марта 2009 г., 16:17:16

In response to "James B. Byrne" <byrnejb@harte-lyne.ca>:
>
> The basic issue is episodic duration, expressed as columns named
> dt_effective_from and dt_superseded_after.  Both are datetime types
> containing values normalized to utc.  You see where this is going.
>
> The issue is what to enter when the value is known to be unknown, as
> in some indeterminate future date, which may be never.  I read that
> relational set values should never be null, as null is indeterminate
> for WHERE clauses and may result in unexpected results.

Where'd you read that?  There are very specific rules for how NULL
behaves within WHERE clauses.  I expect that whoever wrote what you
read didn't know those rules, or wrote them based on a DB system that's
less well documented than PostgreSQL.

Of course, if you don't know the rules, you don't know what to expect,
thus the results can be unexpected.

> On the
> other hand, setting some artificially excessive future date seems in
> its place seems, to me, to have its own problems.

IMHO, the use of magic values should have been deprecated in the 70s.
NULL means NULL, infinity means infinity.  Trying to use NULL to mean
anything other than "no value provided" is begging for trouble.  Putting
magic values in like "infinity in this field means that we have not
yet calculated the value" or something similar is equally problematic.

> Since this issue must have been dealt with time and time again in
> the past I would like to know what, if any, consensus has been
> reached on the matter.  What is the best way to proceed?

Depends on what you're trying to accomplish.  However, the most
unambiguous way to solve the problem is to solve it unambiguously:
create a BOOLEAN column called superseded with a default
value of false, and create a trigger that sets it to true when
a value is inserted in dt_superseded_after.  You can now explicitly
filter on that column to avoid any unexpected NULLishness, and your
queries will read more like English than some constructed inside-
joke language.

Compare:

SELECT * FROM duration WHERE dt_superseded_after IS NULL;

to

SELECT * FROM duration WHERE NOT superseded;

In a larger, more complex query the difference in readability will
be even more marked.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: nulls

От

Jeff Davis

Дата:

12 марта 2009 г., 18:16:42

On Thu, 2009-03-12 at 11:32 -0400, James B. Byrne wrote:
> The basic issue is episodic duration, expressed as columns named
> dt_effective_from and dt_superseded_after.  Both are datetime types
> containing values normalized to utc.  You see where this is going.

I assume you're concern is NULL for the "superseded_after" field and not
the "effective_from" field.

Specifying an "unknown" superseded_after date is not really expressing
what you really want to express. First of all, if it is not superseded,
that is not the same thing as "it has been superseded, I just don't know
when it was superseded". Second, even if you know that it will be
superseded at some point in the future, you know (at a minimum) that
will be superseded beyond the effective_from date, so "unknown" is not
expressing everything that you do know.

An approach you might consider is horizontal partitioning, that is, use
two tables, one for episodes that are complete, and another for episodes
that are in progress. This will allow you to express everything that you
do know, and it's also convenient for writing queries that are easier to
understand. Additionally, it has the nice property that queries on
in-progress data will be more efficient.

You can use NULLs if there's some reason you want to combine the two
tables, but then you have to be careful to not use NULLs for other
purposes (e.g. be careful when using outer joins or NULL to really mean
"unknown" or "not applicable" in some other sense). In other words, if
you do use NULL, you are actually imparting more meaning on NULL than it
would ordinarily have, so you have to be careful not to confuse things.

I found the book "Temporal Data and the Relational Model" by C.J. Date,
et al., very helpful. There's a section about this problem specifically
in the book, but I don't have it nearby so I don't have a page number.

Regards,
    Jeff Davis

Re: nulls

От

"James B. Byrne"

Дата:

12 марта 2009 г., 21:51:13

On Thu, March 12, 2009 12:00, Richard Huxton wrote:
> James B. Byrne wrote:
>> The basic issue is episodic duration, expressed as columns named
>> dt_effective_from and dt_superseded_after.  Both are datetime
>> types containing values normalized to utc.
>
> If it's unknown use null.
>
>> in some indeterminate future date, which may be never.
>
> That's not unknown that's "in the future".
>
> If you have an event that starts '2001-01-01 01:01:01+01' and
> ends "null" then you can confidently say "don't know" as to how
> long that event is.
>
>> On the other hand, setting some artificially excessive future
>> date seems in its place seems, to me, to have its own problems.
>
> Which is where you reach for the handy datetime literals as
> described below:
>  select 'infinity'::timestamp without time zone;
>  select '-infinity'::timestamp without time zone;
>

Ahh. I did not know about infinity as a value for datetime columns.
This I will investigate further.

For brevity's sake, and because I was focused on my own situation, I
did not give sufficient context for what is happening.  This issue
relates to taxation. In the regime I must deal with, taxation rates
are often set some considerable time before they come into effect.
They are also sometimes related to specific commodities for specific
times of the year.

For example, tomato forcings may be taxed at 5% from 20090101T0500Z
until 20090401T0359Z, then at 0% from 20090401T0400Z to ? (until the
minister decides to withdraw the preferential rate).

Sometime in January we may, in fact we will, know when the
preferential rate will come into effect. However, sometimes not even
the ministry itself knows when this will be revoked and the regular
tax rate reinstated.  On the other hand, there are instances when
the preferential treatment period is delimited from the outset so
that the effective and the superseded date are both known well in
advance.  The point being is that the necessary database updates can
be input well in advance of their being used.

Because of these real world constraints I do not see how we could
use a Boolean flag to indicate whether a rate is active or not.
What rate is in effect is entirely dependent upon the transaction
date and that date on occasion may be considerably in the past.  So
we are forced to determine the effective rate with a conditional
(assuming that we use null to mean unset) along the lines of:

select c.e_tx_rate
  from commodity_tax_rates c
  where c.commodity_class = <commodity_class>
    and c.date_effective_from <= <transaction_date>
    and
       (c.date_superseded_after IS NULL
        or
        c.date_superseded_after <= <transaction_date>)

Please forgive any syntax errors. This is obviously not a piece of
real code, for one thing the selection criteria involves far more
than commodity class and transaction_date. It serves but to
demonstrate one approach to the problem.

I infer that if instead we set the unknown superseded date to
infinity (or some arbitrary large date far in the future) then the
query is simplified to:

...
    and c.date_effective_from >= <transaction_date>
    and c.date_superseded_after <= <transaction_date>

Have I understood things aright?

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Re: nulls

От

"A.M."

Дата:

12 марта 2009 г., 22:39:34

On Mar 12, 2009, at 5:50 PM, James B. Byrne wrote:
>
> ...
>    and c.date_effective_from >= <transaction_date>
>    and c.date_superseded_after <= <transaction_date>
>
> Have I understood things aright?

The one problem I foresee is that changes to the commodity_tax_rates
table may not reflect in transaction dates that have passed. What
happens if a tax is retroactively ended or applied outside these
barriers? Is this tax calculation frozen at the date of the "best
information we have"? If so, you might consider an insert-only table
and linking the tax decision to the row that happened to be in effect
at any time. This would likely require adding a column indicating when
the row was inserted.

Cheers,
M

Re: nulls

От

"James B. Byrne"

Дата:

13 марта 2009 г., 13:46:34

On: Thu, 12 Mar 2009 18:39:27 -0400, "A.M."
<agentm@themactionfaction.com> wrote:
>
> The one problem I foresee is that changes to the commodity_tax_rates
> table may not reflect in transaction dates that have passed. What
> happens if a tax is retroactively ended or applied outside these
> barriers? Is this tax calculation frozen at the date of the "best
> information we have"? If so, you might consider an insert-only table
> and linking the tax decision to the row that happened to be in
> effect at any time. This would likely require adding a column
> indicating when the row was inserted.

The only thing that can result in a change of rate after the point
of remittance is a re-classification of the commodity or treatment,
which would then attract the rate for the redetermined
classification in effect at the time of the original transaction.
This may occur years (up to 5 or 7 I believe) after the transaction.

All tax tables have audit columns that record the timestamp of
insertion and the timestamp of the last update and the respective
userids.  The changes themselves are logged separately.

Based on the discussion here I have decided to stick with nulls for
unspecified superseded dates.  There are very few places in the
system where nulls are permitted so each occurrence is exceptional
and has its own contextual meaning.

I appreciate very much the references given me to temporal SQL
writings.  I found these most illuminating.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: nulls