Обсуждение: empty text fields

Поиск
Список
Период
Сортировка

empty text fields

От
"Leif B. Kristensen"
Дата:
On a tangent to the never-ending NULL debate, I've got a table:

CREATE TABLE events (
    event_id            INTEGER PRIMARY KEY,
    tag_fk              INTEGER REFERENCES tags (tag_id),
    place_fk            INTEGER REFERENCES places (place_id),
    event_date          CHAR(18) NOT NULL DEFAULT '000000003000000001',
    sort_date           DATE NOT NULL DEFAULT '40041024BC',
    event_note          TEXT NOT NULL DEFAULT ''
);

The event_note will contain text in roughly 1 out of 5 rows:

pgslekt=> select count(*) from events;
 count
-------
 29473
(1 row)

pgslekt=> select count(*) from events where event_note <> '';
 count
-------
  5572
(1 row)

I wonder if this is sane design, in theory and in practice, or should I
break out the event_note field in a separate table?
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE

Re: empty text fields

От
David Fetter
Дата:
On Wed, Jun 28, 2006 at 05:14:42PM +0200, Leif B. Kristensen wrote:
> On a tangent to the never-ending NULL debate, I've got a table:
>
> CREATE TABLE events (
>     event_id            INTEGER PRIMARY KEY,
>     tag_fk              INTEGER REFERENCES tags (tag_id),
>     place_fk            INTEGER REFERENCES places (place_id),
>     event_date          CHAR(18) NOT NULL DEFAULT '000000003000000001',
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is Bad™.  What's wrong with TIMESTAMPTZ?

>     sort_date           DATE NOT NULL DEFAULT '40041024BC',
>     event_note          TEXT NOT NULL DEFAULT ''

This is generally bad, too.  It's got MySQL goo all over it.  Do you
want NOT NULL, or do you want a DEFAULT that's meaningful?

> );
>
> The event_note will contain text in roughly 1 out of 5 rows:
>
> pgslekt=> select count(*) from events;
>  count
> -------
>  29473
> (1 row)
>
> pgslekt=> select count(*) from events where event_note <> '';
>  count
> -------
>   5572
> (1 row)
>
> I wonder if this is sane design, in theory and in practice, or should I
> break out the event_note field in a separate table?

Only if it's a 1:N relationship.  In this case, I'd say scrap the NOT
NULL requirement and replace the empty strings with NULLs.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: empty text fields

От
"Leif B. Kristensen"
Дата:
On Wednesday 28. June 2006 17:37, David Fetter wrote:
>On Wed, Jun 28, 2006 at 05:14:42PM +0200, Leif B. Kristensen wrote:
>> On a tangent to the never-ending NULL debate, I've got a table:
>>
>> CREATE TABLE events (
>>     event_id            INTEGER PRIMARY KEY,
>>     tag_fk              INTEGER REFERENCES tags (tag_id),
>>     place_fk            INTEGER REFERENCES places (place_id),
>>     event_date          CHAR(18) NOT NULL DEFAULT
>> '000000003000000001',
>
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>This is Bad™.  What's wrong with TIMESTAMPTZ?

What's Bad™?

The event_date field is a "fuzzy date" construct. It will allow the
storage of such dates as "1784", "ca. 1810", "May 1852", "1798 or
1799", "between 1820 and 1830" and so on. It's very useful in
historical research to handle such dates meaningfully.

By the way, I was also going to ask sometime if there's a better way to
handle such a construct than an unspecified CHAR(18) column.

>>     sort_date           DATE NOT NULL DEFAULT '40041024BC',
>>     event_note          TEXT NOT NULL DEFAULT ''
>
>This is generally bad, too.  It's got MySQL goo all over it.  Do you
>want NOT NULL, or do you want a DEFAULT that's meaningful?

Sure. I came to PostgreSQL about a year ago. I've been using MySQL for
read-only data since 2002, but when I started writing my own data-entry
routines, I found the "quiet truncation" misfeature of MySQL to render
it all but useless.

>> I wonder if this is sane design, in theory and in practice, or
>> should I break out the event_note field in a separate table?
>
>Only if it's a 1:N relationship.  In this case, I'd say scrap the NOT
>NULL requirement and replace the empty strings with NULLs.

Thanks. I'll think it over.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE

Re: empty text fields

От
Karsten Hilbert
Дата:
On Wed, Jun 28, 2006 at 06:25:22PM +0200, Leif B. Kristensen wrote:

> >>     event_date          CHAR(18) NOT NULL DEFAULT

> The event_date field is a "fuzzy date" construct. It will allow the
> storage of such dates as "1784", "ca. 1810", "May 1852", "1798 or
> 1799", "between 1820 and 1830" and so on. It's very useful in
> historical research to handle such dates meaningfully.
How and where do you handle the fuzziness of it ? In the
application ? We have to deal with the same thing in medical
history data and haven't yet satisfactorily solved it.

> By the way, I was also going to ask sometime if there's a better way to
> handle such a construct than an unspecified CHAR(18) column.

A composite type comes to mind. Tagged types (google for
"tagged_type") would probably help, too.

A "full-blown" implementation of a fuzzy timestamp type which

a) preserves the input/update timestamp
b) allows setting the accuracy of the value per row
c) allows for known "modifiers" and "terms" ("mid-summer", "second half of ...")
d) allows for an arbitrary textual addition ("ca.", "probably")

would be great. I know I should be helping to write one
instead of hoping someone does it for me. But I lack the
expertise to do it properly. I am willing to help, at any
rate, though.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: empty text fields

От
Karsten Hilbert
Дата:
self-replying with additional information

On Wed, Jun 28, 2006 at 09:33:18PM +0200, Karsten Hilbert wrote:

> A "full-blown" implementation of a fuzzy timestamp type which
>
> a) preserves the input/update timestamp
which tagged_types is able to handle

> b) allows setting the accuracy of the value per row
which I have a Python (application) class for wrapping mx.DateTime.DateTime

> c) allows for known "modifiers" and "terms" ("mid-summer", "second half of ...")

> d) allows for an arbitrary textual addition ("ca.", "probably")
which tagged_types should be able to handle as well

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: empty text fields

От
"Leif B. Kristensen"
Дата:
On Wednesday 28. June 2006 21:33, Karsten Hilbert wrote:
>On Wed, Jun 28, 2006 at 06:25:22PM +0200, Leif B. Kristensen wrote:
>> >>     event_date          CHAR(18) NOT NULL DEFAULT
>>
>> The event_date field is a "fuzzy date" construct. It will allow the
>> storage of such dates as "1784", "ca. 1810", "May 1852", "1798 or
>> 1799", "between 1820 and 1830" and so on. It's very useful in
>> historical research to handle such dates meaningfully.
>
>How and where do you handle the fuzziness of it ? In the
>application ? We have to deal with the same thing in medical
>history data and haven't yet satisfactorily solved it.

For now, I'm handling it in the application. But as I'm very much into
moving as much as possible into views, functions and triggers of the
db, I'm investigating how to do it in a more efficient way.

The construct has been lifted pretty much unmodified from "The Master
Genealogist" (TMG), the genealogy program I was using until I decided
to write my own. The format of the string is YYYYMMDDAYYYYMMDDB. The
YYYMMDD is a "normal" date, where the day may be set to 00 if the month
is known, and the month likewise set to 00 if only the year is known.
The A is a qualifier, which can take the following values:

     0 = before (date1),
     1 = say (date1),
     2 = ca. (date1),
     3 = exact (date1),
     4 = after (date1),
     5 = between (date1) and (date2),
     6 = (date1) or (date2),
     7 = from (date1) to (date2).

The B can be either 0 for BC or 1 for AD.

>> By the way, I was also going to ask sometime if there's a better way
>> to handle such a construct than an unspecified CHAR(18) column.
>
>A composite type comes to mind. Tagged types (google for
>"tagged_type") would probably help, too.
>
>A "full-blown" implementation of a fuzzy timestamp type which
>
>a) preserves the input/update timestamp
>b) allows setting the accuracy of the value per row
>c) allows for known "modifiers" and "terms" ("mid-summer", "second
> half of ...") d) allows for an arbitrary textual addition ("ca.",
> "probably")
>
>would be great. I know I should be helping to write one
>instead of hoping someone does it for me. But I lack the
>expertise to do it properly. I am willing to help, at any
>rate, though.

I haven't felt any need to enter more irregular dates than those
described above. I think that if a date is that much undetermined, it
would be better to render it textually in a free-text note.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE

Re: empty text fields

От
Alban Hertroys
Дата:
Leif B. Kristensen wrote:
>      0 = before (date1),
>      1 = say (date1),
>      2 = ca. (date1),
>      3 = exact (date1),
>      4 = after (date1),
>      5 = between (date1) and (date2),
>      6 = (date1) or (date2),
>      7 = from (date1) to (date2).

This kind of reeks like a begin/end date and an accuracy quantifier,
though that wouldn't account for option 6.

Your cases 0 to 5 and 7 would be transformed into something like:

    date A    date B     accuracy
0: NULL      date1      5
1: date1     date1      3
2: date1     date1      2   (is 'ca.' less accurate than 'say'?)
3: date1     date1      5
4: date1     NULL       5
5: date1     date2      2
7: date1     date2      5

Where I defined '5' as being accurate, and lower values less accurate.
You may want to use values with a wider spread, it'll allow more
fuzziness about how sure you are about a certain date.

Just my 2 centims.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

timestamp with definable accuracy, was: Re: empty text fields

От
Karsten Hilbert
Дата:
On Thu, Jun 29, 2006 at 12:02:40PM +0200, Alban Hertroys wrote:

> This kind of reeks like a begin/end date and an accuracy quantifier,
> though that wouldn't account for option 6.
>
> Your cases 0 to 5 and 7 would be transformed into something like:
...
> Where I defined '5' as being accurate, and lower values less accurate.
> You may want to use values with a wider spread, it'll allow more
> fuzziness about how sure you are about a certain date.

Just for your information:

In our Python implementation of a fuzzy timestamp type we
used accuracy values ranging from 1 to 7 denoting the
"precision" of a complete timestamp definition:

7 - full subsecond accuracy (7 "digits" precision)
6 - seconds
5 - minutes
4 - hours
3 - days
2 - months
1 - years

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: timestamp with definable accuracy, was: Re: empty text

От
Alban Hertroys
Дата:
Karsten Hilbert wrote:
> Just for your information:
>
> In our Python implementation of a fuzzy timestamp type we
> used accuracy values ranging from 1 to 7 denoting the
> "precision" of a complete timestamp definition:
>
> 7 - full subsecond accuracy (7 "digits" precision)
> 6 - seconds
> 5 - minutes
> 4 - hours
> 3 - days
> 2 - months
> 1 - years

This is getting off topic, but let me add that IMO you chose the wrong
direction.
Your timestamps won't get any more accurate than seconds (your highest
value), but it could get less accurate. What are you going to use for
ranges larger than a year? 0? -1? -2? You locked yourself out...

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //