Re: How to setup default value "0000-00-00" for "date"
От | Marco Colombo |
---|---|
Тема | Re: How to setup default value "0000-00-00" for "date" |
Дата | |
Msg-id | Pine.LNX.4.44.0408201201510.3512-100000@Megathlon.ESI обсуждение исходный текст |
Ответ на | Re: How to setup default value "0000-00-00" for "date" (Michal Taborsky <michal@taborsky.cz>) |
Список | pgsql-general |
On Fri, 20 Aug 2004, Michal Taborsky wrote: > Richard Huxton wrote: > > Where you don't have a valid date to store you should use NULL. This > > business of storing zeroes is a horrible MySQL design mistake. > > Well, yes and no. It certainly is a design mistake and introduces > incosistency into the database, but after I was bitten several times by > NULL values I'd go for solution like this any day. Let me explain. He refers to MySQL design, not _your_ design. But by letting you insert zeros, MySQL misguided you in your design... > We had a table of messages, which was inserted to randomly and every few > minutes we'd walk through the unprocessed messages and perform some work > on them. I, trying to have the database as clean as possible, used this > table definition (simplified): > > messages ( > id serial, > data text, > arrived timestamp default now, > processed timestamp) > > So after the message arrived, it had the processed field set to null, > which was perfectly consistent and represented what it realy was--an > unknown value. No. You're using one field to emulate two. First, you're using processed as a flag, to indicate the unprocesses/processed status. Then, you use is to store the timestamp. To clarify, you use that field to put two different questions to the system: 1) has this message been processed? 2) _when_ was this message processed? This design mistake introduces the need of an 'invalid' value you have to put in the field, cause that's the 'flag' part I've mentioned. And NULL does not mean 'invalid', it means 'not available'. A timestamp field can be used to answer only to one question: 'when'. Be careful what NULL means here: it does not mean 'never', it means 'I don't know when it was processed'. You cannot consider a message with NULL timestamp as 'not processes yet', because that's only one case. It could have been processed, but for some reason someone forgot to store the timestamp. So your claim: "...which was perfectly consistent and represented what it realy was--an unknown value." does not hold. It's a subtle design mistake. Because you seem to be sure that if a message was processed, then the timestamp must be available, you're implying that if the field is NULL, the message is unprocessed. This kind of 'knowledge' is external to the DB. You'd better let the DB know the full story if you want it to provide nice answers to your queries. It's not just an index problem. [...] > SELECT * FROM messages WHERE processed='-infinity'. There! Here you have to introduce your 'flag' value again. The dual nature of the field has to be represented somehow... see? Since NULL failed in its role of 'flag', you're using another value. Now ask yourself what the meaning of processed is here: 1) if it's NULL, you can't tell _when_ (not _if_) the message was processed; 2) if it's -infinity (call it SPECIAL_FLAG_VALUE), the message was NOT processed; 3) if it's different from -infinity, then the message was processed, and you can tell when... Compare it to the natural meaning of a timestamp field: 1) if it's NULL, you can't tell _when_ the message was processed; 2) otherwise you can tell _when_ the message was processed. that simple. [...] > 1.1.1970 00:00). Just make sure you make a note of it somewhere and my > suggestion is you write a COMMENT ON that column for future generations. :) If you want to take care of future generations, my suggestion is to design the database the right way from the start. If you need to check for unprocessed messages, and thus you want the db to answer to the question: 'has this message been processed?' you should introduce a way to represent that kind of knowledge explicitly, with a boolean field. Once you have two fields: is_processed boolean, processed_ts timestamp, you can express queries more naturally (and they'll run fast). 'Saving' one field may lead to horrors at query time. And of course, you may want to place contraints on those fields: - if is_processed is false, then processed_ts must be NULL; - if is_processed is true, then processed_ts must be NOT NULL; - is_processed may be NOT NULL. The first is natural, the second tricky: the system will force anyone to specify a timestamp when changing the state of the message to 'processed'. It means no partial records, no 'I will fill it in later'. The third one is tricker: it places a contraint on the knowledge you need to have before you can insert a message in the system... if someone walks in and tells you: 'let's put message XXX into the system, but I don't know if it was processed or not' you'll have to say 'I'm sorry no, I can't', and hope that answer gets accepted. That message will have to stay outside the system, until someone discovers if it was processed or not. If that message _has_ to be in (maybe just because your boss expects it to be in) you'll have to provide a value... that means the DB will provide potentially wrong answers (think of count() on processed messages). .TM. -- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
В списке pgsql-general по дате отправления: