Обсуждение: How to handle bogus nulls from ActiveRecord
It is required for application data verification filters that default values for table columns are known to ActiveRecord when creating a new row. So ActiveRecord obtains the default values from the tables dynamically and assigns them to their appropriate column attributes. The problem we encounter arises because ActiveRecord then uses those column assignments when inserting a row even if the column is not otherwise referenced. I am developing a web application using the Ruby on Rails framework with PostgreSQL as the back-end store. In one of our tables we have a column called expected_by which is a time-stamp. It is set to NOT NULL DEFAULT 'INFINITY'. However, Ruby has no concept of infinity and whatever the PostgreSQL adapter is returning for it ActiveRecord receives as nil which is converted to NULL. So, the real fix to this is to alter the persistence class so that columns with default values are not explicitly set to those values on insert. This is unlikely to happen in the short term and will take some time to be integrated into the framework even when it is completed, if ever, So solve this for the moment what I think I require is a trigger on expected_at which tests for NULL on insert and converts it to infinity. The other alternative is to simply set the default to some valid, but unreachable, date like 9999-12-31. I would like other opinions about how to best handle this situation and observations on what other significant concerns I may not be aware of but should provide for. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of James B. Byrne > Sent: Thursday, May 12, 2011 9:12 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] How to handle bogus nulls from ActiveRecord > > > So solve this for the moment what I think I require is a trigger on > expected_at which tests for NULL on insert and converts it to infinity. The > other alternative is to simply set the default to some valid, but unreachable, > date like 9999-12-31. > Not a huge fan of Infinity as a value...but that just may be lack of experience. I'd probably remove the NOT NULL constraint on expected_at and deal with tri-value logic; or also include a boolean (is_expected) and form queries like NOT is_expected OR (is_expected AND expected_at <op> timestamp) "is_expected" could be a calculated value in a view to make things somewhat easier; otherwise you'd need a table constraint to ensure non-null expected has a true is_expected. Without more info as to how you use "expected_at" other advice is difficult but can you user a meaningful value (say now()+'30 days'::interval) for the default? David J.
On Thu, May 12, 2011 12:40, David Johnston wrote: > > Not a huge fan of Infinity as a value...but that just may be lack of > experience. > > I'd probably remove the NOT NULL constraint on expected_at and deal > with tri-value logic; or also include a boolean (is_expected) and > form queries like Well, actually, the reason for the NOT NULL constraint is to catch application errors exactly like this one. Removing it is not contemplated. I had no idea that AR actually 'copied' and used default values on columns that were not referenced in the application code until I encountered this. And had it gone undetected this would have been a major problem later on. As it was, our tests brought it to our attention quite early which is why we can contemplate several solutions. > Without more info as to how you use "expected_at" other > advice is difficult but can you user a meaningful value > (say now()+'30 days'::interval) for the default? The column expected_by contains an estimated time of arrival for a particular conveyance. When a row is initialized this value is unknown some of the time. The expected_by value is reset to the arrived_at value on UPDATE if and only if expected_by is greater than arrived_at. Conveyances that have +infinite expected_by time-stamps are considered pending. At some point conveyance rows that are never going to arrive are otherwise flagged. On the other hand, rows with overdue expected_by values are given somewhat more attention, to put it mildly. So, we either fix the problem with AR, possibly by moving to Sequel ORM for this case, although I have not yet received an answer as to whether it does any better; Or we trap and override NULL values with infinity in a trigger; Or we choose for the default value a fixed date far, far into the future. +Infinity was chosen as a default to avoid the complexities of dealing with NULL logic in SELECTS. I suppose that the simplest solution is to go with a date of 9999-12-31 and treat that value like infinity. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> The column expected_by contains an estimated time of arrival for a particular > conveyance. When a row is initialized this value is unknown some of the > time. The expected_by value is reset to the arrived_at value on UPDATE if > and only if expected_by is greater than arrived_at. > > Conveyances that have +infinite expected_by time-stamps are considered > pending. At some point conveyance rows that are never going to arrive are > otherwise flagged. On the other hand, rows with overdue expected_by > values are given somewhat more attention, to put it mildly. So, we either fix > the problem with AR, possibly by moving to Sequel ORM for this case, > although I have not yet received an answer as to whether it does any better; > Or we trap and override NULL values with infinity in a trigger; Or we choose > for the default value a fixed date far, far into the future. > > +Infinity was chosen as a default to avoid the complexities of > dealing with NULL logic in SELECTS. I suppose that the simplest solution is to > go with a date of 9999-12-31 and treat that value like infinity. The "just make it work" solution has many merits - I would also probably just use 9999-12-31 as a close approximation for +infinity; which itself is just there because you are avoiding "estimate is unknown". Why bother updating the "expected_by" value once the conveyance is no longer pending? Do you not really care if something arrived early? Even if you do not currently it seems a waste to throw out the data when you can readily get the same result as-needed (CASE WHEN expected_by <= arrived_at THEN arrived_at ELSE expected_by END) without giving up the ability to calculate early-ness. It would make more sense to set expected = arrived if and only if expected = 'Infinity'. Still, it would at least seem reasonable to guess a reasonable expected date if one is not otherwise provided - possibly with a flag indicating that it is a true guestimate instead of a estimate. David J.
On Thu, May 12, 2011 15:51, David Johnston wrote: >> >> +Infinity was chosen as a default to avoid the complexities of >> dealing with NULL logic in SELECTS. I suppose that the simplest >> solution is to go with a date of 9999-12-31 and treat that value >> like infinity. > > The "just make it work" solution has many merits - I would > also probably just use 9999-12-31 as a close approximation > for +infinity; which itself is just there because you are > avoiding "estimate is unknown". > > Why bother updating the "expected_by" value once the conveyance > is no longer pending? Do you not really care if something > arrived early? Even if you do not currently it seems a waste > to throw out the data when you can readily get the same result > as-needed (CASE WHEN expected_by <= arrived_at THEN arrived_at > ELSE expected_by END) without giving up the ability to calculate The main reason to update expected_by is that sometimes the conveyance arrives without the expected_by ever being set. Leaving the expected_by value at infinity, or 99991231, or NULL, complicates other parts of the system. However, leaving untouched expected_by values that are less than the infinite value is doable and is a better approach. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
David suggested using a guesstimate default date along with a boolean to indicate when you're using guesstimates. I think this is a solid approach, but if the default expected_by idea doesn't work for you, a boolean would still make this a lot easier on the Rails side.
It sounds like you're using a setup for Heroku, so I checked the postgreSQL 8.3 manual--a boolean is 1 byte. If this isn't for Heroku, other postgreSQL version probably implement booleans the same way. Your database size should go up by # records * 1 byte + indexing overhead.
Though I don't know how many records you're working with, this seems relatively cheap given that it will make your code more readable ("if expected_date_estimated?"). It should also simplify any remaining code you have to write, as you won't have to think about writing elaborate "if" or "case" statements to determine if expected_by was explicitly set.
It sounds like you're using a setup for Heroku, so I checked the postgreSQL 8.3 manual--a boolean is 1 byte. If this isn't for Heroku, other postgreSQL version probably implement booleans the same way. Your database size should go up by # records * 1 byte + indexing overhead.
Though I don't know how many records you're working with, this seems relatively cheap given that it will make your code more readable ("if expected_date_estimated?"). It should also simplify any remaining code you have to write, as you won't have to think about writing elaborate "if" or "case" statements to determine if expected_by was explicitly set.
On Thu, May 12, 2011 at 1:06 PM, James B. Byrne <byrnejb@harte-lyne.ca> wrote:
The main reason to update expected_by is that sometimes the
On Thu, May 12, 2011 15:51, David Johnston wrote:
>>
>> +Infinity was chosen as a default to avoid the complexities of
>> dealing with NULL logic in SELECTS. I suppose that the simplest
>> solution is to go with a date of 9999-12-31 and treat that value
>> like infinity.
>
> The "just make it work" solution has many merits - I would
> also probably just use 9999-12-31 as a close approximation
> for +infinity; which itself is just there because you are
> avoiding "estimate is unknown".
>
> Why bother updating the "expected_by" value once the conveyance
> is no longer pending? Do you not really care if something
> arrived early? Even if you do not currently it seems a waste
> to throw out the data when you can readily get the same result
> as-needed (CASE WHEN expected_by <= arrived_at THEN arrived_at
> ELSE expected_by END) without giving up the ability to calculate
conveyance arrives without the expected_by ever being set. Leaving
the expected_by value at infinity, or 99991231, or NULL, complicates
other parts of the system. However, leaving untouched expected_by
values that are less than the infinite value is doable and is a
better approach.
--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited http://www.harte-lyne.ca
9 Brockley Drive vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada L8E 3C3
--Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Thu, May 12, 2011 20:30, Eric Hu wrote: > David suggested using a guesstimate default date along with > a boolean to indicate when you're using guesstimates. > I think this is a solid approach, but if the default > expected_by idea doesn't work for you, a boolean > would still make this a lot easier on the Rails side. Since this part of the project is still in development, albeit partially in use for some purposes, all those approaches are under cosnideration considered. In fact some things are implemented in that fashion for other parts of the system, particularly with respect to current status. However, the main problem to be solved is this issue over the default values for time stamp columns, all of which must have a NOT NULL constraint since we simply cannot allow an errant application overwrite valid data, as would have happened in the case under consideration. I am coming to the conclusion that an arbitrary value of 99991231 is a better approach than using PGs built-in idea of 'infinity' since I was told that concept is not supported in Ruby. Actually, it turn out that 'infinity' is supported in Ruby. Apparently infinity can be represented by assigning the value obtained by dividing a float by zero. $ irb ruby-1.8.7-p334 :001 > infinity = 1.0/0 => Infinity ruby-1.8.7-p334 :002 > ninfinity = -1.0/0 => -Infinity ruby-1.8.7-p334 :003 > So, I guess this now qualifies as a bug in the Ruby pg adapter gem. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
2011/5/13 James B. Byrne <byrnejb@harte-lyne.ca>: > Actually, it turn out that 'infinity' is supported in Ruby. > Apparently infinity can be represented by assigning the value > obtained by dividing a float by zero. > > $ irb > ruby-1.8.7-p334 :001 > infinity = 1.0/0 > => Infinity > ruby-1.8.7-p334 :002 > ninfinity = -1.0/0 > => -Infinity > ruby-1.8.7-p334 :003 > > > So, I guess this now qualifies as a bug in the Ruby pg adapter gem. humm.. interesting... i did some tests here... [testdb] SELECT * from infinity_date_test; id | created_at ----+------------ 1 | infinity 2 | infinity 3 | infinity 4 | infinity 5 | 2011-05-13 --- test.rb --- require 'active_record' ActiveRecord::Base.establish_connection({ :adapter => 'postgresql', :database => 'testdb', :port => 5434, :host => 'localhost', :username => 'guedes', :password => 'guedes' }) class InfinityDateTest < ActiveRecord::Base set_table_name 'infinity_date_test' end InfinityDateTest.all.each do |row| puts "#{row.id} | #{row.created_at} | #{row.created_at.class}" end i = InfinityDateTest.new i.created_at = Date::Infinity.new i.save ---- and the output 1 | | NilClass 2 | | NilClass 3 | | NilClass 4 | | NilClass 5 | 2011-05-13 | Date /home/dba/.rvm/gems/ruby-1.9.2-p0@rails3/gems/activerecord-3.0.5/lib/active_record/connection_adapters/abstract_adapter.rb:207:in `rescue in log': PGError: ERRO: sintaxe de entrada é inválida para tipo date: "--- !ruby/object:Date::Infinity (ActiveRecord::StatementInvalid) d: 1 " LINE 1: ... INTO "infinity_date_test" ("created_at") VALUES ('--- !ruby... ^ : INSERT INTO "infinity_date_test" ("created_at") VALUES ('--- !ruby/object:Date::Infinity d: 1 ') RETURNING "id" ... Well, fetching from database it came nil and when saved into, it was trying to save a serialized object. From postgresql_adapter.rb [1] you can see that it returns the correct internal type based when field type is datetime, but i can't see the same thing for 'date', i suppose that it going [2] to 'super' [3], so I suppose that this method [4] should be override in 'postgresql_adapter.rb'. [1] https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L85-86 [2] https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L117-118 [3] https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/column.rb#L253-254 [4] https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/column.rb#L197-219 Best regards, -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Fri, May 13, 2011 11:50, Dickson S. Guedes wrote: > > Well, fetching from database it came nil and when saved into, it was > trying to save a serialized object. From postgresql_adapter.rb [1] > you > can see that it returns the correct internal type based when field > type is datetime, but i can't see the same thing for 'date', i > suppose > that it going [2] to 'super' [3], so I suppose that this method [4] > should be override in 'postgresql_adapter.rb'. > I have opened an issue for this with the ActiveRecord folks. https://github.com/rails/rails/issues/544 -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Fri, May 13, 2011 13:04, James B. Byrne wrote: > > I have opened an issue for this with the ActiveRecord folks. > > https://github.com/rails/rails/issues/544 > This has been addressed by the AR team and is committed to master. +-Infinity support for dates is slated for general release with RoR-3.0.8. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On 18/05/2011 4:02 AM, James B. Byrne wrote: > > On Fri, May 13, 2011 13:04, James B. Byrne wrote: >> >> I have opened an issue for this with the ActiveRecord folks. >> >> https://github.com/rails/rails/issues/544 >> > > This has been addressed by the AR team and is committed to master. > +-Infinity support for dates is slated for general release with > RoR-3.0.8. ... now if only Java would support infinite dates, too. Alas, unlike RoR, I doubt there'll be a quick "we've fixed this, grab the next point release" post for Java. Not even the 3rd party JodaTime date/time library supports infinite intervals, the +infinity date, etc, so I have to represent unbounded intervals with some distant future date, or use null. Neither option is very palatable. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/