On Jun 18, 2013, at 12:23 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 6/18/2013 12:17 PM, Steven Schlansker wrote:
>> 1) The common value is not known at schema definition time, and may change (very slowly) over time.
>
> how could a value thats constant in 95% of the rows change, unless you added 20 times more rows with a new value (and
fora big portion of the time, no value would meet your 95% criteria).
The table is a denormalized version of some packed data. The packed data is constant, but the extractor code changes
overtime. The value in question is a "extractor version used to create this row".
There is a periodic job that attempts to find batches of rows that have fields extracted by an old version of the
extractor. These rows are re-extracted from the packed data.
So, most of the time the vast majority of rows will have CURRENT_VERSION as their version, and a small percentage of
rowswill have a previous version. The job will select rows where extracted_version != CURRENT_VERSION. If this query
isnot indexed, even doing a periodic check if any rows exist takes an absurd amount of time.
At some point, the code changes, and CURRENT_VERSION gets incremented. Rows then slowly (over a period of days /
weeks)get "upgraded" to the new current version, in batches of thousands.
This is what I mean by a very slowly changing mostly-constant value.
Hope that makes sense,
Steven