On Mar 4, 2012, at 22:31, Chris Angelico <rosuav@gmail.com> wrote:
> (Hoping you meant for that reply to be on-list as I'm here responding on-list.)
>
> On Mon, Mar 5, 2012 at 2:16 PM, A.M. <agentm@themactionfaction.com> wrote:
>>
>> On Mar 4, 2012, at 9:13 PM, Chris Angelico wrote:
>>
>>> One of our tables has a few columns that may be interpreted as strings
>>> or may be numbers (data type is varchar, numbers are stored as
>>> decimal). Generally, operations are performed on the string, but
>>> sometimes we need to parse out a number - without it failing on error.
>>> I wrote the following function to approximate to the semantics of
>>> atoi:
>>
>> I would recommend against such a schema since different data types should warrant their own columns, but if you are
leftwith no choice...
>
> The values have to be strings for other reasons (eg '' is valid
> everywhere, and this is subsequently processed by a script that
> expects all strings). So yeah, no choice there. But I agree that
> normally you DO want integers stored in integer columns, and we're
> paying a performance penalty for this.
>
>>>
>>> create or replace function str2int(val varchar) returns bigint immutable as $$
>>> begin
>>> val=substring(val from '[0-9]*');
>>> if length(val) between 1 and 19 then return val::bigint; end if;
>>> return 0;
>>> end;
>>> $$ language plpgsql;
>>
>> This can be written as:
>> select substring('35345345.45645654' from '\d{1,19}')::bigint;
>>
>> Be aware that this does not account for:
>> 3dogs
>> 3 dogs
>> 3,dogs
>> 3.5.6.7
>>
>> all of which will return 3::bigint, but I assume that the column is not completely free-form or maybe this is what
youwant. (?)
>>
>> Cheers,
>> M
>
> That is in fact the behaviour I want. Trouble is, that simpler version
> returns NULL if given 'asdf' as its input - I want it to return 0.
COALESCE is your friend
> It's also about the same performance (slightly slower in a quick test)
> than the original; it's still doing a regular expression parse. I was
> hoping very much to avoid the regex altogether.
>
> ChrisA
>
Any efficient, non-RegEx, alternative would require more context to evaluate than you provide. Mainly, would it be
fasterto have a separate field to store the parsed (at input) number and then query that field directly (even if it is
atext field as well)? Basically cache the parse.
David J..