Обсуждение: finding columns that have three or fewer distinct characters
I've got an interesting one...I'm trying to find columns that have three or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if I need to write a function or if someone has an idea how to do it with built in functions and/or pattern matching? I think the thing to do would be to lowercase everything, then remove all duplicate chars and spaces, then use length() on that, but it's not obvious to me how I might remove the duplicate chars with the pattern matching support in the docs. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote:
> I've got an interesting one...I'm trying to find columns that have three
> or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if
> I need to write a function or if someone has an idea how to do it with
> built in functions and/or pattern matching?
>
> I think the thing to do would be to lowercase everything, then remove
> all duplicate chars and spaces, then use length() on that, but it's not
> obvious to me how I might remove the duplicate chars with the pattern
> matching support in the docs.
It's interesting, indeed.
Here's how you might do it with a PL/Perl function. :)
CREATE OR REPLACE FUNCTION remove_duplicates(TEXT) RETURNS TEXT AS
$$ my ($text) = @_; while ($text =~ s/(.)(.*)\1/$1$2/g != 0) {}; return $text;
$$ LANGUAGE plperl;
cww=# SELECT
remove_duplicates('ffffoooooooobbbbaaaarrrr.!@#$.foobar-baz-qux'); remove_duplicates
------------------- fobar.!@#$-zqux
(1 row)
Colin
Jeff Frost escreveu:
> I've got an interesting one...I'm trying to find columns that have three
> or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if
> I need to write a function or if someone has an idea how to do it with
> built in functions and/or pattern matching?
>
> I think the thing to do would be to lowercase everything, then remove
> all duplicate chars and spaces, then use length() on that, but it's not
> obvious to me how I might remove the duplicate chars with the pattern
> matching support in the docs.
>
Try:
SELECT array_to_string(array( SELECT DISTINCT lower(substr('aabbcdddef AB',i,1)) FROM
generate_series(1,length('aabbcdddef AB')) s(i)) , '');
Osvaldo
On Wed, 5 Mar 2008, Osvaldo Rosario Kussama wrote:
> Jeff Frost escreveu:
>> I've got an interesting one...I'm trying to find columns that have three or
>> fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if I need
>> to write a function or if someone has an idea how to do it with built in
>> functions and/or pattern matching?
>>
>> I think the thing to do would be to lowercase everything, then remove all
>> duplicate chars and spaces, then use length() on that, but it's not obvious
>> to me how I might remove the duplicate chars with the pattern matching
>> support in the docs.
>>
>
> Try:
> SELECT
> array_to_string(array(
> SELECT DISTINCT lower(substr('aabbcdddef AB',i,1))
> FROM generate_series(1,length('aabbcdddef AB'))
> s(i))
> , '');
>
Osvaldo, that appears to work perfectly! Thank you much!
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954