Good morning,
I am searching for a better more efficient way to extract the domain portion of an email address. The two I have been using are very expensive.
The one extract I mostly use: substring(email from '@(.*)$')
also has an index on the email column which is the full email address:
btree (email DESC) WHERE email::text = "substring"(email::text, '@(.*)$'::text) AND length(email::text) > 0,
The other extract:
substr(e.email,(strpos(e.email, '@') + 1))
Currently has no index.
The referenced table has 72 million rows. The email column can be empty (ergo the length = 0 check. I am running postgresql 9.4.
Any insights/observations welcome.
Not sure of the exact problem you are trying to solve; is it just querying the table via domain name? Does the index that you created work?
A few things; the regex given does not get domain name for some valid emails addresses;
Assuming the emails are valid:
create or replace function get_domainname(_value text)
returns text
as $$
begin
_value := reverse(_value);
return nullif(reverse(substring(_value, 0, strpos(_value, '@'))), '');
end;
$$ language plpgsql
immutable returns null on null input
;
create index table_name_idx1 on table_name (get_domainname(email));
Then in the where clause:
p.s.
If you want to validate that the email address conforms to the RFC 2822 you can use this function which makes use of Perl’s Email::Address module. You would need to install plperl and Email::Address module.
create or replace function is_valid_email_address(eaddr text)
returns boolean
as
$body$
use Email::Address;
return Email::Address->parse($_[0]);
$body$
language 'plperlu' immutable
;