Обсуждение: PostgreSQL SQL Tricks: faster urldecode
Hi, here is a function which is about 8 x faster than the one described in the PostgreSQL SQL Tricks ( http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Function_for_decoding_of_url_code ) The idea is to handle each encoded/not_encoded parts in bulk rather than spliting on each character. urldecode_arr: Seq Scan on lt_referrer (actual time=1.966..17623.979 rows=65717 loops=1) urldecode: Seq Scan on lt_referrer (actual time=4.846..144445.292 rows=65717 loops=1) regards, Marc Mamin CREATE OR REPLACE FUNCTION urldecode_arr(url text) RETURNS text AS $BODY$ DECLARE ret text; BEGIN BEGIN WITH STR AS ( SELECT -- array with all non encoded parts, prepend with '' when the string start is encoded case when $1 ~ '^%[0-9a-fA-F][0-9a-fA-F]' then array[''] end || regexp_split_to_array ($1,'(%[0-9a-fA-F][0-9a-fA-F])+', 'i') plain, -- array with all encoded parts array(select (regexp_matches ($1,'((?:%[0-9a-fA-F][0-9a-fA-F])+)', 'gi'))[1]) encoded ) SELECT string_agg(plain[i] || coalesce( convert_from(decode(replace(encoded[i], '%',''), 'hex'), 'utf8'),''),'') FROM STR, (SELECT generate_series(1, array_upper(encoded,1)+2) i FROM STR)blah INTO ret; EXCEPTION WHEN OTHERS THEN raise notice 'failed: %',url; return $1; END; RETURN coalesce(ret,$1); -- when the string has no encoding; END; $BODY$ LANGUAGE plpgsql IMMUTABLE STRICT
On Fri, Sep 20, 2013 at 10:26 AM, Marc Mamin <M.Mamin@intershop.de> wrote: > Hi, > here is a function which is about 8 x faster than the one described in the PostgreSQL SQL Tricks > ( http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Function_for_decoding_of_url_code ) > > The idea is to handle each encoded/not_encoded parts in bulk rather than spliting on each character. > > urldecode_arr: > Seq Scan on lt_referrer (actual time=1.966..17623.979 rows=65717 loops=1) > > urldecode: > Seq Scan on lt_referrer (actual time=4.846..144445.292 rows=65717 loops=1) very nice. Basically it comes down to this: all non-trivial regex replacements require decomposition of the string into an array because regexp_replace() is unable to do any kind of transformation on the string. This is a crippling limitation relative to first-class regex languages like perl; postgres string translation functions are invisible to the regex engine. I have no idea if this is fixable (I dimly recall Tom explaining why it might not be). merlin
> Von: Merlin Moncure [mmoncure@gmail.com] > Gesendet: Freitag, 20. September 2013 17:43 > > > On Fri, Sep 20, 2013 at 10:26 AM, Marc Mamin <M.Mamin@intershop.de> wrote: > > Hi, > > here is a function which is about 8 x faster than the one described in the PostgreSQL SQL Tricks > > ( http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Function_for_decoding_of_url_code ) > > > > The idea is to handle each encoded/not_encoded parts in bulk rather than spliting on each character. > > > > urldecode_arr: > > Seq Scan on lt_referrer (actual time=1.966..17623.979 rows=65717 loops=1) > > > > urldecode: > > Seq Scan on lt_referrer (actual time=4.846..144445.292 rows=65717 loops=1) > > very nice. Basically it comes down to this: all non-trivial regex > replacements require decomposition of the string into an array because > regexp_replace() is unable to do any kind of transformation on the > string. This is a crippling limitation relative to first-class regex > languages like perl; postgres string translation functions are > invisible to the regex engine. I have no idea if this is fixable (I > dimly recall Tom explaining why it might not be). > > merlin yes, a possible(?) assistance for such problems would be a new variant of regexp_split_to_table that would return two columns: - the splitted parts (as currently) - the separator matches (new) Marc