RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)

Поиск
Список
Период
Сортировка
От ldh@laurent-hasson.com
Тема RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)
Дата
Msg-id MN2PR15MB25601B28AF5A3096A646E1DF85CB9@MN2PR15MB2560.namprd15.prod.outlook.com
обсуждение исходный текст
Ответ на RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4  ("ldh@laurent-hasson.com" <ldh@laurent-hasson.com>)
Список pgsql-performance

   >  -----Original Message-----
   >  From: Justin Pryzby <pryzby@telsasoft.com>
   >  Sent: Sunday, August 29, 2021 23:17
   >  To: Pavel Stehule <pavel.stehule@gmail.com>
   >  Cc: ldh@laurent-hasson.com; Tom Lane <tgl@sss.pgh.pa.us>; Ranier
   >  Vilela <ranier.vf@gmail.com>; Andrew Dunstan
   >  <andrew@dunslane.net>; pgsql-performance@postgresql.org
   >  Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
   >  and 13.4 (workarounds)
   >  
   >  On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote:
   >  > po 30. 8. 2021 v 2:44 odesílatel ldh@laurent-hasson.com napsal:
   >  > > At this point, I am not sure how to proceed except to rethink that
   >  > > toFloat() function and many other places where we use exceptions.
   >  We
   >  > > get such dirty data that I need a "safe" way to convert a string to
   >  > > float without throwing an exception. BTW, I tried other
   >  combinations
   >  > > in case there may have been some weird interactions with the ::REAL
   >  > > conversion operator, but nothing made any change. Could you
   >  > > recommend another approach off the top of your head? I could use
   >  > > regexes for testing etc... Or maybe there is another option like a
   >  > > no-throw conversion that's built in or in some extension that you
   >  may know of? Like the "SAFE." Prefix in BigQuery.
   >  >
   >  > CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
   >  RETURNS
   >  > double precision AS $$ BEGIN
   >  >   IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
   >  >     RETURN $1::double precision;
   >  >   ELSE
   >  >     RETURN NULL;
   >  >   END IF;
   >  > END;
   >  > $$ LANGUAGE plpgsql IMMUTABLE STRICT;
   >  
   >  This tries to use a regex to determine if something is a "Number" or not.
   >  Which has all the issues enumerated in painful detail by long answers on
   >  stack overflow, and other wiki/blog/forums.
   >  
   >  Rather than trying to define Numbers using regex, I'd try to avoid only
   >  the most frequent exceptions and get 90% of the performance back.  I
   >  don't know what your data looks like, but you might try things like this:
   >  
   >  IF $1 IS NULL THEN RETURN $2
   >  ELSE IF $1 ~ '^$' THEN RETURN $2
   >  ELSE IF $1 ~ '[[:alpha:]]{2}' THEN RETURN $2 ELSE IF $1 !~ '[[:digit:]]' THEN
   >  RETURN $2
   >  BEGIN
   >     RETURN $1::float;
   >  EXCEPTION WHEN OTHERS THEN
   >     RETURN $2;
   >  END;
   >  
   >  You can check the stackoverflow page for ideas as to what kind of thing
   >  to reject, but it may depend mostly on your data (what is the most
   >  common string?
   >  The most common exceptional string?).
   >  
   >  I think it's possible that could even be *faster* than the original, since it
   >  avoids the exception block for values which are for sure going to cause
   >  an exception anyway.  It might be that using alternation (|) is faster (if
   >  less
   >  readable) than using a handful of IF branches.
   >  
   >  --
   >  Justin

That's exactly where my head was at. I have looked different way to test for a floating point number and recognize the
challenge😊
 

The data is very messy with people entering data by hand. We have seen alpha and punctuation, people copy/pasting from
excelso large numbers get the "e" notation. It's a total mess. The application that authors that data is a piece of
crapand we have no chance to change it unfortunately. Short of rolling out an ETL process, which is painful for the way
ourdata comes in, I need an in-db solution.
 

Thank you!
Laurent.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Следующее
От: "ldh@laurent-hasson.com"
Дата:
Сообщение: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4