RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)
От | ldh@laurent-hasson.com |
---|---|
Тема | RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds) |
Дата | |
Msg-id | MN2PR15MB25601B28AF5A3096A646E1DF85CB9@MN2PR15MB2560.namprd15.prod.outlook.com обсуждение исходный текст |
Ответ на | RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 ("ldh@laurent-hasson.com" <ldh@laurent-hasson.com>) |
Список | pgsql-performance |
> -----Original Message----- > From: Justin Pryzby <pryzby@telsasoft.com> > Sent: Sunday, August 29, 2021 23:17 > To: Pavel Stehule <pavel.stehule@gmail.com> > Cc: ldh@laurent-hasson.com; Tom Lane <tgl@sss.pgh.pa.us>; Ranier > Vilela <ranier.vf@gmail.com>; Andrew Dunstan > <andrew@dunslane.net>; pgsql-performance@postgresql.org > Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2 > and 13.4 (workarounds) > > On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote: > > po 30. 8. 2021 v 2:44 odesílatel ldh@laurent-hasson.com napsal: > > > At this point, I am not sure how to proceed except to rethink that > > > toFloat() function and many other places where we use exceptions. > We > > > get such dirty data that I need a "safe" way to convert a string to > > > float without throwing an exception. BTW, I tried other > combinations > > > in case there may have been some weird interactions with the ::REAL > > > conversion operator, but nothing made any change. Could you > > > recommend another approach off the top of your head? I could use > > > regexes for testing etc... Or maybe there is another option like a > > > no-throw conversion that's built in or in some extension that you > may know of? Like the "SAFE." Prefix in BigQuery. > > > > CREATE OR REPLACE FUNCTION safe_to_double_precision(t text) > RETURNS > > double precision AS $$ BEGIN > > IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN > > RETURN $1::double precision; > > ELSE > > RETURN NULL; > > END IF; > > END; > > $$ LANGUAGE plpgsql IMMUTABLE STRICT; > > This tries to use a regex to determine if something is a "Number" or not. > Which has all the issues enumerated in painful detail by long answers on > stack overflow, and other wiki/blog/forums. > > Rather than trying to define Numbers using regex, I'd try to avoid only > the most frequent exceptions and get 90% of the performance back. I > don't know what your data looks like, but you might try things like this: > > IF $1 IS NULL THEN RETURN $2 > ELSE IF $1 ~ '^$' THEN RETURN $2 > ELSE IF $1 ~ '[[:alpha:]]{2}' THEN RETURN $2 ELSE IF $1 !~ '[[:digit:]]' THEN > RETURN $2 > BEGIN > RETURN $1::float; > EXCEPTION WHEN OTHERS THEN > RETURN $2; > END; > > You can check the stackoverflow page for ideas as to what kind of thing > to reject, but it may depend mostly on your data (what is the most > common string? > The most common exceptional string?). > > I think it's possible that could even be *faster* than the original, since it > avoids the exception block for values which are for sure going to cause > an exception anyway. It might be that using alternation (|) is faster (if > less > readable) than using a handful of IF branches. > > -- > Justin That's exactly where my head was at. I have looked different way to test for a floating point number and recognize the challenge😊 The data is very messy with people entering data by hand. We have seen alpha and punctuation, people copy/pasting from excelso large numbers get the "e" notation. It's a total mess. The application that authors that data is a piece of crapand we have no chance to change it unfortunately. Short of rolling out an ETL process, which is painful for the way ourdata comes in, I need an in-db solution. Thank you! Laurent.
В списке pgsql-performance по дате отправления:
Предыдущее
От: Pavel StehuleДата:
Сообщение: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Следующее
От: "ldh@laurent-hasson.com"Дата:
Сообщение: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4