Обсуждение: text parsing function

Поиск
Список
Период
Сортировка

text parsing function

От
"Keith Worthington"
Дата:
Hi All,

I am trying to create software that will parse a string of varchar(160) and
write different parts of it into various locations in a database.

My thought process is something like this.
Create one or more functions to parse the data.
Create a trigger function that calls parsing function(s) and writes the data
into the appropriate locations.
Create after trigger on table that contains the entire string.

I have written the first function using pl/pgsql.  As one might expect text
processing in pl/pgsql is messy.

Unfortunately I do not know Perl which is probably better suited to the text
munging part of my problem.  Is there any penalty for using pl/pgsql for this
function?  Given the inputs below does anyone see a problem with my code?  Can
anyone suggest a better course of action?


-- Function: func_valid_item(varchar)

-- DROP FUNCTION func_extract_border_id("varchar");

CREATE OR REPLACE FUNCTION func_extract_border_id("varchar")
  RETURNS varchar AS
$BODY$
-- A function to extract and check the validity of a border item id.
-- One input argument.  description  Case insensitive.
   DECLARE
      v_description ALIAS FOR $1;
      v_border_id varchar(20);
   BEGIN
      SELECT tbl_item.id INTO v_border_id
        FROM tbl_item
       WHERE lower(tbl_item.id) =
             lower(
                    CASE
--                     Check for properly formatted description string.
                       WHEN lower(v_description)::text
                          NOT LIKE '%border:%size:%tag:%'
                          THEN NULL
--                     Check for borderless description.
                       WHEN "substring"(lower(v_description::text),
                                        'border: *none'::text)
                          IS NOT NULL
                          THEN NULL
--                     Description contains valid border format.
                       ELSE trim(both
                                 ' '
                                 from "substring"( v_description::text,
--source column

"position"(lower(v_description)::text, 'border:'::text) + 7, --start position

"position"(lower(v_description)::text, 'size:'::text) -
("position"(lower(v_description)::text, 'border:'::text) + 7) --string length
                                                 )
                                )
                    END
                  );
      IF FOUND THEN
         RETURN v_border_id;
      ELSE
         RETURN NULL;
      END IF;
   END;
$BODY$
  LANGUAGE 'plpgsql' STABLE STRICT;

CREATE TABLE tbl_item
(
  id varchar(20) NOT NULL,
  CONSTRAINT tbl_item_pkey PRIMARY KEY (id)
)
WITHOUT OIDS;

INSERT INTO tbl_item VALUES ('RMFP025BK');
INSERT INTO tbl_item VALUES ('WNY200BK');
INSERT INTO tbl_item VALUES ('TW84NYBK');

Example Input:
SELECT * FROM func_extract_border_id(quote_ident('815 HTPP Black 2in sq
Border: RMFP025BK Size: 7'10" x 16' Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' 3000 HTPP Black 4in sq
Border:  WNY200BK Size:  17' x 50' Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' 1250 HTPP Multi-Color
Bonded 2in sq Border:None Size: 5' X 90' Tag: NONE'));

SELECT * FROM func_extract_border_id(quote_ident(' 3000 HTPP Black 4in sq
Border:  WNY200BK Size:  12' x 12'2" Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' 1250 HTPP Black Bonded 2in
sq Border: RMFP025BK Size: 39" X 100' Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' 1250 HTPP Black Bonded 2in
sq Border: RMFP025BK Size: 83" X 40' Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' 500 HTPP Black 1.75in sq
Border: TW84NYBK Size: 12' x 14' Tag:  None'));

SELECT * FROM func_extract_border_id(quote_ident(' Netform Net Size: 5' X 32'
W & L Body Length:24''));

SELECT * FROM func_extract_border_id(quote_ident('500WH HTPP White 1.75in sq
Border: WNY200BK Size: 5'x6' Tag: None'));

Kind Regards,
Keith

Re: text parsing function

От
John DeSoi
Дата:
On Apr 18, 2005, at 11:41 AM, Keith Worthington wrote:

> I am trying to create software that will parse a string of
> varchar(160) and
> write different parts of it into various locations in a database.
>
> My thought process is something like this.
> Create one or more functions to parse the data.
> Create a trigger function that calls parsing function(s) and writes
> the data
> into the appropriate locations.
> Create after trigger on table that contains the entire string.

You could also use the COPY command to copy the unparsed data to a
temporary table (one text column) and then call a function to parse the
rows in the temporary table. The pgEdit distribution has an example of
this for processing HTTP logs.

>
> I have written the first function using pl/pgsql.  As one might expect
> text
> processing in pl/pgsql is messy.


>
> Unfortunately I do not know Perl which is probably better suited to
> the text
> munging part of my problem.  Is there any penalty for using pl/pgsql
> for this
> function?  Given the inputs below does anyone see a problem with my
> code?  Can
> anyone suggest a better course of action?


I have not done any benchmarks or optimization, but text processing in
psql seems relatively slow.


John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL