Thread: Alternative to UPDATE (As COPY to INSERT)


Alternative to UPDATE (As COPY to INSERT)

From:
Yan Cheng CHEOK
Date:
I've found that UPDATE performance in PostgreSQL doesn't meet my expectations.

Is there a fast alternative to UPDATE, in the way that COPY is a fast alternative to INSERT?

Thanks!

I am using UPDATE in the following function:

CREATE OR REPLACE FUNCTION update_or_insert_statistic(integer, text[], text[], double precision[])
  RETURNS void AS
$BODY$DECLARE
  _lotID ALIAS FOR $1;
  _measurementTypes ALIAS FOR $2;
  _statisticTypes ALIAS FOR $3;
  _values ALIAS FOR $4;
  _row_count int;
  i int;
BEGIN
    -- Parameters validation.
    IF array_upper(_measurementTypes, 1) != array_upper(_statisticTypes, 1) OR
       array_upper(_measurementTypes, 1) != array_upper(_values, 1) THEN
        RAISE EXCEPTION 'Inconsistency in array size';
    END IF;

    FOR i IN SELECT generate_subscripts(_measurementTypes, 1)
    LOOP
        EXECUTE 'UPDATE statistic SET value = $1 WHERE fk_lot_id = $2 AND measurement_type = $3 AND statistic_type = $4'
        USING _values[i], _lotID, _measurementTypes[i], _statisticTypes[i];

        GET DIAGNOSTICS _row_count = ROW_COUNT;

        IF _row_count = 0 THEN
            EXECUTE 'INSERT INTO statistic(fk_lot_id, "value", measurement_type, statistic_type) VALUES ($1, $2, $3, $4)'
            USING _lotID, _values[i], _measurementTypes[i], _statisticTypes[i];
        END IF;
    END LOOP;
END;$BODY$
  LANGUAGE plpgsql;

I call it with the following "minimal" test case:

SELECT * FROM update_or_insert_statistic(1,
  array['Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch'],
  array['LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance','LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance'],
  array[0,0,0,4.94422589800714,3.16063453753607,0,0,1.01620532853175,9.98406933805353,20,20,0,9.98961067986587,0,0,0,6.56297341837825,2.51273949943937,0,0,1.69188512833033,9.56794946134831,20,20,0,6.31385979204282])

It takes around 20ms :(

I am expecting < 1ms

Or should I just go back to plain text files in this case?

Thanks and Regards
Yan Cheng CHEOK





Re: Alternative to UPDATE (As COPY to INSERT)

From:
Alban Hertroys
Date:
On 23 Feb 2010, at 10:26, Yan Cheng CHEOK wrote:

> I've found that UPDATE performance in PostgreSQL doesn't meet my expectations.
>
> Is there a fast alternative to UPDATE, in the way that COPY is a fast alternative to INSERT?

Well, since an UPDATE is just a DELETE + INSERT and you're already doing this in one transaction, you could use DELETE + COPY instead. That's not as easy to do with your current approach though - converting those arrays to something COPY understands will probably take about as much time as your function is taking now.

It would probably be convenient to use a staging table (maybe temporary) to put the new values in before you act on them, so you don't need that hassle with arrays. I don't know where your data is coming from (I recall you work with a measurement machine in a capacitor plant?), but you could use COPY to fill the staging table (no constraints, yay!) and then:

  BEGIN;
  DELETE FROM statistic WHERE (fk_lot_id, measurement_type, statistic_type) IN
      (SELECT fk_lot_id, measurement_type, statistic_type FROM staging_table);
  INSERT INTO statistic (value, fk_lot_id, measurement_type, statistic_type)
      SELECT value, fk_lot_id, measurement_type, statistic_type FROM staging_table;
  TRUNCATE staging_table;
  COMMIT;

This isn't concurrency-safe, so you need to make sure no values are added to the staging table while you're doing the
above.
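
To fill the staging table in the first place, something like this could work (a rough, untested sketch - I'm assuming column names and types matching your statistic table, and 'statistics.tsv' is just an example file name):

  CREATE TABLE staging_table (
      fk_lot_id        integer,
      measurement_type text,
      statistic_type   text,
      value            double precision
  );

  -- from psql: load a tab-separated file into the staging table (client-side COPY)
  \copy staging_table (fk_lot_id, measurement_type, statistic_type, value) FROM 'statistics.tsv'

No indexes or constraints on the staging table, so the COPY itself stays cheap; you could also make it TEMPORARY if each session gets its own batch.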

I'm not sure you'll get to <1 ms doing this, that's a pretty steep requirement, but considering you can do it the "slow" way in 20ms it might just work.

Of course, if you have a staging table you could choose to operate on it less frequently; that would give you more time to operate on it and reduces the amount of overhead a little. I'm guessing your data comes in 24/7, so finding the right batch size is part of your problem.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: Alternative to UPDATE (As COPY to INSERT)

From:
Richard Huxton
Date:
On 23/02/10 09:26, Yan Cheng CHEOK wrote:
> I've found that UPDATE performance in PostgreSQL doesn't meet my expectations.
>
> Is there a fast alternative to UPDATE, in the way that COPY is a fast alternative to INSERT?

No. But you haven't said where the bottleneck in your operation is.
>          EXECUTE 'UPDATE statistic SET value = $1 WHERE fk_lot_id = $2 AND measurement_type = $3 AND statistic_type = $4'
>          USING _values[i], _lotID, _measurementTypes[i], _statisticTypes[i];

There's no need for EXECUTE ... USING here - just run the UPDATE directly:
   UPDATE statistic SET value = _values[i] WHERE fk_lot_id = _lotID ...
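
For example, the loop body could look like this (untested sketch, using the same table and variables as your function):

    FOR i IN SELECT generate_subscripts(_measurementTypes, 1)
    LOOP
        -- plain statements: PL/pgSQL prepares these once and reuses the plan
        UPDATE statistic
           SET value = _values[i]
         WHERE fk_lot_id = _lotID
           AND measurement_type = _measurementTypes[i]
           AND statistic_type = _statisticTypes[i];

        IF NOT FOUND THEN
            INSERT INTO statistic (fk_lot_id, "value", measurement_type, statistic_type)
            VALUES (_lotID, _values[i], _measurementTypes[i], _statisticTypes[i]);
        END IF;
    END LOOP;

FOUND is set by the UPDATE, so you don't need GET DIAGNOSTICS either.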

> It takes around 20ms :(
>
> I am expecting < 1ms

This might be impractical, depending on exactly what you hope to achieve.

If you wish to have individual transactions take no more than 1ms and be
safely on disk, then you will need a disk controller with battery-backed
write cache. Disks just don't spin fast enough.

You posted a few questions, but I don't see anything saying exactly what
you are trying to achieve and what sort of server you have to do it
with. Perhaps some background would be useful.

--
   Richard Huxton
   Archonet Ltd