Обсуждение: Simulating sequences

Поиск
Список
Период
Сортировка

Simulating sequences

От
"Vilson farias"
Дата:
Greetings,

  I'm getting a big performance problem and I would like to ask you what
would be the reason, but first I need to explain how it happens.

  Let's suppose I can't use sequences (it seams impossible but my boss
doesn't like specific database features like this one).

  For sequence simulation I had created a table called cnfg_key_generation
and each tuple holds information for one of my tables (tablename,
current_sequencial_number). Lets check :

CREATE TABLE cnfg_key_generation (
       department  integer     NOT NULL,
       table_name  varchar(20) NOT NULL,
       current_key integer NOT NULL,
       CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department,
table_name)
);


Per example, for a table called 'my_test' I would have the following values
:
  department  = 1
  table_name  = 'my_test'
  current_key = 1432

Everytime I want a new key to use in my_test primary-key I just increment
current_key value. For this job, I've created a simple stored procedure
called key_generation


CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer AS'
DECLARE
  the_department ALIAS FOR $1;
  the_table_name ALIAS FOR $2;
  new_key_value  integer;
  err_num        integer;
BEGIN
  new_value := 0;

  LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;

  SELECT current_value INTO new_value
    FROM cnfg_key_generation
    WHERE the_department = department AND the_table_name = table_name;

  IF NOT FOUND THEN
    new_key_value := 1;
    INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,
new_key_value);
  ELSE
    new_key_value := new_key_value + 1;

    UPDATE cnfg_key_generation
      SET current_key_value = new_key_value
      WHERE department = the_department AND
            table_name = the_table_name;
  END IF;

  RETURN new_key_value;

END;
'
LANGUAGE 'plpgsql';


Data insertion is done by the following way :

INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
fields...);

Ok, this works, but has a huge performance problem. I read in pg docs that
everytime I make an UPDATE, a new tuple is created and the old is marked as
invalid. For a tuple that holds sequencial key generation, everytime a new
key is generated, a new tuple is created inside cfg_key_generation. It means
after 2million key generations for same table, performance will be completly
degradated because there will be 2million of old versions of same tuple. For
instance, I have a table called 'cham_chamada' that actually holds
1.5Million of tuples. The response time for key_generation execution for
this table is more than 5seconds. In this same case if I execute
key_generation for a table that has just few values (current key = 5 per
example), response time is just some miliseconds (30 to 50ms).

I tryied to fix this problem with a VACUUM and it was completly ineffective.
After execution the problem was still there. Later, after execution of every
kind of vacuum I knew (with and without ANALYZE, especific for that table,
vacuumdb from shell ...) I gave up and tryied a COPY TO/DROP/CREATE/COPY
FROM. At this time, the performance problem was fixed.

What can I do to solve this problem without table reconstruction? Is there a
performance degradation in PostgreSQL that can't be solved? If a have a huge
table with millions of data being update constantly, what can I do to keep a
good performance if vacuum isn't 100%?

Does PostgreSQL sequences deal with these kinds performance questions? Is it
clean, fast and secury to use sequences? Maybe I still can make up my boss
mind about this subject if I get good arguments here to justify the use of
sequences.

Am I doing some stupid thing?

Best regards,

-----------------------------------------------------------------
José Vilson de Mello de Farias
Software Engineer

Dígitro Tecnologia Ltda - www.digitro.com.br
APC - Customer Oriented Applications
E-mail: vilson.farias@digitro.com.br
Tel.: +55 48 281 7158
ICQ 11866179


Re: Simulating sequences

От
Dennis Gearon
Дата:
I would collect the way that 5-10 of the top databases handle unique id's (sequences) for columns and compare them in a
smallpaper. Show your boss that sequences are fairly standard and he should come around. 

Vilson farias wrote:
> Greetings,
>
>   I'm getting a big performance problem and I would like to ask you what
> would be the reason, but first I need to explain how it happens.
>
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).
>
>   For sequence simulation I had created a table called cnfg_key_generation
> and each tuple holds information for one of my tables (tablename,
> current_sequencial_number). Lets check :
>
> CREATE TABLE cnfg_key_generation (
>        department  integer     NOT NULL,
>        table_name  varchar(20) NOT NULL,
>        current_key integer NOT NULL,
>        CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department,
> table_name)
> );
>
>
> Per example, for a table called 'my_test' I would have the following values
> :
>   department  = 1
>   table_name  = 'my_test'
>   current_key = 1432
>
> Everytime I want a new key to use in my_test primary-key I just increment
> current_key value. For this job, I've created a simple stored procedure
> called key_generation
>
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
>   new_key_value  integer;
>   err_num        integer;
> BEGIN
>   new_value := 0;
>
>   LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>
>   SELECT current_value INTO new_value
>     FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name;
>
>   IF NOT FOUND THEN
>     new_key_value := 1;
>     INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,
> new_key_value);
>   ELSE
>     new_key_value := new_key_value + 1;
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = new_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>   END IF;
>
>   RETURN new_key_value;
>
> END;
> '
> LANGUAGE 'plpgsql';
>
>
> Data insertion is done by the following way :
>
> INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
> fields...);
>
> Ok, this works, but has a huge performance problem. I read in pg docs that
> everytime I make an UPDATE, a new tuple is created and the old is marked as
> invalid. For a tuple that holds sequencial key generation, everytime a new
> key is generated, a new tuple is created inside cfg_key_generation. It means
> after 2million key generations for same table, performance will be completly
> degradated because there will be 2million of old versions of same tuple. For
> instance, I have a table called 'cham_chamada' that actually holds
> 1.5Million of tuples. The response time for key_generation execution for
> this table is more than 5seconds. In this same case if I execute
> key_generation for a table that has just few values (current key = 5 per
> example), response time is just some miliseconds (30 to 50ms).
>
> I tryied to fix this problem with a VACUUM and it was completly ineffective.
> After execution the problem was still there. Later, after execution of every
> kind of vacuum I knew (with and without ANALYZE, especific for that table,
> vacuumdb from shell ...) I gave up and tryied a COPY TO/DROP/CREATE/COPY
> FROM. At this time, the performance problem was fixed.
>
> What can I do to solve this problem without table reconstruction? Is there a
> performance degradation in PostgreSQL that can't be solved? If a have a huge
> table with millions of data being update constantly, what can I do to keep a
> good performance if vacuum isn't 100%?
>
> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.
>
> Am I doing some stupid thing?
>
> Best regards,
>
> -----------------------------------------------------------------
> Jos? Vilson de Mello de Farias
> Software Engineer
>
> D?gitro Tecnologia Ltda - www.digitro.com.br
> APC - Customer Oriented Applications
> E-mail: vilson.farias@digitro.com.br
> Tel.: +55 48 281 7158
> ICQ 11866179
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>


Re: Simulating sequences

От
Paul Ramsey
Дата:
Vilson farias wrote:

> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.

Sorry, but you just outlined a sequence replacement which is a big hunk
of PL/PgSQL! How is that not completely specific to PostgreSQL? The
PgSQL 'serial' type is close enough to other RDBMS autoincrement types
that porting to a different DB should be trivial. Porting your PL/PgSQL,
that will be hard (particularly if you decide to go to something like
MySQL, which doesn't even support procedural languages).

--
       __
      /
      | Paul Ramsey
      | Refractions Research
      | Email: pramsey@refractions.net
      | Phone: (250) 885-0632
      \_


Re: Simulating sequences

От
Jan Wieck
Дата:
Vilson farias wrote:

> Greetings,
>
>   I'm getting a big performance problem and I would like to ask you what
> would be the reason, but first I need to explain how it happens.
>
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).

I don't see how using PL/pgSQL is any better with respect to specific
database feature, but that aside for a moment.

Your function not only misses the required FOR UPDATE when reading the
(possibly existing) current value, it also contains a general race
condition. Multiple concurrent transactions could try inserting the new
key and every but one would error out with a duplicate key error.

Sequence values are int8 and are by default safe against integer rollover.

Sequences do not rollback and therefore don't need to wait for
concurrent transactions to finish. Your table based replacement is a
major concurrency bottleneck. As soon as a transaction did an insert to
a table, it blocks out every other transaction from inserting into that
table until it either commits or rolls back.

Your VACUUM theory is only partial correct. A frequent VACUUM will
prevent the key table from growing. You'd have to do so very often since
the excess number of obsolete index entries pointing to dead tuples also
degrades your performance. Additionally if there is a very low number of
keys (sequences) in that table, an ANALYZE run might cause the planner
to go for a sequential scan and ignore the index on the table at which
point your function will actually cause "two" sequential scan over all
live and dead tuples of all sequences per call.

Sequences are specially designed to overcome all these issues.

If you cannot convice your boss to use sequences, he is a good example
for why people having difficulties understanding technical issues should
not assume leadership positions in IT projects.


Jan

>
>   For sequence simulation I had created a table called cnfg_key_generation
> and each tuple holds information for one of my tables (tablename,
> current_sequencial_number). Lets check :
>
> CREATE TABLE cnfg_key_generation (
>        department  integer     NOT NULL,
>        table_name  varchar(20) NOT NULL,
>        current_key integer NOT NULL,
>        CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department,
> table_name)
> );
>
>
> Per example, for a table called 'my_test' I would have the following values
> :
>   department  = 1
>   table_name  = 'my_test'
>   current_key = 1432
>
> Everytime I want a new key to use in my_test primary-key I just increment
> current_key value. For this job, I've created a simple stored procedure
> called key_generation
>
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
>   new_key_value  integer;
>   err_num        integer;
> BEGIN
>   new_value := 0;
>
>   LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>
>   SELECT current_value INTO new_value
>     FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name;
>
>   IF NOT FOUND THEN
>     new_key_value := 1;
>     INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,
> new_key_value);
>   ELSE
>     new_key_value := new_key_value + 1;
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = new_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>   END IF;
>
>   RETURN new_key_value;
>
> END;
> '
> LANGUAGE 'plpgsql';
>
>
> Data insertion is done by the following way :
>
> INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
> fields...);
>
> Ok, this works, but has a huge performance problem. I read in pg docs that
> everytime I make an UPDATE, a new tuple is created and the old is marked as
> invalid. For a tuple that holds sequencial key generation, everytime a new
> key is generated, a new tuple is created inside cfg_key_generation. It means
> after 2million key generations for same table, performance will be completly
> degradated because there will be 2million of old versions of same tuple. For
> instance, I have a table called 'cham_chamada' that actually holds
> 1.5Million of tuples. The response time for key_generation execution for
> this table is more than 5seconds. In this same case if I execute
> key_generation for a table that has just few values (current key = 5 per
> example), response time is just some miliseconds (30 to 50ms).
>
> I tryied to fix this problem with a VACUUM and it was completly ineffective.
> After execution the problem was still there. Later, after execution of every
> kind of vacuum I knew (with and without ANALYZE, especific for that table,
> vacuumdb from shell ...) I gave up and tryied a COPY TO/DROP/CREATE/COPY
> FROM. At this time, the performance problem was fixed.
>
> What can I do to solve this problem without table reconstruction? Is there a
> performance degradation in PostgreSQL that can't be solved? If a have a huge
> table with millions of data being update constantly, what can I do to keep a
> good performance if vacuum isn't 100%?
>
> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.
>
> Am I doing some stupid thing?
>
> Best regards,
>
> -----------------------------------------------------------------
> José Vilson de Mello de Farias
> Software Engineer
>
> Dígitro Tecnologia Ltda - www.digitro.com.br
> APC - Customer Oriented Applications
> E-mail: vilson.farias@digitro.com.br
> Tel.: +55 48 281 7158
> ICQ 11866179
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Simulating sequences

От
Bruno Wolff III
Дата:
On Mon, Aug 18, 2003 at 11:27:14 -0300,
  Vilson farias <vilson.farias@digitro.com.br> wrote:
>
> I tryied to fix this problem with a VACUUM and it was completly ineffective.
> After execution the problem was still there. Later, after execution of every
> kind of vacuum I knew (with and without ANALYZE, especific for that table,
> vacuumdb from shell ...) I gave up and tryied a COPY TO/DROP/CREATE/COPY
> FROM. At this time, the performance problem was fixed.

Did you try VACUUM FULL?

If you are doing just a normal vacuum and waited until there were over
a million tuples in the table, your FSM setting probably wasn't high
enough to let you recover the space.

> What can I do to solve this problem without table reconstruction? Is there a
> performance degradation in PostgreSQL that can't be solved? If a have a huge
> table with millions of data being update constantly, what can I do to keep a
> good performance if vacuum isn't 100%?

You want to vacuum the table a lot more often. I remember a post (that should
be in the archives) where someone calculated how many updates you could go
before the dead tuples took up more than one block. The suggestion was that
that was the point where you want to vacuum the table.

> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.

Besides solving a dead tuple problem, using sequences also avoids contention
by not having to hold locks for the duration of a transaction.


Re: Simulating sequences

От
Tino Wildenhain
Дата:
Hi Vilson,

Vilson farias wrote:

> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.

Yes, exactly. Its clean, fast and secure.
The trick is, sequences life outside of transactions
and nextval() is never rolled back. So you dont have
to lock and you dont have to worry about duplicate
keys.

Its not quite possible to not use database specific
code when wanting a great performance the same time.

Fortunately postgresql is very close to SQL-spec,
so you arent so much walking on the dark side
if you adopt postgres style SQL.

Regards
Tino Wildenhain


Re: Simulating sequences

От
Дата:
>   I'm getting a big performance problem and I would like to ask you
> what
> would be the reason, but first I need to explain how it happens.
>
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).

I can't help you with the details of the performance problem, but I did
have a situation similar in that I had to maintain sequences "manually",
rather than use the PostgreSQL serial data type. The advice I got here
was to "update first, then select". Two important points I learned from
the gurus in this forum were

1) since in my case I was manipulating my simulated sequence inside a
trigger, there is an implicit transaction around the trigger associated
with the insert or update statement that fires the trigger

2) an update statement locks the record until the transaction commits.


With those items in mind, your function could become:

CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
AS'
DECLARE
  the_department ALIAS FOR $1;
  the_table_name ALIAS FOR $2;
BEGIN
  IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
    WHERE the_department = department AND the_table_name = table_name) THEN
   INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,0);
  END IF;


    UPDATE cnfg_key_generation
      SET current_key_value = 1 + current_key_value
      WHERE department = the_department AND
            table_name = the_table_name;

  RETURN (SELECT current_value INTO new_value
     FROM cnfg_key_generation
    WHERE the_department = department AND the_table_name = table_name);
END;



> Per example, for a table called 'my_test' I would have the following
> values :
>   department  = 1
>   table_name  = 'my_test'
>   current_key = 1432
>
> Everytime I want a new key to use in my_test primary-key I just
> increment current_key value. For this job, I've created a simple stored
> procedure called key_generation
>
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
> AS' DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
>   new_key_value  integer;
>   err_num        integer;
> BEGIN
>   new_value := 0;
>
>   LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>
>   SELECT current_value INTO new_value
>     FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name;
>
>   IF NOT FOUND THEN
>     new_key_value := 1;
>     INSERT INTO cnfg_key_generation VALUES
> (the_department,the_table_name,
> new_key_value);
>   ELSE
>     new_key_value := new_key_value + 1;
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = new_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>   END IF;
>
>   RETURN new_key_value;
>
> END;
> '
> LANGUAGE 'plpgsql';
>
>
> Data insertion is done by the following way :
>
> INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
> fields...);
>


~Berend Tober




Re: Simulating sequences

От
Stephan Szabo
Дата:
On Mon, 18 Aug 2003, Vilson farias wrote:

>   I'm getting a big performance problem and I would like to ask you what
> would be the reason, but first I need to explain how it happens.
>
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).
>
>   For sequence simulation I had created a table called cnfg_key_generation
> and each tuple holds information for one of my tables (tablename,
> current_sequencial_number). Lets check :
>
> CREATE TABLE cnfg_key_generation (
>        department  integer     NOT NULL,
>        table_name  varchar(20) NOT NULL,
>        current_key integer NOT NULL,
>        CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department,
> table_name)
> );
>
>
> Per example, for a table called 'my_test' I would have the following values
> :
>   department  = 1
>   table_name  = 'my_test'
>   current_key = 1432
>
> Everytime I want a new key to use in my_test primary-key I just increment
> current_key value. For this job, I've created a simple stored procedure
> called key_generation
>
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
>   new_key_value  integer;
>   err_num        integer;
> BEGIN
>   new_value := 0;
>
>   LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>
>   SELECT current_value INTO new_value
>     FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name;
>
>   IF NOT FOUND THEN
>     new_key_value := 1;
>     INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,
> new_key_value);
>   ELSE
>     new_key_value := new_key_value + 1;
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = new_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>   END IF;
>
>   RETURN new_key_value;
>
> END;
> '
> LANGUAGE 'plpgsql';
>
>
> Data insertion is done by the following way :
>
> INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
> fields...);
>
> Ok, this works, but has a huge performance problem. I read in pg docs that
> everytime I make an UPDATE, a new tuple is created and the old is marked as
> invalid. For a tuple that holds sequencial key generation, everytime a new
> key is generated, a new tuple is created inside cfg_key_generation. It means
> after 2million key generations for same table, performance will be completly
> degradated because there will be 2million of old versions of same tuple. For
> instance, I have a table called 'cham_chamada' that actually holds
> 1.5Million of tuples. The response time for key_generation execution for
> this table is more than 5seconds. In this same case if I execute
> key_generation for a table that has just few values (current key = 5 per
> example), response time is just some miliseconds (30 to 50ms).
>

> I tryied to fix this problem with a VACUUM and it was completly ineffective.

From that state, you might want to try a REINDEX (in case you're running
into index growth problems), and look at the output of VACUUM VERBOSE.
Also, you might want to see what kind of plans are being generated for the
queries inside the function (rather than what would be generated on the
command line with values inserted).  I believe Tom Lane sent an example of
doing this within the last couple of months to one of the mailing lists,
you may be able to find it in the archives.

One other thing is that you should probably be vacuuming this table alot
more frequently than you appear to be. Depending on the frequency of calls
to this, you may also want to raise your free space map settings (you can
get some guidelines later based on the results of vacuum verbose).

> Does PostgreSQL sequences deal with these kinds performance questions? Is it
> clean, fast and secury to use sequences? Maybe I still can make up my boss
> mind about this subject if I get good arguments here to justify the use of
> sequences.

Generally, yes.  They also don't force you down to only having one
transaction that attempts to get the value at a time, but don't actually
guarantee sequential values, merely unique ones (within their range).


Re: Simulating sequences

От
Dennis Gearon
Дата:
wouldn't a better situation be ADDING a record that is one higher, and then doing a select MAX()?

The different triggers could do delete on the old records.

btober@seaworthysys.com wrote:

>>  I'm getting a big performance problem and I would like to ask you
>>what
>>would be the reason, but first I need to explain how it happens.
>>
>>  Let's suppose I can't use sequences (it seams impossible but my boss
>>doesn't like specific database features like this one).
>
>
> I can't help you with the details of the performance problem, but I did
> have a situation similar in that I had to maintain sequences "manually",
> rather than use the PostgreSQL serial data type. The advice I got here
> was to "update first, then select". Two important points I learned from
> the gurus in this forum were
>
> 1) since in my case I was manipulating my simulated sequence inside a
> trigger, there is an implicit transaction around the trigger associated
> with the insert or update statement that fires the trigger
>
> 2) an update statement locks the record until the transaction commits.
>
>
> With those items in mind, your function could become:
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
> AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
> BEGIN
>   IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name) THEN
>    INSERT INTO cnfg_key_generation VALUES (the_department,the_table_name,0);
>   END IF;
>
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = 1 + current_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>
>   RETURN (SELECT current_value INTO new_value
>      FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name);
> END;
>
>
>
>
>>Per example, for a table called 'my_test' I would have the following
>>values :
>>  department  = 1
>>  table_name  = 'my_test'
>>  current_key = 1432
>>
>>Everytime I want a new key to use in my_test primary-key I just
>>increment current_key value. For this job, I've created a simple stored
>>procedure called key_generation
>>
>>
>>CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
>>AS' DECLARE
>>  the_department ALIAS FOR $1;
>>  the_table_name ALIAS FOR $2;
>>  new_key_value  integer;
>>  err_num        integer;
>>BEGIN
>>  new_value := 0;
>>
>>  LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>>
>>  SELECT current_value INTO new_value
>>    FROM cnfg_key_generation
>>    WHERE the_department = department AND the_table_name = table_name;
>>
>>  IF NOT FOUND THEN
>>    new_key_value := 1;
>>    INSERT INTO cnfg_key_generation VALUES
>>(the_department,the_table_name,
>>new_key_value);
>>  ELSE
>>    new_key_value := new_key_value + 1;
>>
>>    UPDATE cnfg_key_generation
>>      SET current_key_value = new_key_value
>>      WHERE department = the_department AND
>>            table_name = the_table_name;
>>  END IF;
>>
>>  RETURN new_key_value;
>>
>>END;
>>'
>>LANGUAGE 'plpgsql';
>>
>>
>>Data insertion is done by the following way :
>>
>>INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
>>fields...);
>>
>
>
>
> ~Berend Tober
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Simulating sequences

От
"Vilson farias"
Дата:
First I would like to thanks everybody that helped me so far.

Mr. Wiek, you are right, the stored procedure I've been using has a lot of
problems related with concurrency control. You gave me valuable information
and it's gonna help me making good arguments against this technique.

> Your VACUUM theory is only partial correct. A frequent VACUUM will
> prevent the key table from growing. You'd have to do so very often since
> the excess number of obsolete index entries pointing to dead tuples also
> degrades your performance. Additionally if there is a very low number of
> keys (sequences) in that table, an ANALYZE run might cause the planner
> to go for a sequential scan and ignore the index on the table at which
> point your function will actually cause "two" sequential scan over all
> live and dead tuples of all sequences per call.

About vacuum, when does analyze make planner to go for sequencial scan
instead indexed scan? How can I prevent planner from make this mistake? I
have some tables that have this problem when a SQL is running against their
primaries keys. For these tables, at creation time they are all "Index Scan"
from explain but after first vacuum is ran, they will be "Sequencial Scan"
forever. They are tables with few data, but there are data enough to slow
down the whole application when a sequencial scan is used. Where can I find
more information about it wrong sequencial scan prevention? This is one of
hardier to understand itens about PostgreSQL. Is there some tech docs
available around that you guys would like to suggest?

> If you cannot convice your boss to use sequences, he is a good example
> for why people having difficulties understanding technical issues should
> not assume leadership positions in IT projects.

That's my point of view. But sometimes is very hard to fight against people
that have no arguments but only influence as weapon.

Best Regards,
Vilson



Re: Simulating sequences

От
Дата:
> wouldn't a better situation be ADDING a record that is one higher, and
> then doing a select MAX()?
>
> The different triggers could do delete on the old records.
>

In my case that would not apply, because what I had was a need to keep a
"sequence" counter for each employee, so I added a column
("expense_report_seq") to the employee table:

CREATE TABLE paid.employee (
  employee_pk serial,
  person_pk int4 NOT NULL,
  employee_identifier varchar(24),
  hire_date date,
  termination_date date,
  health_insurance_code_pk int4,
  performance_review_date date,
  emergency_contact_pk int4,
  labor_category_pk int4,
  expense_report_seq int4 DEFAULT 0);

The incremented value of the expense_report_seq column is then inserted
in the expense_pk column for a new row in the expense table, thus keeping
a separate sequence for each employee:

CREATE TABLE paid.expense (
  project_pk int4 NOT NULL,
  organization_pk int4 NOT NULL,
  employee_pk int4 NOT NULL,
  expense_pk int4 NOT NULL,
  expense_report_date date DEFAULT now() NOT NULL,
  expense_date date DEFAULT now() NOT NULL,
  CONSTRAINT expense_pkey PRIMARY KEY (project_pk, organization_pk,
employee_pk, expense_pk),
  CONSTRAINT expense_fkey FOREIGN KEY (employee_pk) REFERENCES employee
(employee_pk)
) WITHOUT OIDS;


Then there is the trigger:

CREATE TRIGGER expense_bit BEFORE INSERT ON paid.expense FOR EACH ROW
EXECUTE PROCEDURE expense_bit();

where

CREATE FUNCTION paid.expense_bit() RETURNS trigger AS '
BEGIN
  SELECT INTO NEW.expense_pk expense_report_next(new.employee_pk);
  RETURN new;
END;
'  LANGUAGE 'plpgsql' VOLATILE;

where

CREATE FUNCTION paid.expense_report_next(int4) RETURNS int4 AS '
DECLARE
  l_employee_pk ALIAS FOR $1;
BEGIN
  UPDATE employee
    SET expense_report_seq = (expense_report_seq + 1)
    WHERE employee_pk = l_employee_pk;

RETURN (SELECT expense_report_seq FROM employee  WHERE employee_pk =
l_employee_pk) ;
END;'  LANGUAGE 'plpgsql' VOLATILE;


Seems to work o.k., but this is not a large database with gazillions of
transactions.

~Berend Tober




Re: Simulating sequences

От
Tom Lane
Дата:
"Vilson farias" <vilson.farias@digitro.com.br> writes:
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).

If sequences could be effectively replaced by standard SQL operations,
we would not have bothered to invent them.  Nor would other databases
have bothered to invent their comparable features (autoincrement in
MySQL, etc).  Your boss has got his head screwed on backwards on this
point --- writing a sequence replacement will not work well, and will
not be markedly more portable to other databases.

            regards, tom lane

Re: Simulating sequences

От
Dustin Sallings
Дата:
On Monday, Aug 18, 2003, at 07:27 US/Pacific, Vilson farias wrote:

> Greetings,
>
>   I'm getting a big performance problem and I would like to ask you
> what
> would be the reason, but first I need to explain how it happens.
>
>   Let's suppose I can't use sequences (it seams impossible but my boss
> doesn't like specific database features like this one).

    You're just replacing the postgres implementation of sequences with
your own that has a similar API.  The postgres will have a few
advantages over what you'll be able to write in plpgsql.  :)

    That said, I don't use native sequences for large projects.  I do
something with a similar table to yours, but I have a key cache size on
each key.  A container refetches keys when it runs out.  I think I
wrote it originally because it made work easier (it makes complex
object relations easier if you can give things unique IDs before
storing them), but it turns out that it performs really well because
it's just an update statement to adjust the frequency of the key table
access.

    If you're working in java, you can use my stuff from here:

    http://bleu.west.spy.net/~dustin/projects/spyjar.xtp

    If you're implementing your own, you can read the particular class
docs here:

    http://bleu.west.spy.net/~dustin/spyjar/j2/doc/net/spy/db/GetPK.html

>   For sequence simulation I had created a table called
> cnfg_key_generation
> and each tuple holds information for one of my tables (tablename,
> current_sequencial_number). Lets check :
>
> CREATE TABLE cnfg_key_generation (
>        department  integer     NOT NULL,
>        table_name  varchar(20) NOT NULL,
>        current_key integer NOT NULL,
>        CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department,
> table_name)
> );
>
>
> Per example, for a table called 'my_test' I would have the following
> values
> :
>   department  = 1
>   table_name  = 'my_test'
>   current_key = 1432
>
> Everytime I want a new key to use in my_test primary-key I just
> increment
> current_key value. For this job, I've created a simple stored procedure
> called key_generation
>
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
> AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
>   new_key_value  integer;
>   err_num        integer;
> BEGIN
>   new_value := 0;
>
>   LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;
>
>   SELECT current_value INTO new_value
>     FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name;
>
>   IF NOT FOUND THEN
>     new_key_value := 1;
>     INSERT INTO cnfg_key_generation VALUES
> (the_department,the_table_name,
> new_key_value);
>   ELSE
>     new_key_value := new_key_value + 1;
>
>     UPDATE cnfg_key_generation
>       SET current_key_value = new_key_value
>       WHERE department = the_department AND
>             table_name = the_table_name;
>   END IF;
>
>   RETURN new_key_value;
>
> END;
> '
> LANGUAGE 'plpgsql';
>
>
> Data insertion is done by the following way :
>
> INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other
> fields...);
>
> Ok, this works, but has a huge performance problem. I read in pg docs
> that
> everytime I make an UPDATE, a new tuple is created and the old is
> marked as
> invalid. For a tuple that holds sequencial key generation, everytime a
> new
> key is generated, a new tuple is created inside cfg_key_generation. It
> means
> after 2million key generations for same table, performance will be
> completly
> degradated because there will be 2million of old versions of same
> tuple. For
> instance, I have a table called 'cham_chamada' that actually holds
> 1.5Million of tuples. The response time for key_generation execution
> for
> this table is more than 5seconds. In this same case if I execute
> key_generation for a table that has just few values (current key = 5
> per
> example), response time is just some miliseconds (30 to 50ms).
>
> I tryied to fix this problem with a VACUUM and it was completly
> ineffective.
> After execution the problem was still there. Later, after execution of
> every
> kind of vacuum I knew (with and without ANALYZE, especific for that
> table,
> vacuumdb from shell ...) I gave up and tryied a COPY
> TO/DROP/CREATE/COPY
> FROM. At this time, the performance problem was fixed.
>
> What can I do to solve this problem without table reconstruction? Is
> there a
> performance degradation in PostgreSQL that can't be solved? If a have
> a huge
> table with millions of data being update constantly, what can I do to
> keep a
> good performance if vacuum isn't 100%?
>
> Does PostgreSQL sequences deal with these kinds performance questions?
> Is it
> clean, fast and secury to use sequences? Maybe I still can make up my
> boss
> mind about this subject if I get good arguments here to justify the
> use of
> sequences.
>
> Am I doing some stupid thing?
>
> Best regards,
>
> -----------------------------------------------------------------
> José Vilson de Mello de Farias
> Software Engineer
>
> Dígitro Tecnologia Ltda - www.digitro.com.br
> APC - Customer Oriented Applications
> E-mail: vilson.farias@digitro.com.br
> Tel.: +55 48 281 7158
> ICQ 11866179
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>
>
--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: Simulating sequences

От
Dustin Sallings
Дата:
On Monday, Aug 18, 2003, at 09:01 US/Pacific, <btober@seaworthysys.com>
wrote:

> With those items in mind, your function could become:
>
> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
> AS'
> DECLARE
>   the_department ALIAS FOR $1;
>   the_table_name ALIAS FOR $2;
> BEGIN
>   IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>     WHERE the_department = department AND the_table_name = table_name)
> THEN
>    INSERT INTO cnfg_key_generation VALUES
> (the_department,the_table_name,0);
>   END IF;

    I would get the insert out of there, too.  If it doesn't exist, throw
an exception.  I don't believe sequences should automatically create
themselves (the tables and columns don't).

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: Simulating sequences

От
Дата:
>
> On Monday, Aug 18, 2003, at 09:01 US/Pacific, <btober@seaworthysys.com>
>  wrote:
>
>> With those items in mind, your function could become:
>>
>> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
>> AS'
>> DECLARE
>>   the_department ALIAS FOR $1;
>>   the_table_name ALIAS FOR $2;
>> BEGIN
>>   IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>>     WHERE the_department = department AND the_table_name =
>> table_name)
>> THEN
>>    INSERT INTO cnfg_key_generation VALUES
>> (the_department,the_table_name,0);
>>   END IF;
>
>     I would get the insert out of there, too.  If it doesn't exist, throw
> an exception.  I don't believe sequences should automatically create
> themselves (the tables and columns don't).
>

I agree. In my own case I need a sequence for each employee, and the
simulated sequence is defined as a column in the employee table, so I'm
guaranteed to have a a place to do the incrementing when the need arises.
Also, I used a "DEFAULT 0" clause on the column definition for the
sequence value, rather than explicitly inserting a zero. I left the
insert statement in place for compatibility with the original inquirer's
definition.

~Berend Tober




Re: Simulating sequences

От
Dennis Gearon
Дата:
so has the final implemention become:

A/ A column value per employee
B/ A trigger to implement the incrementing of the value
C/ A row/record lock to enforce atomicity

btober@seaworthysys.com wrote:

>>On Monday, Aug 18, 2003, at 09:01 US/Pacific, <btober@seaworthysys.com>
>> wrote:
>>
>>
>>>With those items in mind, your function could become:
>>>
>>>CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
>>>AS'
>>>DECLARE
>>>  the_department ALIAS FOR $1;
>>>  the_table_name ALIAS FOR $2;
>>>BEGIN
>>>  IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>>>    WHERE the_department = department AND the_table_name =
>>>table_name)
>>>THEN
>>>   INSERT INTO cnfg_key_generation VALUES
>>>(the_department,the_table_name,0);
>>>  END IF;
>>
>>    I would get the insert out of there, too.  If it doesn't exist, throw
>>an exception.  I don't believe sequences should automatically create
>>themselves (the tables and columns don't).
>>
>
>
> I agree. In my own case I need a sequence for each employee, and the
> simulated sequence is defined as a column in the employee table, so I'm
> guaranteed to have a a place to do the incrementing when the need arises.
> Also, I used a "DEFAULT 0" clause on the column definition for the
> sequence value, rather than explicitly inserting a zero. I left the
> insert statement in place for compatibility with the original inquirer's
> definition.
>
> ~Berend Tober
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


Re: Simulating sequences

От
Дата:
> so has the final implemention become:
> A/ A column value per employee

Almost. More precisely, it is, as you would expect, one ROW per employee
(in the employee table). The employee table includes a COLUMN defined to
store the most-recently-issued number in the sequence for that paricular
employee.

> B/ A trigger to implement the incrementing of the value

Yes. When an expense report is created for an employee, i.e., a row is
inserted in the expense_report table, that trigger fires, updating the
employee's sequence column value and using that new value in the
expense_report table.

> C/ A row/record lock to enforce atomicity

The lock is not explicite. I'm told from good sources that the UPDATE
statement creates a lock implicitely in the updated row, and that there
is an implicite transaction around the trigger as part of the originating
INSERT statement.

>
> btober@seaworthysys.com wrote:
>
>>>On Monday, Aug 18, 2003, at 09:01 US/Pacific,
>>> <btober@seaworthysys.com>
>>> wrote:
>>>
>>>
>>>>With those items in mind, your function could become:
>>>>
>>>>CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
>>>> AS'
>>>>DECLARE
>>>>  the_department ALIAS FOR $1;
>>>>  the_table_name ALIAS FOR $2;
>>>>BEGIN
>>>>  IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>>>>    WHERE the_department = department AND the_table_name =
>>>>table_name)
>>>>THEN
>>>>   INSERT INTO cnfg_key_generation VALUES
>>>>(the_department,the_table_name,0);
>>>>  END IF;
>>>
>>>    I would get the insert out of there, too.  If it doesn't exist,
>>> throw
>>>an exception.  I don't believe sequences should automatically create
>>> themselves (the tables and columns don't).
>>>
>>
>>
>> I agree. In my own case I need a sequence for each employee, and the
>> simulated sequence is defined as a column in the employee table, so
>> I'm guaranteed to have a a place to do the incrementing when the need
>> arises. Also, I used a "DEFAULT 0" clause on the column definition
>> for the sequence value, rather than explicitly inserting a zero. I
>> left the insert statement in place for compatibility with the
>> original inquirer's definition.
>>

~Berend Tober




Re: Simulating sequences

От
Jan Wieck
Дата:
Dustin Sallings wrote:

> On Monday, Aug 18, 2003, at 09:01 US/Pacific, <btober@seaworthysys.com>
> wrote:
>
>> With those items in mind, your function could become:
>>
>> CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer
>> AS'
>> DECLARE
>>   the_department ALIAS FOR $1;
>>   the_table_name ALIAS FOR $2;
>> BEGIN
>>   IF NOT EXISTS(SELECT 1 FROM cnfg_key_generation
>>     WHERE the_department = department AND the_table_name = table_name)
>> THEN
>>    INSERT INTO cnfg_key_generation VALUES
>> (the_department,the_table_name,0);
>>   END IF;
>
>     I would get the insert out of there, too.  If it doesn't exist, throw
> an exception.  I don't believe sequences should automatically create
> themselves (the tables and columns don't).

Not in PostgreSQL or any other serious database. But given that MySQL
tries hard to never error out and mangles over the input in an even
destructive way by concept, you might have a fair chance to get a
misspelled column name in an INSERT statement interpreted as implicit
ALTER TABLE ...


Jan

>
> --
> SPY                      My girlfriend asked me which one I like better.
> pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
> |    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
> L_______________________ I hope the answer won't upset her. ____________
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #