Discussion: Does a connection support multiple transactions.

Does a connection support multiple transactions.

From: johnf
Date:
Hi,

SuSE 10.0, using the Free Pascal Compiler, connection version 2.1.1.

I have opened a connection to a database (FPC uses the Postgres C
interface).  When I attempt to make two queries using the same
connection I get an error that suggests I have to close the first
query (it actually sends a 'begin').  The question is NOT how
Pascal works, but whether Postgres will accept multiple queries
via one connection, or whether queries have to be serial.  Will
Postgres allow two queries to overlap?  I think Postgres will
allow nested 'begins' (not sure about that).  I'm not sure I'm
asking the question correctly.  But in the Windows world I only
open one connection (normally via ODBC) and can make several
queries (one right after the other, and sometimes in the same
query, i.e. 'select * from customers; select * from contacts').
Each of the queries is live and I'm able to update if required.
I have to save the data with a commit, but I can still have the
two queries available.


I'm sure this question is not very clear, but I hope some guru
will figure it out.


Thanks

John

Time-consuming process ... 3 million records, please help

From: Srinivas Iyyer
Date:
Dear group,
  I have two tables and a temp table into which I uploaded
data using \copy. This temp table has 3,348,107 lines of data.

I wrote a plpgsql function to read each record in
temp_table, take the first column's value (temp_refseq_id),
look up the corresponding seq_id in table B, and insert a row
into table A.

I started this process 8 hours ago. It has been running for
the last 8 hours and is still not finished. The reason I used
this temp-table approach was to speed up the process by
writing a server-side function, but I still have not won the
battle with time. If this remains a problem I will have to
look for another db system, since my files from now on are
huge, with over a million records each. I am betting a lot of
time on this.

Could anyone help me write a faster function?

thanks
looking forward to hearing from people.

Temp_table:

 temp_refseq_id |  temp_imageid  | temp_genbankacc
----------------+----------------+-----------------
 NM_003604      | IMAGE:1099538  | AA594716
 NM_003604      | IMAGE:853047   | AA668250
 NM_001008860   | IMAGE:3640970  | BC011775
 NM_001008860   | IMAGE:3640970  | BE737509
 NM_001008860   | IMAGE:6040319  | BU079001
 NM_001008860   | IMAGE:6040319  | BU078725
 NM_001008860   | IMAGE:3451448  | BC000957
 NM_001008860   | IMAGE:3451448  | BE539334
 NM_001008860   | IMAGE:4794135  | BG708105
 NM_001008860   | IMAGE:5214087  | BI911674

Table A : (I want to upload data from temp to here)

 spota_id | seq_id | spota_imageid | spota_genbacc
----------+--------+---------------+---------------
       23 |  54525 | IMAGE:1099538 | AA594716




Table B : this is the seqdump table; its seq_id is referenced
as a foreign key from Table A.

 seq_id |   seq_acc    | seq_name
--------+--------------+----------
  54519 | NM_152918    | EMR2
  54520 | NM_001008860 | CGGBP1
  54521 | NM_020040    | TUBB4Q
  54522 | NM_017525    | CDC42BPG
  54523 | NM_032158    | WBSCR20C
  54524 | NM_004047    | ATP6V0B
  54525 | NM_003604    | PLCB3


Function:

CREATE FUNCTION tab_update() RETURNS integer AS '
DECLARE
    referrer_keys RECORD;
BEGIN
    -- Loop over every row of the staging table; for each row, look
    -- up the seq_id for its accession, then insert one row into
    -- spotanno.
    FOR referrer_keys IN SELECT * FROM temp_spotanno LOOP
        INSERT INTO spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
        VALUES ((SELECT seq_id FROM seqdump
                 WHERE seq_acc = referrer_keys.temp_refseq_id),
                referrer_keys.temp_imageid,
                referrer_keys.temp_genbankacc);
    END LOOP;
    RETURN 0;
END;
' LANGUAGE plpgsql;


Thanks
Sri


Re: Time-consuming process ... 3 million records, please help

From: Sean Davis
Date:
Srinivas Iyyer wrote:
> Dear group,
>   I have two tables and a temp table into which I uploaded
> data using \copy. This temp table has 3,348,107 lines of data.
>
> <snip>

Hi, Sri.

I would do this differently.  I don't think you need the function
at all.  I would do something like this (untested):

insert into spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
select seqdump.seq_id, temp_spotanno.temp_imageid,
    temp_spotanno.temp_genbankacc
from temp_spotanno, seqdump
where temp_spotanno.temp_refseq_id = seqdump.seq_acc;

Make sure there is an index on seq_acc and temp_refseq_id.  This
should run in seconds to minutes, not hours.
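
For what it's worth, the index statements would look something like
this (a sketch; the index names are just illustrative):

   create index temp_spotanno_refseq_idx on temp_spotanno (temp_refseq_id);
   create index seqdump_seq_acc_idx on seqdump (seq_acc);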

Sean

Re: Time-consuming process ... 3 million records, please help

From: Srinivas Iyyer
Date:
Hi Sean,
thanks for your reply.  In my temp table, the temp_refseq_id
column cannot be indexed due to duplicate values.  Will that
affect things drastically?  I haven't tested it yet.  On top of
that, I killed the 8-hour process that had been running the
tab_update() function.
:-)

thanks
sri


> Hi, Sri.
>
> I would do this differently.  I don't think you need the function
> at all.  I would do something like this (untested):
>
> insert into spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
> select seqdump.seq_id, temp_spotanno.temp_imageid,
>     temp_spotanno.temp_genbankacc
> from temp_spotanno, seqdump
> where temp_spotanno.temp_refseq_id = seqdump.seq_acc;
>
> Make sure there is an index on seq_acc and temp_refseq_id.  This
> should run in seconds to minutes, not hours.
>
> Sean


Re: Time-consuming process ... 3 million records, please help

From: Sean Davis
Date:
Srinivas Iyyer wrote:
> Hi Sean,
> thanks for your reply.  In my temp table, the temp_refseq_id
> column cannot be indexed due to duplicate values.  Will that
> affect things drastically?  I haven't tested it yet.  On top of
> that, I killed the 8-hour process that had been running the
> tab_update() function.
> :-)

Sri,

Columns with duplicate values can be indexed; only if you declare
an index to be "unique" will you have a problem inserting
duplicates.

You just need to:

1.  Create your temp table structure.
2.  Copy in your data.
3.  Create an index on every field that shows up in a "where" or
join clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!):

   create index temp_refseq_id_index on TABLENAME(temp_refseq_id);

4.  Vacuum analyze the table(s) after copying and creating the
indexes (thereby telling postgres what is in the table):

   vacuum analyze TABLENAME;

5.  Do something along the lines of:

   insert into ....
   select from ....

That should do it, approximately.  If you still have problems, you
will need to be more specific about what they are, I think.
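
Putting the whole recipe together, a minimal sketch (untested; the
file name and index names are only illustrative, while the table and
column names are taken from earlier in the thread):

   -- 1. staging table matching the input file
   create table temp_spotanno (
       temp_refseq_id  text,
       temp_imageid    text,
       temp_genbankacc text
   );

   -- 2. bulk-load the file
   \copy temp_spotanno from 'data.txt'

   -- 3. index the columns used in the join
   create index temp_spotanno_refseq_idx on temp_spotanno (temp_refseq_id);
   create index seqdump_seq_acc_idx on seqdump (seq_acc);

   -- 4. refresh the planner statistics
   vacuum analyze temp_spotanno;

   -- 5. one set-based insert instead of a row-by-row loop
   insert into spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
   select s.seq_id, t.temp_imageid, t.temp_genbankacc
   from temp_spotanno t join seqdump s on s.seq_acc = t.temp_refseq_id;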

As an aside, I have similar data stored in a postgres database, in
some places on the order of several tens of millions of records;
postgres is not the limiting factor here.

Sean


> thanks
> sri
>
> <snip>

Re: Non-Unique items

From: Srinivas Iyyer
Date:
Hi Sean and group,
 thank you for your help. It worked.
However, I guess I stepped on a land mine of unique
and non-unique items.

Here is the problem:
(Example data) I have table A:

seq_id   seq_name
123        ITAM3
234        ITAR


Table B:

spot_id  seq_id   image_name
--------------------------------------------
849343    123       IMAGE: 12335
1348238   234       IMAGE: 12335




Table C:

exp_id |  spot_id  | spot_value
-------|-----------|-----------

Data to insert into Table C
IMAGE: 12335   98.03344






Here is the query log:

arraydb=# SELECT spotanno_id from spotanno
arraydb-# where spotanno_imageid = 'IMAGE:755402';
 spotanno_id
-------------
      849343
     1348238
(2 rows)

arraydb=# select * from spotanno where spotanno_id =
849343;
 spotanno_id | seq_id | spotanno_imageid
-------------+--------+------------------
      849343 |  75343 | IMAGE:755402
(1 row)

arraydb=# select * from spotanno where spotanno_id =
1348238;
 spotanno_id | seq_id | spotanno_imageid
-------------+--------+------------------
     1348238 |  50475 | IMAGE:755402
(1 row)

arraydb=# select * from seqdump where seq_id = 50475;
 seq_id |  seq_acc  | seq_name
--------+-----------+----------
  50475 | NM_005501 | ITGA3
(1 row)

arraydb=# select * from seqdump where seq_id = 75343;
 seq_id |  seq_acc  | seq_name
--------+-----------+----------
  75343 | NM_002204 | ITGA3
(1 row)


An example row from the data file to be uploaded:

IMAGE:755402    0.299781845119261    12.3638881597060


The question:
when I have a non-unique item (viz. IMAGE:755402),
what approach should one generally take?

Do you have any suggestions or solutions?  Please help me.

Thanks again.

-sri

> That should do it, approximately.  If you still have problems, you
> will need to be more specific about what they are, I think.
>
> As an aside, I have similar data stored in a postgres database, in
> some places on the order of several tens of millions of records;
> postgres is not the limiting factor here.
>
> Sean


Re: Non-Unique items

From: Sean Davis
Date:
Srinivas Iyyer wrote:
> Hi Sean and group,
>  thank you for your help. It worked.
> However, I guess I stepped on a land mine of unique
> and non-unique items.
>
> <snip>
>
> The question:
> when I have a non-unique item (viz. IMAGE:755402),
> what approach should one generally take?
>
> -sri

Sri,

Unfortunately, the biological data you are working with has
one-to-many and many-to-many relationships.  While one would like
to believe that such relationships should not exist, they do.
Therefore, you need to store the data in a manner that respects
them.  In other words, store the measurements in a table keyed by
whatever the primary identifier is (in this case, it looks like an
IMAGE ID) and store the annotation separately, allowing a
one-to-many relationship between IMAGE ID and gene.  There is no
way around this; trying to eliminate these "non-unique" situations
won't work in this particular case.  Instead, you have to
understand where the data come from and design your database to
match, not the other way around.
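
As a rough illustration (a sketch only; these table and column
names are made up, not taken from your schema):

   -- one row per physical spot/clone, keyed by the IMAGE ID
   create table image_clone (
       imageid text primary key              -- e.g. 'IMAGE:755402'
   );

   -- annotation is one-to-many: the same IMAGE ID may map to
   -- several genes/accessions
   create table image_annotation (
       imageid text references image_clone (imageid),
       seq_id  integer references seqdump (seq_id)
   );

   -- measurements attach to the spot, not to the gene
   create table spot_value (
       exp_id     integer,
       imageid    text references image_clone (imageid),
       spot_value double precision
   );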

Sean

Re: Non-Unique items

From: "Daniel T. Staal"
Date:
On Tue, March 28, 2006 8:54 pm, Sean Davis said:

> Unfortunately, the biological data you are working with has
> one-to-many and many-to-many relationships.  While one would like
> to believe that such relationships should not exist, they do.
> Therefore, you need to store the data in a manner that respects
> them.  In other words, store the measurements in a table keyed by
> whatever the primary identifier is (in this case, it looks like an
> IMAGE ID) and store the annotation separately, allowing a
> one-to-many relationship between IMAGE ID and gene.  There is no
> way around this; trying to eliminate these "non-unique" situations
> won't work in this particular case.  Instead, you have to
> understand where the data come from and design your database to
> match, not the other way around.

Summary: It is easier to design databases than to design data. ;)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------


Re: Does a connection support multiple transactions.

From: george young
Date:
On Mon, 27 Mar 2006 11:28:52 -0800
johnf <jfabiani@yolo.com> threw this fish to the penguins:

> SuSE 10.0, using the Free Pascal Compiler, connection version 2.1.1.
>
> I have opened a connection to a database (FPC uses the Postgres C
> interface).  When I attempt to make two queries using the same
> connection I get an error that suggests I have to close the first
> query (it actually sends a 'begin').  The question is NOT how
> Pascal works, but whether Postgres will accept multiple queries
> via one connection, or whether queries have to be serial.  Will
> Postgres allow two queries to overlap?  I think Postgres will
> allow nested 'begins' (not sure about that).  I'm not sure I'm
> asking the question correctly.  But in the Windows world I only
> open one connection (normally via ODBC) and can make several
> queries (one right after the other, and sometimes in the same
> query, i.e. 'select * from customers; select * from contacts').
> Each of the queries is live and I'm able to update if required.
> I have to save the data with a commit, but I can still have the
> two queries available.

A transaction starts with a "begin" statement and concludes with
either a "commit" or a "rollback".  Postgres does not (yet)
support nested transactions, so you must end the current
transaction before starting a new one.  Within a transaction you
may certainly execute many selects, updates, inserts, deletes, etc.

I don't know what the Pascal DB API looks like, so it's hard to
discern exactly what your problem is.  Could you post some specific
code, explain what your intent is, and say what doesn't work?
Also, please always mention your Postgres version and platform, to
help people help you.

I seem to recall encountering some API that did not allow multiple
";"-separated statements in one call.  But any API must allow
consecutive calls within a transaction.
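
For example, this sequence is perfectly legal over a single
connection (a sketch; the customers/contacts tables and their
columns are only illustrative):

   begin;
   select * from customers;              -- first query
   select * from contacts;               -- second query, same transaction
   update customers set phone = '555-1212' where id = 1;
   commit;                               -- end this transaction...

   begin;                                -- ...before starting the next one
   select * from contacts;
   rollback;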

-- George Young

> I'm sure this question is not very clear, but I hope some guru
> will figure it out.
>
> Thanks
>
> John


--
"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)