Discussion: dynamic object creation


dynamic object creation

From:
Indraneel Majumdar
Date:
Hi,

I'm not sure whether the subject line is appropriate. I have the following
problem, which I hope PostgreSQL can handle.

I'm converting a complex flatfile in which records are arranged serially.
Some fields are 'n'-times-repeating blocks of multiple lines, and some
subfields within them are themselves 'n'-times-repeating blocks of multiple
lines. So in my main table I do not know, until run time, how many fields
to create (the same goes for any sub-tables). How can I do this dynamically?

I tried using arrays, but retrieval from them is causing some problems. I
have already checked the array utilities in the contrib section and have
extended the operator list for other types (I'll send the file to its
original author so that he may include it if he wishes).

I think there must be some object-oriented way of doing this without
creating too many keys. Or are keys the only and best method? Using them
is causing a performance hit. If it's any help, what I'm trying to convert
are biological databases distributed in 'SRS' flatfile format from
ftp.ebi.ac.uk/pub/databases/

Thank you,
Indraneel

/************************************************************************.
# Indraneel Majumdar                  ¡  E-mail: indraneel@123india.com  #
# Bioinformatics Unit (EMBNET node),  ¡  URL: http://scorpius.iwarp.com  #
# Centre for DNA Fingerprinting and Diagnostics,                         #
# Hyderabad, India - 500076                                              #
`************************************************************************/



Re: dynamic object creation

From:
Mark Volpe
Date:
You may want to think about creating your table like this (for example):

CREATE TABLE data
(
    key text,
    field_type char,
    value text
);

CREATE UNIQUE INDEX data_key ON data(key, field_type, value);

This way each "record" takes up several rows in the table, and each "field"
can take up as many rows as it needs. A table like this, where two of the
columns are arrays:

key  | field1  | field2 
-------------------------
a    | [x,y,z] | [a,d,f]
b    | [m,n]   | (NULL)

Can be represented like this instead:

key  | field_type | value
-------------------------
a    | 1          | x
a    | 1          | y
a    | 1          | z
a    | 2          | a
a    | 2          | d
a    | 2          | f
b    | 1          | m
b    | 1          | n
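
For instance, loading a few of the rows above and pulling record 'a' back
out might look roughly like this (just a sketch against the columns from
the CREATE TABLE above):

-- a few of the example rows; '1' and '2' are arbitrary field labels
INSERT INTO data VALUES ('a', '1', 'x');
INSERT INTO data VALUES ('a', '1', 'y');
INSERT INTO data VALUES ('a', '2', 'f');

-- fetch every field of record 'a', one row per value
SELECT field_type, value
FROM data
WHERE key = 'a'
ORDER BY field_type, value;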


I'm not sure what your data looks like, but I hope this helps.

Mark

Indraneel Majumdar wrote:
> 
> Hi,
> 
> I'm not sure if the subject line has been proper. I have this following
> problem which I hope PostgreSQL can handle.
> 
> I'm converting a complex flatfile where records are arranged serially.
> some fields are as 'n' times repeating blocks of multiple lines. Some
> subfields within these are also 'n' time repeating blocks of multiple
> lines. So in my main table I do not know (until at run time) how many
> fields to create (same for any sub tables). How can I do this dynamically?
> 
> I tried using arrays, but retrieval from that is causing some problems. I
> have already checked the array utilities in the contrib section and have
> extended the operator list for other types (I'll send the file to it's
> original author so that he may include it if he wishes).
> 
> I think there must be some object-oriented way of doing this without
> creating too many keys. or are keys the only and best method? Using this
> is causing a performance hit. If it's any help, what I'm trying to convert
> are biological databases distributed in 'SRS' flatfile format from
> ftp.ebi.ac.uk/pub/databases/
> 
> Thank you,
> Indraneel
> 
> /************************************************************************.
> # Indraneel Majumdar                  ¡  E-mail: indraneel@123india.com  #
> # Bioinformatics Unit (EMBNET node),  ¡  URL: http://scorpius.iwarp.com  #
> # Centre for DNA Fingerprinting and Diagnostics,                         #
> # Hyderabad, India - 500076                                              #
> `************************************************************************/


Re: dynamic object creation

From:
Indraneel Majumdar
Date:
Thanks, but this is what I'm currently using and want to move away from.
Inserting data into this table and extracting it again takes a long time,
and my smallest table of this sort has 68000 rows. In comparison, arrays
are 10 times faster but lack the tools needed to work with them (especially
multidimensional arrays, even after I have extended the operators in the
contrib section). Also, I cannot assign field names to the individual
elements of an array.
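
To give an idea, the array layout I have been benchmarking against looks
roughly like this (table and column names are only placeholders):

CREATE TABLE data_arr (
    key    text,
    field1 text[],
    field2 text[]
);

INSERT INTO data_arr VALUES ('a', '{"x","y","z"}', '{"a","d","f"}');

-- elements can only be reached by position, never by name
SELECT field1[1], field2[2] FROM data_arr WHERE key = 'a';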

\Indraneel

On Thu, 12 Oct 2000, Mark Volpe wrote:

> You may want to think about creating your table like this (for example):
> 
> CREATE TABLE data
> (
>     key text,
>     field_type char,
>     value text
> );
> 
> CREATE UNIQUE INDEX data_key ON data(key, field_type, value);
> 
> So this way each "record" takes up several rows in the table, and each "field"
> can take up as many rows as you need. A table like this, with two columns
> being arrays:
> 
> key  | field1  | field2 
> -------------------------
> a    | [x,y,z] | [a,d,f]
> b    | [m,n]   | (NULL)
> 
> Can be represented like this instead:
> 
> key  | field_type | value
> -------------------------
> a    | 1          | x
> a    | 1          | y
> a    | 1          | z
> a    | 2          | a
> a    | 2          | d
> a    | 2          | f
> b    | 1          | m
> b    | 1          | n
> 
> 
> I'm not sure what your data looks like, but I hope this helps.
> 
> Mark
> 

/************************************************************************.
# Indraneel Majumdar                  ¡  E-mail: indraneel@123india.com  #
# Bioinformatics Unit (EMBNET node),  ¡  URL: http://scorpius.iwarp.com  #
# Centre for DNA Fingerprinting and Diagnostics,                         #
# Hyderabad, India - 500076                                              #
`************************************************************************/



Re: dynamic object creation

From:
Indraneel Majumdar
Date:
The example is below:

On Thu, 12 Oct 2000, Jean-Christophe Boggio wrote:

> Hi Indraneel,
> 
> IM> I'm converting a complex flatfile where records are arranged serially.
> IM> some fields are as 'n' times repeating blocks of multiple lines. Some
> IM> subfields within these are also 'n' time repeating blocks of multiple
> IM> lines. So in my main table I do not know (until at run time) how many
> IM> fields to create (same for any sub tables). How can I do this dynamically?
> 
> IM> I tried using arrays, but retrieval from that is causing some problems. I
> IM> have already checked the array utilities in the contrib section and have
> IM> extended the operator list for other types (I'll send the file to it's
> IM> original author so that he may include it if he wishes).
> 
> IM> I think there must be some object-oriented way of doing this without
> IM> creating too many keys. or are keys the only and best method? Using this
> IM> is causing a performance hit. If it's any help, what I'm trying to convert
> IM> are biological databases distributed in 'SRS' flatfile format from
> IM> ftp.ebi.ac.uk/pub/databases/
> 
> Could you be more precise, give an example of 2 "variable-length"
> records ?

-------------example---------------

ID   ACP_DOMAIN; MATRIX.
AC   PS50075;
DT   NOV-1997 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   Acyl carrier protein phosphopantetheine domain profile.
MA   /GENERAL_SPEC: ALPHABET='ABCDEFGHIKLMNPQRSTVWYZ'; LENGTH=71;
MA   /DISJOINT: DEFINITION=PROTECT; N1=6; N2=66;
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR; R1=2.3; R2=.02281121; TEXT='NScore';
MA   /CUT_OFF: LEVEL=0; SCORE=271; N_SCORE=8.5; MODE=1;
MA   /CUT_OFF: LEVEL=-1; SCORE=184; N_SCORE=6.5; MODE=1;
MA   /DEFAULT: D=-20; I=-20; B1=-80; E1=-80; MI=-105; MD=-105; IM=-105; DM=-105; MM=1; M0=-1;
NR   /RELEASE=38,80000;
NR   /TOTAL=173(116); /POSITIVE=173(116); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=0; /PARTIAL=3;
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=5;

----------------end---------------------------

Here the row starting with /CUT_OFF may be present 'ni' times and the
values for N_SCORE may be present 'nj' times (both 'ni' and 'nj' are
unknown until run time). Similarly for /NORMALIZATION (which may be
present multiple times) and Rx (R1, R2, etc.), and for the /DEFAULT line
(which may occur more than once) and the values it contains.

I can use an array, but retrieving the data will be a problem. One method
might be to define my own types, but then how do I get around the fact
that the number of entries is variable? Yet another array? Or do I have
to use a lot of tables with indexes (which will also be bulky and slow)?
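
The many-tables alternative would look roughly like this (all names here
are made up, with one child table per repeating block):

CREATE TABLE profile (
    ac text PRIMARY KEY,              -- e.g. 'PS50075'
    id text,
    de text
);

CREATE TABLE cut_off (
    ac      text REFERENCES profile,  -- 'ni' rows per profile
    level   integer,
    score   integer,
    n_score float,
    mode    integer
);

CREATE INDEX cut_off_ac ON cut_off (ac);

Every repeating block (/NORMALIZATION, /DEFAULT, the N_SCORE values, ...)
would need its own table like cut_off, which is what I'm afraid will get
bulky and slow.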

thanks,
Indraneel

> --
> Jean-Christophe Boggio
> cat@thefreecat.org
> Independant Consultant and Developer
> Delphi, Linux, Oracle, Perl
> 
> 

/************************************************************************.
# Indraneel Majumdar                  ¡  E-mail: indraneel@123india.com  #
# Bioinformatics Unit (EMBNET node),  ¡  URL: http://scorpius.iwarp.com  #
# Centre for DNA Fingerprinting and Diagnostics,                         #
# Hyderabad, India - 500076                                              #
`************************************************************************/