Re: Designing a DB for storing biological data
От | Rich Shepard |
---|---|
Тема | Re: Designing a DB for storing biological data |
Дата | |
Msg-id | alpine.LNX.2.11.1406141107100.17243@localhost обсуждение исходный текст |
Ответ на | Designing a DB for storing biological data (Damir Dezeljin <damir.dezeljin@dezo.org>) |
Ответы |
Re: Designing a DB for storing biological data
|
Список | pgsql-general |
On Sat, 14 Jun 2014, Damir Dezeljin wrote: > This is more a theoretical or better to say, conceptual question; still, I > hope to get some feed backs from you folks. Additionally this is going to > be a very long post :) off-topic: I asked a similar question on the MySQL > forum as I'm still undecided if going with PostgreSQL or MySQL << I'm > tempted at PostGIS. Postgres. You can link it to spatial attributes with postgis if you want. But, how to structure your database and tables is independent of the analytical methods you use. > I am designing a database for storing various biological and ecological > data. Although there is no clear dividing line, it is possible to group > the data into two groups, namely the measured (physical) and quantitative > (mostly biological) data; I uploaded both a data sample and an initial > draft of a DB model to this link > <https://www.dropbox.com/sh/9gm2ezwrwhkz6xv/AAB3koD6Xzi48-2BhIEdwmlZa>. Of course there's a clear division between biological, chemical, physical, geomorphical, and other data. I may be wrong, but this looks like a question from a student for a project or master's degree. > *Biological / quantitative data* You have many choices: EPA's STORET or EMAP projects, ITIS (International Taxonomic Identification System), or your own. Each type of biological data should have its own table; e.g., vegegation, mammals, birds, fish, benthic macroinvertebrates, microbes. Each table should have a compound key: site_id, sample_date, species. This makes each row unique, when possible. Otherwise. use a sequential key. The table I use for fish data has this schema: Table "public.fish" Column | Type | Modifiers -------------+-----------------------+----------- site | character varying(12) | not null sampdate | date | not null tclass | character(13) | not null torder | character varying(16) | not null tfamily | character varying(12) | not null tgenus | character varying(12) | not null tspecies | character varying(12) | not null tsubspecies | character varying(12) | common_name | character varying(32) | quant | integer | not null comments | text | basin | character varying(10) | stream | character varying(20) | Indexes: "fish_pkey" PRIMARY KEY, btree (site, sampdate, tspecies) The table I use for benthos has this schema: Table "public.benthos" Column | Type | Modifiers ---------------+-----------------------+----------------------------------- ------------------------ sampid | integer | not null default nextval('macroinv _sampid_seq'::regclass) site | character varying(12) | not null sampdate | date | not null tclass | character varying(20) | not null torder | character varying(32) | not null tfamily | character varying(50) | tgenus | character varying(32) | tspecies | character varying(20) | func_feed_grp | character varying(32) | quant | integer | not null comments | text | stream | character varying(20) | basin | character varying(10) | Indexes: "macroinv_pkey" PRIMARY KEY, btree (sampid) > *Measured / physical data* > - Physical quantities may be outside the detection range of the measured > instrument; in such a case, this needs to be recorded. I still do not have > a clear idea how to do it. NULL’s do not seem to be a good choice to mark > such data. Here, too, you have choices. You can incorrectly record and analyze chemical data below detection limits like EPA, states, and most regulatory permit holders do, or you can do it correctly. This is a schema I use for water quality data: Table "public.surfchem" Column | Type | Modifiers ----------+-----------------------+----------- site | character varying(20) | not null sampdate | date | not null param | character varying(32) | not null quant | real | ceneq1 | integer | floor | real | ceiling | real | stream | character varying(20) | basin | character varying(10) | Indexes: "surfchem_pkey" PRIMARY KEY, btree (site, sampdate, param) The key is the set (site, sampdate, param) so you can have multiple samples, each of a different chemical constituent, at the same location and date. (This works for air, soil, and other media chemistry, too). The concentrations must be in the same units (e.g., mg/L). The column named 'ceneq1' is a flag: when the quant value is below the analytical laboratory's reporting limit, set the flag to '1'; otherwise 0. Now you have a consistent way to identify rows with less-than/non-detect/censored values. Since reporting levels change as instruments become more sensitive and analysts become better trained, you can have multiple reporting limits in your data. That's where the 'floor' and 'ceiling' columns come in. This should get you started. Rich -- Richard B. Shepard, Ph.D. Applied Ecosystem Services, Inc. | Troutdale, OR 97060 USA www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863
В списке pgsql-general по дате отправления:
Предыдущее
От: Kevin GrittnerДата:
Сообщение: Re: Using pg_start_backup() and pg_stop_backup() - using 9.1.2.2