Re: TEXT column > 1Gb
From | Joe Carlson |
---|---|
Subject | Re: TEXT column > 1Gb |
Date | |
Msg-id | 131DA24F-23D6-4EA1-816F-ED0E6E5A219D@lbl.gov |
In response to | Re: TEXT column > 1Gb (Rob Sargent <robjsargent@gmail.com>) |
Responses | Re: TEXT column > 1Gb (Rob Sargent <robjsargent@gmail.com>) |
List | pgsql-general |
On Apr 12, 2023, at 12:21 PM, Rob Sargent <robjsargent@gmail.com> wrote:
On 4/12/23 13:02, Ron wrote:
Must the genome all be in one big file, or can you store them one line per table row?
A typical use case might be from someone interested in seeing if they can identify the regulatory elements (the on or off switches) of a gene. The protein coding part of a gene can be predicted pretty reliably, but the upstream untranslated region and regulatory elements are tougher. So they might come to our web site and want to extract the 5 kb bit of sequence before the start of the gene and look for some of the common motifs that signify a protein binding site. Being able to quickly pull out a substring of the genome to drive a web app is something we want to do quickly.
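To make that use case concrete, here is a minimal Python sketch of pulling the region upstream of a gene out of an in-memory sequence and scanning it for a motif. The function names, the 1-based inclusive coordinates, and the example motif are all assumptions for illustration, not our actual code; in the database itself the same slice could presumably be taken with `substr()` on the TEXT column.

```python
# Hypothetical helpers: names and coordinate convention are assumptions.
# Coordinates are taken to be 1-based and inclusive.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str = "+", length: int = 5000) -> str:
    """Return up to `length` bases immediately upstream of a gene.

    On the '+' strand the upstream bases precede gene_start.
    On the '-' strand they follow gene_end and are reverse-complemented,
    so the result always reads 5' to 3' toward the gene.
    """
    if strand == "+":
        lo = max(0, gene_start - 1 - length)  # clamp at the sequence start
        return chrom_seq[lo:gene_start - 1]
    if strand == "-":
        # Slicing past the end of a str is safe; it just returns fewer bases.
        return reverse_complement(chrom_seq[gene_end:gene_end + length])
    raise ValueError("strand must be '+' or '-'")


def find_motif(region: str, motif: str) -> list[int]:
    """Return 0-based offsets of every occurrence, overlaps included."""
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = region.find(motif, pos + 1)
    return hits


if __name__ == "__main__":
    chrom = "ACGTTATAAAGGATGCCC"  # toy sequence, gene occupies 13..18
    region = upstream_region(chrom, gene_start=13, gene_end=18, length=8)
    print(region, find_motif(region, "TATAAA"))
```

An exact string match is only a stand-in here; real binding-site searches usually use position weight matrices or similar scoring.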
yeah. it’s basically a warehouse. Stick data in, but then make all the connections between the functional elements, their products and the predictions on the products. It’s definitely more than a document store and we require a relational database.
Imagine a discipline where some beleaguered grad student has to get something out the door by the end of the term. It gets published and the rest of the community says GREAT! We have a standard! Then the abuse of the standard happens. People who specialize in bioinformatics know just enough computer science, statistics and molecular biology to annoy experts in three different fields.
Not sure what OP is doing with plant genomes (other than some genomics) but the tools all use files and pipelines of sub-tools. In and out of tuples would be expensive. Very, very little "editing" done in the usual "update table set val where id" sense.
Lines in a vcf file can have thousands of columns of nasty, cryptic garbage data that only really makes sense to tools, reader. Highly denormalized of course. (Btw, I hate sequencing :) )