Re: TEXT column > 1Gb

From	Joe Carlson
Subject	Re: TEXT column > 1Gb
Date	April 13, 2023, 00:03:36
Msg-id	131DA24F-23D6-4EA1-816F-ED0E6E5A219D@lbl.gov
In reply to	Re: TEXT column > 1Gb (Rob Sargent <robjsargent@gmail.com>)
Responses	Re: TEXT column > 1Gb
List	pgsql-general

On Apr 12, 2023, at 12:21 PM, Rob Sargent <robjsargent@gmail.com> wrote:

On 4/12/23 13:02, Ron wrote:
Must the genome all be in one big file, or can you store them one line per table row?

The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are typically strings of continuous sequence (A, C, G, or T) separated by gaps (N) of approximately known or completely unknown size. In the past this has not been a problem, since sequenced chromosomes were maybe 100 megabases. But sequencing technology has improved and people are tackling more complex genomes, so gigabase chromosomes are now common.
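To make the "continuous sequence separated by gaps" layout concrete, here is a minimal Python sketch that splits a chromosome string into contig and gap segments. The function name `split_contigs_and_gaps` and the 0-based, half-open coordinates are assumptions for illustration, not part of the schema discussed in this thread.

```python
import re
from typing import List, Tuple


def split_contigs_and_gaps(seq: str) -> List[Tuple[str, int, int]]:
    """Split a chromosome string into ("contig" | "gap", start, end) segments.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    A gap is any run of one or more N characters; everything else is
    treated as contiguous sequence.
    """
    segments = []
    # Alternate between runs of N and runs of anything that is not N.
    for match in re.finditer(r"N+|[^N]+", seq.upper()):
        kind = "gap" if match.group(0)[0] == "N" else "contig"
        segments.append((kind, match.start(), match.end()))
    return segments
```

For a real gigabase chromosome the same scan works, but it holds the whole string in memory, which is one reason the per-record size matters.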

A typical use case might come from someone interested in seeing if they can identify the regulatory elements (the on or off switches) of a gene. The protein coding part of a gene can be predicted pretty reliably, but the upstream untranslated region and regulatory elements are tougher. So they might come to our web site and want to extract the 5 kb bit of sequence before the start of the gene and look for some of the common motifs that signify a protein binding site. Being able to quickly pull out a substring of the genome to drive a web app is something we want to do.
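The use case above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the site's actual code: the helper names (`upstream_region`, `find_motif`, `reverse_complement`), the 0-based half-open gene coordinates, and the example motif are all hypothetical. In practice the slice would come from the database instead of an in-memory string.

```python
import re

# Complement table for the five symbols mentioned in this thread.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str, length: int = 5000) -> str:
    """Return up to `length` bases immediately upstream of a gene.

    gene_start/gene_end are 0-based, half-open, forward-strand coordinates
    (an assumption of this sketch). For a minus-strand gene, "upstream"
    lies after gene_end on the forward strand, so that slice is
    reverse-complemented to read in the gene's own orientation.
    """
    if strand == "+":
        lo = max(0, gene_start - length)
        return chrom_seq[lo:gene_start]
    if strand == "-":
        hi = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:hi])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def find_motif(region: str, pattern: str) -> list:
    """Return start offsets of every (possibly overlapping) motif match."""
    # A lookahead lets the regex engine report overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", region)]
```

A call such as `find_motif(upstream_region(chrom, start, end, "+"), "TATAAA")` returns offsets relative to the start of the extracted region, not chromosome coordinates.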

Not sure what OP is doing with plant genomes (other than some genomics) but the tools all use files and pipelines of sub-tools. In and out of tuples would be expensive. Very, very little "editing" done in the usual "update table set val where id" sense.

Yeah. It’s basically a warehouse. Stick data in, but then make all the connections between the functional elements, their products and the predictions on the products. It’s definitely more than a document store and we require a relational database.

Lines in a vcf file can have thousands of columns of nasty, cryptic garbage data that only really makes sense to tools, reader. Highly denormalized of course. (Btw, I hate sequencing :) )

Imagine a discipline where some beleaguered grad student has to get something out the door by the end of the term. It gets published and the rest of the community says GREAT! We have a standard! Then the abuse of the standard happens. People who specialize in bioinformatics know just enough computer science, statistics and molecular biology to annoy experts in three different fields.
