Re: large xml database
From | Rob Sargent |
---|---|
Subject | Re: large xml database |
Date | |
Msg-id | 4CCDDEE9.9010803@gmail.com |
In response to | Re: large xml database (Viktor Bojović <viktor.bojovic@gmail.com>) |
Responses | Re: large xml database |
List | pgsql-sql |
Viktor Bojović wrote:
> On Sun, Oct 31, 2010 at 9:42 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>> Viktor Bojović wrote:
>>> On Sun, Oct 31, 2010 at 2:26 AM, James Cloos <cloos@jhcloos.com> wrote:
>>>> >>>>> "VB" == Viktor Bojović <viktor.bojovic@gmail.com> writes:
>>>>
>>>> VB> I have a very big XML document, larger than 50GB, and want to
>>>> VB> import it into the database and transform it to a relational schema.
>>>>
>>>> Were I doing such a conversion, I'd use perl to convert the xml into
>>>> something which COPY can grok. Any other language, script or compiled,
>>>> would work just as well. The goal is to avoid having to slurp the whole
>>>> xml structure into memory.
>>>>
>>>> -JimC
>>>> --
>>>> James Cloos <cloos@jhcloos.com>  OpenPGP: 1024D/ED7DAEA6
>>>
>>> The insertion into the database is not a very big problem. I insert it
>>> as XML docs, or as varchar lines, or as XML docs in varchar format.
>>> Usually I use a transaction and commit after each block of 1000
>>> inserts, and it goes very fast, so insertion is over after a few
>>> hours. But the problem occurs when I want to transform it inside the
>>> database from XML (varchar or XML format) into tables by parsing.
>>> That processing takes too much time in the database, no matter whether
>>> it is stored as varchar lines, varchar nodes, or the XML data type.
>>>
>>> --
>>> ---------------------------------------
>>> Viktor Bojović
>>> ---------------------------------------
>>> Wherever I go, Murphy goes with me
>>
>> Are you saying you first load the xml into the database, then parse
>> that xml into instances of objects (rows in tables)?
>
> Yes. That way takes less RAM than using Twig or SimpleXML, so I tried
> using the PostgreSQL xml functions or regexes.
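James Cloos's streaming suggestion can be sketched in Python just as well as Perl. A minimal example, assuming a feed of `<entry>` elements with `<accession>` and `<name>` children (illustrative names, not taken from the thread): `xml.etree.ElementTree.iterparse` walks the document incrementally, each subtree is cleared after use so memory stays flat, and the output is tab-separated lines that `COPY ... FROM STDIN` can grok.

```python
# Stream a huge XML file into COPY-friendly TSV without loading it all.
# The <entry>/<accession>/<name> element names are illustrative
# assumptions, not taken from the original thread.
import io
import xml.etree.ElementTree as ET

def xml_to_copy_rows(source):
    """Yield one tab-separated row per <entry> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            acc = elem.findtext("accession", "")
            name = elem.findtext("name", "")
            yield f"{acc}\t{name}"
            elem.clear()  # drop the subtree so memory stays constant

sample = io.BytesIO(
    b"<db>"
    b"<entry><accession>P12345</accession><name>AldA</name></entry>"
    b"<entry><accession>Q67890</accession><name>RecA</name></entry>"
    b"</db>"
)
rows = list(xml_to_copy_rows(sample))
print(rows)
```

On the real 50GB file you would open it in binary mode and pipe the rows to `COPY tablename FROM STDIN` via psql or a driver's copy API, so the XML never has to fit in memory on either side.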
Is the entire load a set of "entry" elements, as your example contains? This, I believe, would parse nicely into a tidy but non-trivial schema directly, without the "middle-man" of having the xml in the db (unless of course you prefer xpath to sql ;) ).

The single most significant caveat I would offer is: Beware, biologists involved. Inconsistency (or at least overloaded concepts) is almost assured :). EMBL too is suspect imho, but I've been out of that arena for a while.
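The "no middle-man" approach above, combined with Viktor's commit-after-1000-inserts batching, could look roughly like this sketch: parse the stream once and insert rows directly, skipping the XML-in-db staging step. Here stdlib sqlite3 stands in for a PostgreSQL connection, and the `<entry>`/`<accession>`/`<name>` element names are illustrative assumptions.

```python
# Parse <entry> elements straight into a table, committing in batches.
# sqlite3 is a stand-in for a PostgreSQL connection; element names are
# hypothetical, not from the original thread.
import io
import sqlite3
import xml.etree.ElementTree as ET

BATCH = 1000  # commit after each block of inserts, as in the thread

def load_entries(xml_source, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS entry (accession TEXT, name TEXT)")
    batch = []
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "entry":
            batch.append((elem.findtext("accession", ""), elem.findtext("name", "")))
            elem.clear()  # keep memory flat while streaming
            if len(batch) >= BATCH:
                conn.executemany("INSERT INTO entry VALUES (?, ?)", batch)
                conn.commit()
                batch.clear()
    if batch:  # flush the final partial block
        conn.executemany("INSERT INTO entry VALUES (?, ?)", batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
doc = io.BytesIO(
    b"<db><entry><accession>P12345</accession><name>AldA</name></entry></db>"
)
load_entries(doc, conn)
count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(count)
```

For PostgreSQL specifically, `COPY FROM STDIN` would normally beat batched `INSERT`s for a load of this size, but the batching pattern itself carries over unchanged.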