Re: Parallel copy
From | Tomas Vondra |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | 20200225160051.q7df3mibkguubnwf@development |
In response to | Re: Parallel copy (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Parallel copy
Re: Parallel copy |
List | pgsql-hackers |
On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
>Hi,
>
>On 2020-02-19 11:38:45 +0100, Tomas Vondra wrote:
>> I generally agree with the impression that parsing CSV is tricky and
>> unlikely to benefit from parallelism in general. There may be cases with
>> restrictions making it easier (e.g. restrictions on the format) but that
>> might be a bit too complex to start with.
>>
>> For example, I had an idea to parallelise the planning by splitting it
>> into two phases:
>
>FWIW, I think we ought to rewrite our COPY parsers before we go for
>complex schemes. They're way slower than a decent green-field
>CSV/... parser.
>

Yep, that's quite possible.

>
>> The one piece of information I'm missing here is at least a very rough
>> quantification of the individual steps of CSV processing - for example
>> if parsing takes only 10% of the time, it's pretty pointless to start by
>> parallelising this part and we should focus on the rest. If it's 50% it
>> might be a different story. Has anyone done any measurements?
>
>Not recently, but I'm pretty sure that I've observed CSV parsing to be
>way more than 10%.
>

Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
so I still think we need to do some measurements first. I'm willing to
do that, but (a) I doubt I'll have time for that until after 2020-03,
and (b) it'd be good to agree on some set of typical CSV files.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
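
The 10% versus 50% argument in the message can be made concrete with Amdahl's law. The following is only a rough sketch under stated assumptions: parsing is the only step that gets parallelised, it scales perfectly across workers, and everything else stays serial. The function name `max_speedup` and the worker counts are hypothetical and chosen for illustration; no measurements are implied.

```python
def max_speedup(parse_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only the parsing share of the
    total COPY time is spread (perfectly) across `workers` processes."""
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parse_fraction  # everything that stays serial
    return 1.0 / (serial_fraction + parse_fraction / workers)


if __name__ == "__main__":
    # The two parsing shares mentioned in the thread (assumed, not measured).
    for fraction in (0.10, 0.50):
        for workers in (2, 8):
            speedup = max_speedup(fraction, workers)
            print(f"parsing={fraction:.0%} workers={workers}: {speedup:.2f}x")
        # Upper bound with unlimited workers: 1 / serial fraction.
        print(f"parsing={fraction:.0%} limit: {1.0 / (1.0 - fraction):.2f}x")
```

Under these assumptions, a 10% parsing share caps the overall gain at roughly 1.11x regardless of worker count, while a 50% share caps it at 2x, which is why measuring the actual share on agreed CSV files comes first.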