Re: Parallel copy
From | Tomas Vondra |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | 20200222002802.yew5buvrd2yrjkm6@development |
In response to | Re: Parallel copy (Ants Aasma <ants@cybertec.at>) |
List | pgsql-hackers |
On Fri, Feb 21, 2020 at 02:54:31PM +0200, Ants Aasma wrote:
>On Thu, 20 Feb 2020 at 18:43, David Fetter <david@fetter.org> wrote:
>
>> On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
>> > I think the wc2 is showing that maybe instead of parallelizing the
>> > parsing, we might instead try using a different tokenizer/parser and
>> > make the implementation more efficient instead of just throwing more
>> > CPUs on it.
>>
>> That was what I had in mind.
>>
>> > I don't know if our code is similar to what wc does, maybe parsing
>> > csv is more complicated than what wc does.
>>
>> CSV parsing differs from wc in that there are more states in the state
>> machine, but I don't see anything fundamentally different.
>
>The trouble with a state machine based approach is that the state
>transitions form a dependency chain, which means that at best the
>processing rate will be 4-5 cycles per byte (L1 latency to fetch the
>next state).
>
>I whipped together a quick prototype that uses SIMD and bitmap
>manipulations to do the equivalent of CopyReadLineText() in csv mode
>including quotes and escape handling, this runs at 0.25-0.5 cycles per
>byte.
>

Interesting. How does that compare to what we currently have?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
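
For readers unfamiliar with the bitmap idea mentioned in the quoted text, here is a rough sketch of how line ends can be found without a per-byte state machine. This is not the prototype from the thread, which is not shown here, and it is not PostgreSQL code. The function names (byte_mask, prefix_xor, record_ends) are made up for illustration, the bitmaps are built with a plain scalar loop where a real version would presumably use SIMD compares, and only quote doubling ("") is handled as an escape, not a separate escape character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bitmap with bit i set where buf[i] == c.
 * Handles one block of at most 64 bytes. A SIMD version would likely
 * replace this loop with a byte compare plus a movemask.
 */
static uint64_t
byte_mask(const char *buf, size_t len, char c)
{
    uint64_t    mask = 0;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            mask |= UINT64_C(1) << i;
    return mask;
}

/* Prefix XOR: bit i of the result is the XOR of bits 0..i of x. */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Hypothetical helper: bitmap of newlines that lie outside quotes.
 * *in_quote carries the quote state from one block to the next.
 */
static uint64_t
record_ends(const char *buf, size_t len, int *in_quote)
{
    uint64_t    quotes = byte_mask(buf, len, '"');
    uint64_t    newlines = byte_mask(buf, len, '\n');

    /* A byte is inside quotes if an odd number of quotes precede or hit it. */
    uint64_t    inside = prefix_xor(quotes);

    if (*in_quote)
        inside = ~inside;       /* block started inside a quoted field */

    /* No quotes exist past len, so bit 63 holds the state at block end. */
    *in_quote = (int) ((inside >> 63) & 1);

    return newlines & ~inside;
}
```

The point of the sketch is that the quote state for all 64 bytes comes from a handful of shifts and XORs, so there is no byte-to-byte dependency inside a block; the only state carried between blocks is a single flag. A doubled quote toggles the state twice and so cancels out on its own.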