Re: Speed up COPY FROM text/CSV parsing using SIMD
From | Ants Aasma |
---|---|
Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date | |
Msg-id | CANwKhkMnay=xrVNcuw45G+8nMAGkWee9KtFSGussZX8-16+zNg@mail.gmail.com |
In reply to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
List | pgsql-hackers |
On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> + * However, SIMD optimization cannot be applied in the following cases:
> + * - Inside quoted fields, where escape sequences and closing quotes
> + * require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick with doing carryless multiplication with -1 that can
be used to SIMD process transitions between quoted/not-quoted. [1]
This is able to convert a bitmask of unescaped quote character
positions to a quote mask in a single operation. I last looked at it 5
years ago, but I remember coming to the conclusion that it would work
for implementing PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76

--
Ants
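
For readers unfamiliar with the carryless-multiplication trick mentioned in the message above, here is a minimal, portable sketch of the idea. It is not the patch under discussion and not the code from [1]. Multiplying a 64-bit value by all-ones (-1) without carries gives, in its low 64 bits, the prefix XOR of the input, so the sketch computes that prefix XOR with plain shifts. On x86 a carryless multiply such as the `_mm_clmulepi64_si128` intrinsic could presumably replace it. The names `quote_bits`, `prefix_xor` and `quote_mask` are hypothetical, and the input is assumed to contain only unescaped quote positions. Handling a separate ESCAPE character, as PostgreSQL's CSV mode allows, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit i of the result is set when buf[i] == c.
 * A scalar stand-in for a SIMD byte compare followed by a movemask.
 */
static uint64_t
quote_bits(const char *buf, size_t len, char c)
{
    uint64_t bits = 0;

    assert(len <= 64);          /* one 64-byte chunk at most */
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            bits |= UINT64_C(1) << i;
    return bits;
}

/*
 * Prefix XOR: output bit i is the XOR of input bits 0..i. This equals the
 * low 64 bits of a carryless multiplication of x by an all-ones value.
 */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Turn a bitmask of unescaped quote positions into a mask of "inside
 * quotes" positions: set from each opening quote (inclusive) up to the
 * matching closing quote (exclusive).
 *
 * *carry must be 0 or all-ones. It records whether the previous chunk
 * ended inside a quoted field and is updated for the next chunk.
 */
static uint64_t
quote_mask(uint64_t quotes, uint64_t *carry)
{
    uint64_t mask = prefix_xor(quotes) ^ *carry;

    *carry = (mask >> 63) ? ~UINT64_C(0) : 0;
    return mask;
}
```

As a usage example under the same assumptions: for the 9-byte input `a,"b,c",d`, `quote_bits(s, 9, '"')` marks positions 2 and 6, `quote_mask` then covers positions 2 to 5, and and-ing the comma bitmask with the complement of that mask leaves only the two commas that are real field separators.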