Optimizing COPY with SIMD

From	Neil Conway
Subject	Optimizing COPY with SIMD
Date	June 2, 22:17:21
Msg-id	CAOW5sYb1HprQKrzjCsrCP1EauQzZy+njZ-AwBbOUMoGJHJS7Sw@mail.gmail.com
Responses	Re: Optimizing COPY with SIMD; Re: Optimizing COPY with SIMD
List	pgsql-hackers

Inspired by David Rowley's work [1] on optimizing JSON escape processing with SIMD, I noticed that the COPY code could potentially benefit from SIMD instructions in a few places, e.g.:

(1) CopyAttributeOutCSV() has 2 byte-by-byte loops

(2) CopyAttributeOutText() has 1

(3) CopyReadLineText() has 1

(4) CopyReadAttributesCSV() has 1

(5) CopyReadAttributesText() has 1

Attached is a quick POC patch that uses SIMD instructions for case (1) above. For sufficiently large attribute values, this is a significant performance win: in the benchmarks below, the two long-string cases run roughly 1.9x to 2.5x faster. For small fields, performance looks to be about the same. The results below are from an M1 MacBook Pro.

======

neilconway=# select count(*), avg(length(a))::int, avg(length(b))::int, avg(length(c))::int from short_strings;
 count  | avg | avg | avg
--------+-----+-----+-----
 524288 |   8 |   8 |   8
(1 row)

neilconway=# select count(*), avg(length(a))::int, avg(length(b))::int, avg(length(c))::int from long_strings;
 count | avg | avg | avg
-------+-----+-----+-----
 65536 | 657 | 657 | 657
(1 row)

master @ 8fea1bd541:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 2.027 s ± 0.075 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.928 s … 2.207 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 1.420 s ± 0.027 s [User: 0.001 s, System: 0.000 s]
Range (min … max): 1.379 s … 1.473 s 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 546.0 ms ± 9.6 ms [User: 1.4 ms, System: 0.3 ms]
Range (min … max): 539.0 ms … 572.1 ms 10 runs

master + SIMD patch:

$ for i in ~/*.sql; do hyperfine --warmup 5 "./psql -f $i"; done
Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long-quotes.sql
Time (mean ± σ): 797.8 ms ± 19.4 ms [User: 0.9 ms, System: 0.0 ms]
Range (min … max): 770.0 ms … 828.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-long.sql
Time (mean ± σ): 732.3 ms ± 20.8 ms [User: 1.2 ms, System: 0.0 ms]
Range (min … max): 701.1 ms … 763.5 ms 10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-short.sql
Time (mean ± σ): 545.7 ms ± 13.5 ms [User: 1.3 ms, System: 0.1 ms]
Range (min … max): 533.6 ms … 580.2 ms 10 runs

======

Implementation-wise, it seems complex to use SIMD when encoding_embeds_ascii is true (which should be uncommon). In principle, we could probably still use SIMD here, but it would require juggling between the SIMD chunk size and sizes returned by pg_encoding_mblen(). For now, the POC patch falls back to the old code path when encoding_embeds_ascii is true.
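
To make the fallback concrete, here is a minimal sketch of the general idea. It is not the attached patch. It assumes an x86 build with SSE2 and degrades to the plain byte loop elsewhere, so an ARM machine such as the M1 used for the benchmarks above would need an analogous NEON path. The names find_csv_special() and toy_mblen() are hypothetical; toy_mblen() is a toy stand-in for pg_encoding_mblen() that simply treats any high-bit byte as the start of a 2-byte character.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Toy stand-in for pg_encoding_mblen(): high-bit byte => 2-byte character. */
static int
toy_mblen(const char *s)
{
    return ((unsigned char) *s & 0x80) ? 2 : 1;
}

static bool
is_csv_special(char c, char delim, char quote)
{
    return c == delim || c == quote || c == '\n' || c == '\r';
}

/*
 * Hypothetical helper: return the offset of the first byte in s[0..len)
 * that would force CSV quoting, or len if there is none.
 */
static size_t
find_csv_special(const char *s, size_t len, char delim, char quote,
                 bool encoding_embeds_ascii)
{
    size_t      i = 0;

    if (encoding_embeds_ascii)
    {
        /*
         * Old code path: trailing bytes of a multibyte character may look
         * like ASCII, so step over whole characters one at a time.
         */
        while (i < len)
        {
            if (is_csv_special(s[i], delim, quote))
                return i;
            i += ((unsigned char) s[i] & 0x80) ? (size_t) toy_mblen(s + i) : 1;
        }
        return len;
    }

#if defined(__SSE2__)
    {
        const __m128i vdelim = _mm_set1_epi8(delim);
        const __m128i vquote = _mm_set1_epi8(quote);
        const __m128i vnl = _mm_set1_epi8('\n');
        const __m128i vcr = _mm_set1_epi8('\r');

        /* Skip 16-byte chunks that contain no special byte at all. */
        for (; i + 16 <= len; i += 16)
        {
            __m128i     chunk = _mm_loadu_si128((const __m128i *) (s + i));
            __m128i     hit = _mm_or_si128(
                _mm_or_si128(_mm_cmpeq_epi8(chunk, vdelim),
                             _mm_cmpeq_epi8(chunk, vquote)),
                _mm_or_si128(_mm_cmpeq_epi8(chunk, vnl),
                             _mm_cmpeq_epi8(chunk, vcr)));

            if (_mm_movemask_epi8(hit) != 0)
                break;          /* scalar loop below finds the exact byte */
        }
    }
#endif

    /* Scalar loop: handles the matching chunk, the tail, and non-SSE2 builds. */
    for (; i < len; i++)
    {
        if (is_csv_special(s[i], delim, quote))
            return i;
    }
    return len;
}
```

With a delimiter of '|' (0x7C) and the input bytes 0x83 0x7C '|' 'x', the multibyte-aware path reports the real delimiter at offset 2, whereas a pure byte scan would stop at offset 1 on the trailing byte of the 2-byte character. That mismatch is why the chunked scan is only taken when encoding_embeds_ascii is false.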

Any feedback would be very welcome.

Cheers,

Neil

[1] https://www.postgresql.org/message-id/CAApHDvpLXwMZvbCKcdGfU9XQjGCDm7tFpRdTXuB9PVgpNUYfEQ@mail.gmail.com
