Patch for SortSupport implementation on inet/cdir

Поиск

Список

Период

Сортировка

От	Brandur Leach
Тема	Patch for SortSupport implementation on inet/cdir
Дата	8 февраля 2019 г. 18:48:40
Msg-id	CABR_9B-PQ8o2MZNJ88wo6r-NxW2EFG70M96Wmcgf99G6HUQ3sw@mail.gmail.com обсуждение исходный текст
Ответы	Re: Patch for SortSupport implementation on inet/cdir (Brandur Leach <brandur@mutelight.org>) Re: Patch for SortSupport implementation on inet/cdir (Edmund Horner <ejrh00@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi list,

I've attached a patch that implements SortSupport for the

inet/cidr types. It has the effect of typically reducing

the time taken to sort these types by ~50-60% (as measured

by `SELECT COUNT(DISTINCT ...)` which will carry over to

common operations like index creation, `ORDER BY`, and

`DISTINCT`.

As with other SortSupport implementations, this one

proposes an abbreviated key design that packs in as much

sorting-relevant information into the available datum as

possible while remaining faithful to the existing sorting

rules for the types. The key format depends on datum size

and whether working with IPv4 or IPv6. In most cases that

involves storing as many netmask bits as we can fit, but we

can do something a little more complete with IPv4 on 64-bit

— because inet data is small and the datum is large, we

can store enough information for a successful key-only

comparison in the majority of cases. Precise details

including background and bit-level diagrams are included in

the comments of the attached patch.

I've tried to take a variety of steps to build confidence

that the proposed SortSupport-based sort is correct:

1. It passes the existing inet regression suite (which was

pretty complete already).

2. I added a few new test cases the suite, specifically

trying to target edge cases like the minimums and

maximums of each abbreviated key bit "boundary". The new

cases were run against the master branch to double-check

that they're right.

3. I've added a variety of assertions throughout the

implementation to make sure that each step is seeing

inputs with expected parameters, and only manipulates

the bits that it's supposed to be manipulating.

4. I built large random data sets (10M rows) and sorted

them against a development build to try and trigger the

aforementioned assertions.

5. I built an index on 10M values and ran amcheck against

it.

6. I wrote some unit tests to verify that the bit-level

representation of the new abbreviated keys are indeed

what we expect. They're available here [3]. I didn't

include them in the patch because I couldn't find an

easy way to produce an expected `.out` file for a 32-bit

machine (experiments building with `setarch` didn't

succeed). They might be overkill anyway. I can continue

to pursue getting them working if reviewers think they'd

be useful.

My benchmarking methodology and script is available here

[1], and involves gathering statistics for 100

`count(distinct ...)` queries at various data sizes. I've

saved the results I got on my machine here [2].

Hat tip to Peter Geoghegan for advising on an appropriate

abbreviated key design and helping answer many questions

along the way.

Brandur

Вложения

SortSupport-for-inet-cidr-types-v1.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Alexey Kondratov
Дата: 08 февраля 2019 г., 18:30:18
Сообщение: Re: [Patch] pg_rewind: options to use restore_command fromrecovery.conf or command line

Следующее

От: Tom Lane
Дата: 08 февраля 2019 г., 19:15:39
Сообщение: Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Patch for SortSupport implementation on inet/cdir

Вложения

Предыдущее

Следующее