Re: Macro customizable hashtable / bitmapscan & aggregation perf

From: Tomas Vondra
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date:
Msg-id: 64bb428d-8e05-9130-5627-fac7e85a68aa@2ndquadrant.com
In reply to: Re: Macro customizable hashtable / bitmapscan & aggregation perf  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 10/10/2016 01:38 AM, Andres Freund wrote:
> Hi,
>
> Attached is an updated version of the patchset. The main changes are
> - address most of Tomas' feedback
> - address regression test output changes by adding explicit ORDER BYs,
>   in a separate commit.
> - fix issues around hash tables sized up to PG_UINT32_MAX
> - fix a bug in iteration with "concurrent" deletions
>
>>> We didn't really for dynahash (as it basically assumed a fillfactor of 1
>>> all the time), not sure if changing it won't also cause issues.
>>>
>>
>> That's a case of glass half-full vs. half-empty, I guess. If we assumed load
>> factor 1.0, then I see it as accounting for load factor (while you see it as
>> not accounting of it).
>
> Well, that load factor is almost never achieved, because we'd have grown
> since...  I added a comment about disregarding fill factor and growth
> policy to estimate_hashagg_tablesize, which is actually the place where
> it'd seemingly make sense to handle it.
>
> Tomas, did you have time to run a benchmark?
>

Yes, I've done a bunch of TPC-H and TPC-DS tests on the patch, but it
took quite a bit of time. These tests were done on a fairly small
machine (the usual i5-2500k with 8GB of RAM), with only 1GB data sets
for both benchmarks, to keep everything in memory, since I presume that
once we start hitting I/O, it becomes the dominant cost.
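For reference, that data size corresponds to scale factor 1 with the
standard data generators - something like this (a sketch, assuming the
usual TPC-H dbgen and TPC-DS dsdgen tools; my exact invocations may
have differed):

    # TPC-H: scale factor 1 produces roughly 1GB of raw data
    ./dbgen -s 1

    # TPC-DS: same idea with dsdgen
    ./dsdgen -scale 1 -dir /tmp/tpcds-data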

The machine was tuned a bit (shared_buffers=1GB, work_mem=256MB).
Parallelism was not enabled, and the tests were done first with a
single client and then with 4 clients running the queries for 4 hours.
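In postgresql.conf terms, the non-default settings were roughly this
(a sketch - anything not listed was left at its default, and I'm
assuming max_parallel_workers_per_gather is the relevant knob to show
for "no parallelism"):

    shared_buffers = '1GB'
    work_mem = '256MB'
    max_parallel_workers_per_gather = 0    # disable parallel query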

I plan to run the tests on some larger machine (with more RAM, allowing
larger data sets while still keeping them in memory). But the machine
is busy doing something else, so it'll have to wait.

I didn't have time to do any sort of in-depth analysis (and I don't
expect to have the time in the foreseeable future) - in particular, I
have not looked at execution plans or anything like that. But let me
show a quick summary:


TPC-H (tpch.ods)
----------------

The results are either neutral or slightly positive - both in the case
of a single stream (summary) and 4 parallel streams (summary-4-streams).
BTW I happened to run the single-stream tests twice while debugging
some tooling issues, which is why there are two sets of columns.

Each stream executed ~9200 queries in total, which means ~450 executions
for each query template. So, a lot of data, and the results look quite
stable (despite the query parameters being random).

There are a few queries that got a tiny bit (~3%) slower, but most of
those queries are very short anyway (0.1s), making them susceptible to
noise. On the other hand, there are a few queries that got ~10% faster,
which is nice. It's not something that would significantly change our
TPC-H scores, but it's not bad either.


TPC-DS (tpcds.ods)
------------------

In this case, I'd say the results are less convincing. There are quite
a few queries that got slower by ~10%, which is well above the noise
level - queries 22 and 67, for example. There are of course queries
that got ~10% faster, and in total the patched version executed more
queries (so overall the result is slightly positive, but not
significantly so).


regards
Tomas

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
