Re: Experimenting with hash join prefetch

From: Thomas Munro
Subject: Re: Experimenting with hash join prefetch
Date:
Msg-id: CA+hUKGKB=EWq+rj4NK8CX6ZqHZzuQYWSb9iDfYKzGN6-ejShUQ@mail.gmail.com
In reply to: Re: Experimenting with hash join prefetch (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: Experimenting with hash join prefetch (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Oct 14, 2018 at 11:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 14 Oct 2018, at 9:18, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> >
> > +               /* Prefetch the bucket for the next key */
> > +               uint32 next_hash = hash_uint32(DatumGetInt32(keyval) + 1);
> > +               uint32 next_bucket = next_hash % hashtable->nbuckets;
> > +               __builtin_prefetch(&hashtable->buckets.unshared[next_bucket]);
>
>
> +1, I also think that we should use __builtin_prefetch these days (years, actually).
> Right after listening to Anastassia Ailamaki's (author of the referenced paper) talk at VLDB I proposed doing that for
> B-tree [0], but did not really pursue that idea.

The above was obviously just a hard-coded hack that "knew" the next
key would be n + 1.  I've been wondering how you might abstract real
look-ahead with the shiny new TupleTableSlot design.  Here's a concept
I came up with: ExecScrollSlot(slot, 1) -> bool, to try to look ahead
to the next tuple if possible.  I suppose there could be a kind of
scrolling that actually consumes tuples (for true batch-mode tuple
processing in tight inner loops, for example hash table build), and
scrolling that merely peeks ahead (what I'm describing so far).  I'm
keen to hear other ideas about how that could look, because I know
that "vector" and "batch" mode tuple processing are ideas that people
have been bouncing around for a while.  Flame away.
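
To make that concrete, here is a rough standalone sketch in plain C of
the peeking flavour.  It is a toy, not PostgreSQL code: scroll_next()
and scroll_peek() stand in for consuming and peeking variants of the
hypothetical ExecScrollSlot(), and the hash table is just an array of
bucket chains.  The only point is to show how peeking one key ahead
lets the probe loop prefetch the next bucket before chasing the
current chain:

/*
 * Toy sketch only, not PostgreSQL code: scroll_next()/scroll_peek() stand in
 * for consuming/peeking look-ahead; the probe loop prefetches the next key's
 * bucket before walking the current key's chain.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 1024

typedef struct entry
{
	uint32_t	key;
	struct entry *next;
} entry;

typedef struct scan
{
	const uint32_t *keys;		/* stand-in for the outer tuple stream */
	size_t		nkeys;
	size_t		pos;
} scan;

/* cheap stand-in for hash_uint32() */
static uint32_t
hash_key(uint32_t k)
{
	k ^= k >> 16;
	k *= 0x85ebca6b;
	k ^= k >> 13;
	return k;
}

/* Consume the next key; false at end of stream. */
static bool
scroll_next(scan *s, uint32_t *key)
{
	if (s->pos >= s->nkeys)
		return false;
	*key = s->keys[s->pos++];
	return true;
}

/* Peek one key ahead without consuming it. */
static bool
scroll_peek(const scan *s, uint32_t *key)
{
	if (s->pos >= s->nkeys)
		return false;
	*key = s->keys[s->pos];
	return true;
}

/* Probe loop: prefetch the next key's bucket while probing the current one. */
static long
probe(entry **buckets, scan *s)
{
	long		matches = 0;
	uint32_t	key, next_key;

	while (scroll_next(s, &key))
	{
		if (scroll_peek(s, &next_key))
			__builtin_prefetch(&buckets[hash_key(next_key) % NBUCKETS]);

		for (entry *e = buckets[hash_key(key) % NBUCKETS]; e != NULL; e = e->next)
			if (e->key == key)
				matches++;
	}
	return matches;
}

int
main(void)
{
	static entry ents[NBUCKETS];
	entry	   *buckets[NBUCKETS] = {NULL};
	uint32_t	keys[NBUCKETS];

	for (uint32_t i = 0; i < NBUCKETS; i++)
	{
		uint32_t	b = hash_key(i) % NBUCKETS;

		ents[i].key = i;		/* build side: keys 0..NBUCKETS-1 */
		ents[i].next = buckets[b];
		buckets[b] = &ents[i];
		keys[i] = i;			/* probe side: same keys */
	}

	scan		s = {keys, NBUCKETS, 0};
	printf("matches: %ld\n", probe(buckets, &s));
	return 0;
}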

POC patch attached.  I never got around to figuring out why it breaks
anti-joins (hence some regression test failures) or filling out
various other important details (see commit message for a list), but I
figured it was better on the mailing list than hiding in a drawer,
anyway.

Here is an example of times for a trivial join on my laptop.  Note
that this is prefetching only during the probe phase, not while
building, which should also be possible.  I didn't get around to
trying deeper prefetch pipelines as discussed earlier, but those
seemed to produce diminishing returns in hardcoded tests like the one
in the earlier message.
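
Just to illustrate what I mean by a deeper pipeline, here is a toy
sketch of it, reusing the entry/scan/hash_key()/scroll_next()
definitions from the sketch above (PREFETCH_DISTANCE and
probe_pipelined() are made up for illustration): bucket numbers for
the next few keys sit in a small ring, so each bucket has been
prefetched several iterations before its key is actually probed.

/*
 * Toy sketch of a prefetch pipeline of depth PREFETCH_DISTANCE, reusing the
 * toy definitions from the sketch above.
 */
#define PREFETCH_DISTANCE 4

static long
probe_pipelined(entry **buckets, scan *s)
{
	struct { uint32_t key; uint32_t bucket; } ring[PREFETCH_DISTANCE];
	size_t		head = 0, tail = 0, nqueued = 0;
	long		matches = 0;
	uint32_t	key;

	for (;;)
	{
		/* Keep the pipeline full: hash and prefetch up to D keys ahead. */
		while (nqueued < PREFETCH_DISTANCE && scroll_next(s, &key))
		{
			ring[tail].key = key;
			ring[tail].bucket = hash_key(key) % NBUCKETS;
			__builtin_prefetch(&buckets[ring[tail].bucket]);
			tail = (tail + 1) % PREFETCH_DISTANCE;
			nqueued++;
		}
		if (nqueued == 0)
			break;

		/* Probe the oldest queued key; its bucket was prefetched earlier. */
		for (entry *e = buckets[ring[head].bucket]; e != NULL; e = e->next)
			if (e->key == ring[head].key)
				matches++;
		head = (head + 1) % PREFETCH_DISTANCE;
		nqueued--;
	}
	return matches;
}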

shared_buffers = '3GB'

create table r as select generate_series(1, 40000000)::int i;
create table s as select generate_series(1, 10000000)::int i;
analyze;

set max_parallel_workers_per_gather = 0;
set work_mem = '2GB';
select pg_prewarm('r'::regclass);
select pg_prewarm('s'::regclass);

select count(*) from r join s using (i);

Master:  00:14.230
Patched: 00:11.818


--
Thomas Munro
https://enterprisedb.com

Attachments
