Re: [v9.5] Custom Plan API

From: Kouhei Kaigai
Subject: Re: [v9.5] Custom Plan API
Date:
Msg-id: 9A28C8860F777E439AA12E8AEA7694F8F9F993@BPXM15GP.gisp.nec.co.jp
In reply to: Re: [v9.5] Custom Plan API  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
> 
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
> 
The implementation of hash-join on the GPU side is still under development.

The only use case available right now is an alternative scan path that replaces
a full table scan, for cases where a table contains a massive number of records
and the qualifiers are sufficiently complicated.

The EXPLAIN output below shows a sequential scan over a table that contains
80M records (all of them in memory; no disk access during execution). NVIDIA's
GT640 outperforms a single-threaded Core i5 4570S, at least.
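
For reference, t1 can be reproduced with something like the sketch below. The
actual DDL is not in this message, so the column types and data distribution
are assumptions (a uniform distribution on [0,100]^2 matches the ~3.1%
selectivity visible in the plans):

    -- Hypothetical reproduction of t1; the real definition is not shown here.
    CREATE TABLE t1 (id serial, x float8, y float8);
    -- 80M rows with x, y uniformly distributed over [0, 100)
    INSERT INTO t1 (x, y)
        SELECT random() * 100.0, random() * 100.0
          FROM generate_series(1, 80000000);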


postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)
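
For the record, the end-to-end speedup here is 46648.668 ms / 1784.793 ms,
roughly 26x. Note also that the DMA-send total (933.318 ms) plus the
kernel-exec total (1565.799 ms) exceed the 1784.793 ms total runtime, which
suggests the data transfers and kernel executions overlap asynchronously.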


> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
> 
If our purpose were to consume 100% of the GPU device's capacity, memory
bandwidth would be troublesome. But I'm not interested in GPU benchmarking.
What I want to do is accelerate complicated query processing beyond what
existing RDBMSs offer, in a way that is cheap and transparent to existing
applications.
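
As the session above shows, the switchover is a single GUC, so existing
application queries run unchanged whichever plan the optimizer picks:

    SET pg_strom.enabled = off;  -- only built-in scan paths are considered
    SET pg_strom.enabled = on;   -- the Custom (GpuScan) path may be chosen
                                 -- when its estimated cost is cheaper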

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

