Queries runs slow on GPU with PG-Strom

From: YANG
Subject: Queries runs slow on GPU with PG-Strom
Msg-id: BLU436-SMTP200807E5D5EABD07576C20C1830@phx.gbl
Responses: Re: Queries runs slow on GPU with PG-Strom  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers

Hello,

I've run some tests on pg_strom following the wiki, but queries seem to run
slower on the GPU than on the CPU. Can someone shed some light on what's wrong
with my settings? My setup is Quadro K620 + CUDA 7.0 (for Ubuntu 14.10) +
Ubuntu 15.04. The results were:

with pg_strom
=============

explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) (actual time=4249.656..18792.074 rows=77 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: NoGroup
         Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 rows=10000060 width=0) (actual time=139.399..18499.246 rows=10000000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)



explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: Local + Global
         ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
               Bulkload: On (density: 100.00%)
               Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.00000000
               ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.204..7.309 rows=40000 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)


without pg_strom
================

test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=3333353 width=0) (actual time=0.075..3859.200 rows=314063 loops=1)
         Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

t=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=10000060 width=12) (actual time=20.859..3367.510 rows=10000000 loops=1)
         Hash Cond: (t0.aid = t1.aid)
         ->  Seq Scan on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.021..895.908 rows=10000000 loops=1)
         ->  Hash  (cost=734.00..734.00 rows=40000 width=4) (actual time=20.567..20.567 rows=40000 loops=1)
               Buckets: 65536  Batches: 1  Memory Usage: 1919kB
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.017..11.013 rows=40000 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)
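
To put a number on the gap, the reported "Execution time" values can be divided directly. The following is only a minimal sketch: `slowdown` is a hypothetical helper name, not part of PostgreSQL or PG-Strom, and the four timings are copied from the plans in this message.

```shell
#!/bin/sh
# slowdown GPU_MS CPU_MS
# Hypothetical helper: prints how many times longer the pg_strom run took,
# given two "Execution time" values (in ms) from EXPLAIN ANALYZE.
slowdown() {
    awk -v gpu="$1" -v cpu="$2" 'BEGIN { printf "%.2f\n", gpu / cpu }'
}

# Execution times taken from the four plans in this message.
echo "filter + count(*): $(slowdown 19268.650 3880.445)x slower with pg_strom"
echo "join + GROUP BY:   $(slowdown 11355.103 4961.029)x slower with pg_strom"
```

With these figures the filter query comes out roughly 5x slower and the join query roughly 2.3x slower with pg_strom enabled.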



Here are the details of how I installed pg_strom:

1. download postgresql 9.5alpha1 and compile it with
   ,----
   | ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
   | make -j8 all
   | make install
   `----

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin
   ,----
   | make
   | make install
   `----


4. create a db with --no-locale
   ,----
   | initdb --no-locale 9.5
   `----

5. change postgresql.conf
   ,----
   | shared_buffers=1GB
   | shared_preload_libraries='pg_strom.so'
   | logging_collector = on
   | log_filename='postgresql-%d.log'
   | pg_strom.enabled=on
   `----
 


6. start postgres
   ,----
   | pg_ctl -D 9.5 start
   `----
   and got the following output
   ,----
   | LOG:  CUDA Runtime version: 7.0.0
   | LOG:  NVIDIA driver version: 346.59
   | LOG:  GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 2047MB (128bits, 900KHz), capability 5.0
   | LOG:  NVRTC - CUDA Runtime Compilation vertion 7.0
   | LOG:  redirecting log output to logging collector process
   | HINT:  Future log output will appear in directory "pg_log".
   `----
 



7. import testdb
   ,----
   | createdb test
   | psql test < ~/devel/pg_strom/test/testdb.sql
   | psql test -c 'create extension pg_strom'
   `----
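
Once the test database is loaded, both sets of timings could be collected in one psql session rather than by editing postgresql.conf between runs. The sketch below assumes that `pg_strom.enabled` can be changed per session with `SET`, which is an assumption here and not something shown in this message. `compare_sql` is a hypothetical helper that only prints the SQL; nothing touches the server until its output is piped into psql.

```shell
#!/bin/sh
# compare_sql QUERY
# Hypothetical helper: prints a psql script that runs QUERY under
# EXPLAIN ANALYZE twice, first with pg_strom.enabled = on, then off.
# Assumption: pg_strom.enabled is settable at session level.
compare_sql() {
    query=$1
    for mode in on off; do
        printf 'SET pg_strom.enabled = %s;\n' "$mode"
        printf 'EXPLAIN ANALYZE %s;\n' "$query"
    done
}

# Usage (assumes the "test" database from step 7 is up and running):
#   compare_sql 'SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10' | psql test
```

Running each query more than once in each mode would also help separate cold-cache effects from the GPU path itself.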
 


