Increasing parallelism of queries while using file fdw and partitions

Поиск
Список
Период
Сортировка
От Patrick Mulrooney
Тема Increasing parallelism of queries while using file fdw and partitions
Дата
Msg-id aea30799-809c-f027-1a4b-0c03c7efa3f6@gmail.com
обсуждение исходный текст
Ответы Re: Increasing parallelism of queries while using file fdw andpartitions
Список pgsql-performance
Wondering if anyone had any thoughts on how to tweak my setup to get it 
to read many files at once instead of one at a time when using file fdw 
and partitions. We have a bunch of data tied up in files (each file > 4M 
rows, 5,000+ files per year) that I would like to be able to query 
directly using FDW. The files are genomic VCF format and I find that 
vcf_fdw ( https://github.com/ergo70/vcf_fdw ) works really well to read 
the data. We only want to be able to search the data as quickly as 
possible, no updates / deletes / ...

I gave an example below of the basic setup and the output of explain 
analyze. I get the same performance if I setup the table such that the 
thousands of files end up in one non-partitioned table or setup each 
file as it's own partition of the table.

I have tried increasing ( / decreasing ) the worker threads and workers, 
but don't see any change in the number of files open at any given time. 
I tried reducing the cost of parallel queries to force them to run, but 
can't get them to kick in.

Any ideas or anything I can try?

Thanks!

Pat

PostgreSQL:  PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
Multicorn: 1.3.5
VCF_FDW ( https://github.com/ergo70/vcf_fdw ) : 1.0.0


CREATE DATABASE variants;

CREATE EXTENSION multicorn;

CREATE SERVER multicorn_vcf FOREIGN DATA WRAPPER multicorn OPTIONS (wrapper 'vcf_fdw.VCFForeignDataWrapper');

CREATE SCHEMA vcf;

CREATE TABLE vcf.variants ( ..., species text, ... ) PARTITION BY LIST ( species );

CREATE FOREIGN TABLE vcf.human ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'human', suffix
'.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.human FOR VALUES IN ( 'human' );

CREATE FOREIGN TABLE vcf.dog ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'dog', suffix '.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.dog FOR VALUES IN ( 'dog' );

CREATE FOREIGN TABLE vcf.cat ( ... ) SERVER multicorn_vcf OPTIONS (basedir '/path', species 'cat', suffix '.vcf.gz');
ALTER TABLE vcf.variants ATTACH PARTITION vcf.cat FOR VALUES IN ( 'cat' );

* My real data repeats this 1000+ more times

EXPLAIN ( ANALYZE, BUFFERS ) SELECT * FROM vcf.variants WHERE chrom = '1' AND pos = 10120 LIMIT 1000;

On my real data I get the following results:
--------------------------
                                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=20.00..352020.00 rows=1000 width=347) (actual time=445.548..101709.307 rows=20 loops=1)
    ->  Append  (cost=20.00..3555200000.00 rows=10100000 width=347) (actual time=445.547..101709.285 rows=20 loops=1)
          ->  Foreign Scan on dog  (cost=20.00..3520000.00 rows=10000 width=352) (actual time=198.653..198.654 rows=0
loops=1)
                Filter: ((chrom = '1'::text) AND (pos = 10120))
          ->  Foreign Scan on cat  (cost=20.00..3520000.00 rows=10000 width=352) (actual time=111.840..111.840 rows=0
loops=1)
                Filter: ((chrom = '1'::text) AND (pos = 10120))
          ->  Foreign Scan on human  (cost=20.00..3520000.00 rows=10000 width=352) (actual time=135.050..138.534 rows=1
loops=1)
                Filter: ((chrom = '1'::text) AND (pos = 10120))
          ... repeats many more times for each partition
  Planning time: 613.815 ms
  Execution time: 101873.880 ms
(2024 rows)



В списке pgsql-performance по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Why Postgres doesn't use TID scan?
Следующее
От: anand086
Дата:
Сообщение: SQL Perfomance during autovacuum