[GENERAL] Queries on very big table

From: Job
Subject: [GENERAL] Queries on very big table
Date:
Msg-id: 88EF58F000EC4B4684700C2AA3A73D7A08054EACC1D5@W2008DC01.ColliniConsulting.lan
Replies: Re: [GENERAL] Queries on very big table  (Andy Colson <andy@squeakycode.net>)
         Re: [GENERAL] Queries on very big table  (junior <drrtuy@yandex.ru>)
         Re: [GENERAL] Queries on very big table  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-general
Hello everyone, and a very happy new year to all!

We are now running some queries and statistics on a very big table (about 180 million records).
The table is partitioned by day (roughly 3 GB of data per partition/day).
We use PostgreSQL 9.6.1.
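
For context, the tables are laid out roughly like this (a simplified sketch: 9.6 only supports inheritance-based partitioning, and the timestamp column name "hit_timestamp" is just a placeholder):

-- Parent table plus one example daily partition (child table with a CHECK
-- constraint, as used for inheritance-based partitioning in 9.6)
CREATE TABLE webtraffic_archive (
    hit_timestamp timestamp NOT NULL,  -- placeholder name for the timestamp column
    domain        text
    -- ... other traffic columns omitted
);

CREATE TABLE webtraffic_archive_day_2016_12_16 (
    CHECK (hit_timestamp >= DATE '2016-12-16'
       AND hit_timestamp <  DATE '2016-12-17')
) INHERITS (webtraffic_archive);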

I am experiencing a quite significant slowdown on queries.
I manually ran a VACUUM FULL and a REINDEX on every partition in order to reclaim free space and reorder records.
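
Concretely, on each daily partition I ran something like:

VACUUM FULL webtraffic_archive_day_2016_12_16;    -- rewrite the table to reclaim free space
REINDEX TABLE webtraffic_archive_day_2016_12_16;  -- rebuild all indexes on the partition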

I have a BRIN index on the timestamp column and a btree index on another field.
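
Per partition the indexes are created roughly like this (index names are illustrative, and I am assuming here that the btree column is "domain"):

-- BRIN index on the timestamp column, btree index on another field
CREATE INDEX webtraffic_archive_day_2016_12_16_ts_brin
    ON webtraffic_archive_day_2016_12_16 USING brin (hit_timestamp);
CREATE INDEX webtraffic_archive_day_2016_12_16_domain_idx
    ON webtraffic_archive_day_2016_12_16 USING btree (domain);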

Starting with a simple query: explain analyze select count(domain) from webtraffic_archive:

                                                                                   QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3220451.94..3220451.95 rows=1 width=8) (actual time=36912.624..36912.624 rows=1 loops=1)
   ->  Gather  (cost=3220451.52..3220451.93 rows=4 width=8) (actual time=36911.600..36912.614 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=3219451.52..3219451.53 rows=1 width=8) (actual time=36906.804..36906.804 rows=1 loops=5)
               ->  Append  (cost=0.00..3094635.41 rows=49926443 width=0) (actual time=4.716..31331.229 rows=39853988 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive  (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_01_01  (cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_16  (cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458 rows=2798421 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_17  (cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726 rows=2598191 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_18  (cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812 rows=2259989 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_19  (cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958 rows=2293146 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_20  (cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092 rows=1568706 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_21  (cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589 rows=3588235 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_22  (cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834 rows=3704535 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_23  (cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884 rows=3211489 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_24  (cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826 rows=2477896 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_25  (cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248 rows=1124493 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_26  (cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327 rows=1148864 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_27  (cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210 rows=2845795 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_28  (cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878 rows=2692541 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_29  (cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079 rows=2364697 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_30  (cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351 rows=1906124 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2016_12_31  (cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631 rows=2131431 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_01  (cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124 rows=1008744 loops=5)
                     ->  Parallel Seq Scan on webtraffic_archive_day_2017_01_02  (cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352 rows=130691 loops=5)
 Planning time: 313.907 ms
 Execution time: 36941.700 ms

Other, more complex queries are even slower.

How can I improve this?
The number of records could grow to as many as 1,000 million (one billion).
Do I need a third-party big data tool?

THANK YOU!
/F





