Partitioning with a lot of number of partitions
От | Aleksej Trofimov |
---|---|
Тема | Partitioning with a lot of number of partitions |
Дата | |
Msg-id | 4E959F8A.30009@ruptela.lt обсуждение исходный текст |
Ответы |
Re: Partitioning with a lot of number of partitions
|
Список | pgsql-novice |
Hello, I'm novice in using postgres, so please be patient =) By the moment we are using PostgreSQL 9.1.0. Our database is about 270GB and currently we have a problem with our "hot" table which is about 120GB of data and 20GB index. By "hot" I mean, that we have a lot of insert/update/select operations on this table (about 30/30/120 per second). This hot table is called coordinates. In our logic we have several definitions: Coordinate - basic packet of data; Object - basic unit, which sends coordinates; Client - client which own object. One client may has a lot of object. and each object may has a lot of coordinates. All coordinates are stored in coordinates table. So by the moment we have an issue with slow insert/select queries... this why we decided to partition this table. We have tested 2 types of partitioning: By date (weekly), by object; This are our results: SELECT QUERY (1 object): ----------------------------------------------------- Partition type | M | W | D | ----------------------------------------------------- Non-partitioned | 5830 | 460 | 2913 | Date(weekly) | 1000 | 440 | 106 | Object | 0.02 | 0.03 | 0.009 | ------------------------------------------------------ M - query for month period W - query for week period D - query for day period All times in ms. So according to our results, we have much greater performance using partitioning by Object, BUT we have a lot of object (about 5000, and planning to connect 2-5 times bigger object count next year). This plans leads to the firs question: 1) How postgres would "like" such a big count of partitions? Will it decrease performance in future? By the moment we noticed several performance issues with the big count of object id in such conditional statement: SELECT * FROM coordinates_object.coordinates WHERE object_id in (HERE ARE SEVERAL OBJECTS); EXPLAIN ANALYZE: EXPLAIN ANALYZE SELECT * FROM coordinates_object.coordinates WHERE datetime between '2011-07-01' and '2011-07-02' AND object_id in (722::bigint, 728::bigint, 727::bigint, 248::bigint); Returns: "Result (cost=0.00..1887.23 rows=1455 width=655) (actual time=49.266..695.483 rows=908 loops=1)" " -> Append (cost=0.00..1887.23 rows=1455 width=655) (actual time=49.264..694.995 rows=908 loops=1)" " -> Seq Scan on coordinates (cost=0.00..0.00 rows=1 width=314) (actual time=0.001..0.001 rows=0 loops=1)" " Filter: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone) AND (object_id = ANY ('{722,728,727,248}'::bigint[])))" " -> Bitmap Heap Scan on coordinates_722 coordinates (cost=10.90..451.49 rows=258 width=654) (actual time=49.261..195.291 rows=166 loops=1)" " Recheck Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))" " -> Bitmap Index Scan on coordinates_722_datetime_index (cost=0.00..10.83 rows=258 width=0) (actual time=49.237..49.237 rows=166 loops=1)" " Index Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " -> Bitmap Heap Scan on coordinates_728 coordinates (cost=13.00..568.65 rows=463 width=656) (actual time=108.833..203.139 rows=331 loops=1)" " Recheck Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))" " -> Bitmap Index Scan on coordinates_728_datetime_index (cost=0.00..12.88 rows=463 width=0) (actual time=61.914..61.914 rows=332 loops=1)" " Index Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " -> Bitmap Heap Scan on coordinates_727 coordinates (cost=19.56..824.53 rows=713 width=651) (actual time=147.295..251.406 rows=370 loops=1)" " Recheck Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))" " -> Bitmap Index Scan on coordinates_727_datetime_index (cost=0.00..19.38 rows=713 width=0) (actual time=125.261..125.261 rows=370 loops=1)" " Index Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " -> Index Scan using coordinates_248_datetime_index on coordinates_248 coordinates (cost=0.00..42.56 rows=20 width=777) (actual time=0.017..44.991 rows=41 loops=1)" " Index Cond: ((datetime >= '2011-07-01 00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp without time zone))" " Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))" "Total runtime: 695.963 ms" 2) As you see from EXPLAIN, postgres is using array for filtering Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))", why? 3) Last question would be about some "unpredictable" behavior on such query: EXPLAIN ANALYZE SELECT * FROM coordinates_object.coordinates WHERE datetime between '2011-07-01' and '2011-07-02' AND object_id in (SELECT id FROM object WHERE client_id=318); Using such query, postgres is trying to Bitmap Heap Scan on every index of all partitions (Remember we have 5000 partitions). Such a query makes our server think a lot =). So if it possible please judge our decision and offer some other solutions...
В списке pgsql-novice по дате отправления: