Re: Slow index scan on B-Tree index over timestamp field
От | Elliot |
---|---|
Тема | Re: Slow index scan on B-Tree index over timestamp field |
Дата | |
Msg-id | 52781027.7080307@gmail.com обсуждение исходный текст |
Ответ на | Re: Slow index scan on B-Tree index over timestamp field (Caio Casimiro <casimiro.listas@gmail.com>) |
Ответы |
Re: Slow index scan on B-Tree index over timestamp field
(Caio Casimiro <casimiro.listas@gmail.com>)
|
Список | pgsql-performance |
On 2013-11-04 16:10, Caio Casimiro wrote:
Yes, because that part of the query is kicking back so many rows, many of which are totally unnecessary anyway - you're first getting all the tweets in a particular time range, then limiting them down to just users that are followed. Here's clarification on the approach I mentioned earlier. All you should really need are basic (btree) indexes on your different keys (tweet_topic.tweet_id, tweet.id, tweet.user_id, relationship.follower_id, relationship.followed_id). I also changed the left join to an inner join as somebody pointed out that your logic amounted to reducing the match to an inner join anyway.Hi Neyman, thank you for your answer.Unfortunately this query runs almost at the same time:Sort (cost=4877693.98..4877702.60 rows=3449 width=20) (actual time=25820.291..25821.845 rows=1640 loops=1)Sort Key: tt.tweet_idSort Method: quicksort Memory: 97kBBuffers: shared hit=1849 read=32788-> Nested Loop (cost=247.58..4877491.32 rows=3449 width=20) (actual time=486.839..25814.120 rows=1640 loops=1)Buffers: shared hit=1849 read=32788-> Hash Semi Join (cost=229.62..88553.23 rows=1681 width=8) (actual time=431.654..13209.159 rows=597 loops=1)Hash Cond: (t.user_id = relationship.followed_id)Buffers: shared hit=3 read=31870-> Index Scan using tweet_creation_time_index on tweet t (cost=0.57..83308.25 rows=1781234 width=16) (actual time=130.144..10037.764 rows=1759645 loops=1)Index Cond: ((creation_time >= '2013-05-05 00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06 00:00:00-03'::timestamp with time zone))Buffers: shared hit=1 read=31867-> Hash (cost=227.12..227.12 rows=154 width=8) (actual time=94.365..94.365 rows=106 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 3kBBuffers: shared hit=2 read=3-> Index Only Scan using relationship_id on relationship (cost=0.42..227.12 rows=154 width=8) (actual time=74.540..94.101 rows=106 loops=1)Index Cond: (follower_id = 335093362)Heap Fetches: 0Buffers: shared hit=2 read=3-> Bitmap Heap Scan on tweet_topic tt (cost=17.96..2841.63 rows=723 width=20) (actual time=21.014..21.085 rows=3 loops=597)Recheck Cond: (tweet_id = t.id)Buffers: shared hit=1846 read=918-> Bitmap Index Scan on tweet_topic_pk (cost=0.00..17.78 rows=723 width=0) (actual time=15.012..15.012 rows=3 loops=597)Index Cond: (tweet_id = t.id)Buffers: shared hit=1763 read=632Total runtime: 25823.386 msI have noticed that in both queries the index scan on tweet_creation_time_index is very expensive. Is there anything I can do to make the planner choose a index only scan?
SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt
JOIN tweet AS t
ON tt.tweet_id = t.id
join relationship
on t.user_id = relationship.followed_id
WHERE creation_time BETWEEN 'D1' AND 'D2'
AND relationship.follower_id = N
ORDER BY tt.tweet_id
;
В списке pgsql-performance по дате отправления:
Предыдущее
От: Caio CasimiroДата:
Сообщение: Re: Slow index scan on B-Tree index over timestamp field