Re: Slow index scan on B-Tree index over timestamp field

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Slow index scan on B-Tree index over timestamp field
Дата
Msg-id CAMkU=1wU4trachr2jnTaZMbV97KQEMFe91C57A7qXX6MoRg+sw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Slow index scan on B-Tree index over timestamp field  (Caio Casimiro <casimiro.listas@gmail.com>)
Ответы Re: Slow index scan on B-Tree index over timestamp field
Список pgsql-performance
On Mon, Nov 4, 2013 at 12:44 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:
Thank you very much for your answers guys!


On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
            WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
            (SELECT followed_id FROM relationship WHERE follower_id = N) ORDER BY tt.tweet_id;


I don't know if this affects the plan at all, but it is silly to do a left join to "tweet" when the WHERE clause has conditions that can't be satisfied with a null row.  Also, you could try changing the IN-list to an EXISTS subquery.

I'm sorry the ignorance, but I don't understand the issue with the left join, could you explain more?

A left join means you are telling it to make up an all-NULL tweet row for any tweet_topic that does not have a corresponding tweet.  But then once it did so, it would just filter out that row later, because the null creation_time and user_id cannot pass the WHERE criteria--so doing a left join can't change the answer, but it can fool the planner into making a worse choice.
 


Is there some patterns to D1 and D2 that could help the caching?  For example, are they both usually in the just-recent past?
The only pattern is that it is always a one day interval, e.g. D1 = '2013-05-01' and  D2 = '2013-05-02'.

If you only compare creation_time to dates, rather than ever using date+time, then it would probably be better to store them in the table as date, not timestamp.  This might make the index smaller, and can also lead to better estimates and index usage.

But why would you want to offer suggestions to someone based on tweets that were made on exactly one day, over 5 months ago?  I can see why would want a brief period in the immediate past, or a long period; but a brief period that is not the recent past just seems like a strange thing to want to do.  (And it is going to be hard to get good performance with that requirement.)
 
Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: postgresql recommendation memory
Следующее
От: Caio Casimiro
Дата:
Сообщение: Re: Slow index scan on B-Tree index over timestamp field