Re: newsfeed type query

Поиск
Список
Период
Сортировка
От Ladislav Lenart
Тема Re: newsfeed type query
Дата
Msg-id 55411268.1060008@volny.cz
обсуждение исходный текст
Ответ на Re: newsfeed type query  (Jonathan Vanasco <postgres@2xlp.com>)
Список pgsql-general
On 29.4.2015 18:54, Jonathan Vanasco wrote:
>
> On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:
>
>> Could you please explain to me the error(s) in my reasoning?
>
> Let me just flip your list in reverse... and add in some elements (marked with a *):
>
> posting  ts  context
> p60      60  friend
> p55      55 friend*
> p54      54 friend*
> p50      50  group
> p50      49  group*
> p50      49  group*
> p40      40  friend
> p30      30  group
> p20      20  friend
> p10      10  group
> p0        0  friend
>
> With the 2 limited subqueries, the results would be:
>     60F, 55F, 50G, 49G
>
> But the "most recent" data is
>     50F, 55F, 54F, 50G
>
> So we end up showing 49 which is less relevant than 54.

I would expect the overall query to return only 60F nad 55F as the most recent
data. No? You expect it to return 4 items when the LIMIT is only 2. Remember
that the overall query should be also ordered by ts and limited to 2.

I thought you want most recent items across all contexts and not 2 most recent
items from friends plus two most recent items from groups...


Ladislav Lenart


> In some situations this isn't much of an issue, but in others it is detrimental.
> For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven.  While
"friend"and "group" might be relatively close in time to one another, "system" or other events may be months old -- and
thatolder content gets pulled in with this style of query.   
>
> If you need to paginate the data and select the next 10 overall items, it gets even more complicated.
>
> IIRC, the best mix of performance and "product" that I've found is do something like this:
>
>     SELECT * FROM (
>         SELECT a,b,c FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000;
>         UNION
>         SELECT a,b,c FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000;
>     ) as unioned
>      order by unioned TIMESTAMP DESC LIMIT 100 OFFSET 0;
>
> by creating an artificial limit on the inner queries, you can save postgres from doing a lot of intensive i/o work
andmemory usage (like a lot) 
> then, joining a few lists and sorting 20k (or even 100k) items is really cheap.
> the downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (ie, 10000 --
notthe combined total of 20000), but that number can be arbitrarily high enough that it is irrelevant while still
showingthe right amount of content for people. 




В списке pgsql-general по дате отправления:

Предыдущее
От: John McKown
Дата:
Сообщение: New column modifier?
Следующее
От: Ramesh T
Дата:
Сообщение: