Re: Website mailing list search enhancement idea - one result per thread

Поиск
Список
Период
Сортировка
От Masahiro Ikeda
Тема Re: Website mailing list search enhancement idea - one result per thread
Дата
Msg-id 55df24092b8c1ffa44f382b6f3a681df@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: Website mailing list search enhancement idea - one result per thread  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Website mailing list search enhancement idea - one result per thread  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Список pgsql-www
"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode.  For moderately long running threads a search term is going to 
> return
> many messages from the same thread.  It would desirable to ask that 
> only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1. I always think so.

On 2020-05-01 01:55, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>>>> It would desirable to ask that only
>>>> threads be shown and maybe how many individual messages in that 
>>>> thread
>>>> matched.
> 
>>> +1, an option to do that would be handy.  No idea how hard it is ...
> 
>> Not having actually looked at it, probably not too hard. The hardest 
>> part
>> might be bikeshedding about the color (aka user interface).
> 
>> Should this replace the current interface, or do we want to make 
>> something
>> like this a choice?
> 
> No no, not replace.  It should be an option, "aggregate into threads"
> or something like that.

Why don't you implement the following feature?

(1) add an option on the advanced search page.

I agree with the idea to add the option to aggregate into threads
on the advanced search page.

I attached the image("add_option_to_aggregate_into_threads.png").
What do you think?


(2) change the message's URLs on the result pages if the option is 
checked.

I think it's better to change the message's URLs to flatted ones on 
result pages
if the messages are aggregated into threads because users want to access 
the
discussion pages not per messages but per threads directly.

```
--- a/templates/search/listsearch.html
+++ b/templates/search/listsearch.html
@@ -53,7 +59,12 @@
      <h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount == 1000%}more 
than 1000{%else%}{{hitcount}}{%endif%}.</h2>
      {%if pagelinks %}Result pages: 
{{pagelinks|safe}}<br/><br/>{%endif%}
      {%for hit in hits %}
-      {{forloop.counter0|add:firsthit}}. <a 
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+      {{forloop.counter0|add:firsthit}}.
+        {%if aggregate%}
+          <a 
href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+        {%else%}
+          <a 
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a> 
[{{hit.rank|floatformat:2}}]<br/>
+        {%endif%}
        From {{hit.author}} on {{hit.date}}.<br/>
        {{hit.abstract|safe}}<br/>
        <a 

href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/>
```


If someone already works for this, it's ok to ignore the following.
Although I'm not familiar with Django, I made the patches for pgarchives 
and pgweb.

- pgweb: "0001-wip-add-an-search-option-to-aggregate-into-threads.patch"

This patch changes the user interface the above (1) and (2).


- pgarchives: 
"0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch"

This patch changes the search query to the Postgresql. The example query 
is the following.
To aggregate into threads, it uses the window function to extract the 
highest-ranked
messages in the same thread group.


```
SELECT messageid, date, subject, _from, ts_rank_cd(fti, 
plainto_tsquery('public.pg', 'wal writer')),
        ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal 
writer'),'StartSel="[[[[[[",StopSel="]]]]]]"')
FROM
   (SELECT messageid, date, subject, _from, fti, bodytxt,                 
                                                  -- add a sub query
           RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti, 
plainto_tsquery('wal writer')) DESC, id) AS rank    -- this line is main 
change
    FROM messages m
    WHERE
      fti @@ plainto_tsquery('public.pg', 'wal writer')
      AND EXISTS (SELECT 1 FROM list_threads lt WHERE 
lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2]))
      AND m.date > '2020-02-07T08:44:56.738979'::timestamp
   ) m1
WHERE m1.rank = 1
ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC
LIMIT 1000;
```

If you have any good ideas, please let me know.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Вложения

В списке pgsql-www по дате отправления:

Предыдущее
От: Sad Man
Дата:
Сообщение: Wiki editor request
Следующее
От: "Jonathan S. Katz"
Дата:
Сообщение: Re: Documentation building broken in CFBot