Обсуждение: pgarchives: strip angle brackets when checking for msgid search
In some email-clients, when you copy the Message-ID to the clipboard <br /> it surrounds in (or perhaps, doesn't strip)angle brackets.<br /> When pasting this into the mail archive search, search fails to find the message<br /> and thishappens often enough so as to be annoying. The fix is trivial.<br /><br /> Discussion of similar problem in the commitfestapp:<br /><br /><a class="moz-txt-link-rfc2396E" href="mailto:CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com"><CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com></a><br /><br/> patch attached.<br /><br /> Amir<br />
Amir Rohan wrote: > In some email-clients, when you copy the Message-ID to the clipboard > <br> > it surrounds in (or perhaps, doesn't strip) angle brackets.<br> > When pasting this into the mail archive search, search fails to find > the message<br> > and this happens often enough so as to be annoying. The fix is > trivial.<br> Sounds reasonable as a feature. Can't comment on the implemtation. Can you please not send HTML-only email? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/02/2015 09:53 PM, Alvaro Herrera wrote: > Amir Rohan wrote: >> >> In some email-clients, when you copy the Message-ID to the clipboard >> it surrounds in (or perhaps, doesn't strip) angle brackets. >> When pasting this into the mail archive search, search fails to find the message >> and this happens often enough so as to be annoying. The fix is trivial. >> >> Discussion of similar problem in the commitfest app: >> >> <CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com> >> >> patch attached. > > Sounds reasonable as a feature. Can't comment on the implemtation. > > Can you please not send HTML-only email? > I am mortified by all my recent email mishaps and had just switched email provider for that reason. Hopefully with this, they end. Amir
On Fri, Oct 2, 2015 at 7:29 PM, Amir Rohan <amir.rohan@zoho.com> wrote:
In some email-clients, when you copy the Message-ID to the clipboard
it surrounds in (or perhaps, doesn't strip) angle brackets.
When pasting this into the mail archive search, search fails to find the message
and this happens often enough so as to be annoying. The fix is trivial.
Discussion of similar problem in the commitfest app:
<CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com>
patch attached.
Interesting - we also had a similar fix in 9b7e9b59 for the archives, but that one only deals with the URLs and not the search box. Makes much sense to do it in the search box as well, so applied. Thanks!
Actually, before I sent this. It's also broken because it redirects to bad URLs if there are spaces in it. Or if there is just one of the two, e.g. it has < but not >. All those now generate 404s, which is not good.
This patch needs to be in sync with the callers of that API as well as the code that actually views entries in the archives before it can be safely applied.
Hi Magnus, On 10/03/2015 07:31 PM, Magnus Hagander wrote: > On Fri, Oct 2, 2015 at 7:29 PM, Amir Rohan <amir.rohan@zoho.com > <mailto:amir.rohan@zoho.com>> wrote: > > In some email-clients, when you copy the Message-ID to the clipboard > it surrounds in (or perhaps, doesn't strip) angle brackets. > When pasting this into the mail archive search, search fails to find > the message > and this happens often enough so as to be annoying. The fix is trivial. > > Discussion of similar problem in the commitfest app: > > <CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com> > <mailto:CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com> > > patch attached. > > > Interesting - we also had a similar fix in 9b7e9b59 for the archives, > but that one only deals with the URLs and not the search box. Makes much > sense to do it in the search box as well, so applied. Thanks! > > Actually, before I sent this. >It's also broken because it redirects to bad URLs if there are spaces in it. What is "It"? > Or if there is just one of the two, > e.g. it has < but not >. I'm sorry, I lost you. Let's try and sync... Prior to this patch: "msgid" = works " msgid " = fails " msgid" = fails "msgid " = fails <msgid>" = fails <msgid" = fails "msgid>" = fails " msgid>" = fails etc' With this patch -- they should all work. If there's a search query that currently works and is broken by this patch, can you give an example? If you simply mean that searching for "m sgid" is broken, yes it currently is, and no the patch doesn't fix that. > All those now generate 404s, which is not good. > "Now" meaning before applying this patch? > This patch needs to be in sync with the callers of that API as well as > the code that actually views entries in the archives before it can be > safely applied. There is no API change. afaict nothing that currently works and uses this endpoint should be affected. The only effect should be that queries that now fail but should succeed, succeed. I'm happy to address any concerns you have, but I don't understand what precisely you want done. Amir
On Sat, Oct 3, 2015 at 7:42 PM, Amir Rohan <amir.rohan@zoho.com> wrote:
-- Hi Magnus,
On 10/03/2015 07:31 PM, Magnus Hagander wrote:
> On Fri, Oct 2, 2015 at 7:29 PM, Amir Rohan <amir.rohan@zoho.com
> <mailto:amir.rohan@zoho.com>> wrote:
>
> In some email-clients, when you copy the Message-ID to the clipboard
> it surrounds in (or perhaps, doesn't strip) angle brackets.
> When pasting this into the mail archive search, search fails to find
> the message
> and this happens often enough so as to be annoying. The fix is trivial.
>
> Discussion of similar problem in the commitfest app:
>
> <CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com>
> <mailto:CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com>
>
> patch attached.
>
>
> Interesting - we also had a similar fix in 9b7e9b59 for the archives,
> but that one only deals with the URLs and not the search box. Makes much
> sense to do it in the search box as well, so applied. Thanks!
>
> Actually, before I sent this.
>It's also broken because it redirects to bad URLs if there are spaces
in it.
What is "It"?
> Or if there is just one of the two, > e.g. it has < but not >.
I'm sorry, I lost you. Let's try and sync...
Prior to this patch:
"msgid" = works
" msgid " = fails
" msgid" = fails
"msgid " = fails
<msgid>" = fails
<msgid" = fails
"msgid>" = fails
" msgid>" = fails
etc'
With this patch -- they should all work.
If there's a search query that currently works and is broken by this
patch, can you give an example?
The search works. Meaning the API returns "this messageid exists", which causes the search client to send a redirect to an URL that still has the non-stripped version of the URL (including the < or space).
The regexp used for parsing URLs only allows either "msgid" or "<msgid>". Any other of those combinations will result in a 404 at the receiving end of that redirect.
> All those now generate 404s, which is not good.
>
"Now" meaning before applying this patch?
No, after. Today they render a 200 OK with "search returned no hits".
> This patch needs to be in sync with the callers of that API as well as
> the code that actually views entries in the archives before it can be
> safely applied.
There is no API change. afaict nothing that currently works and uses
this endpoint should be affected. The only effect should be that
queries that now fail but should succeed, succeed.
That is exactly what I'm saying it doesn't. I deployed it and tested and it broke.
So either the API needs to be updated to return the actual URL, and the consumers (which at this point I believe are just the main website) need to be updated to actually use that URL, or the consumer also needs to be updated with the same "cleaning" rules, or the URL parsing in the archives code need to also deal with all those cases.
I think either the first, or a combination of first and third, of those are the way to go. The second one (doing the same changes in the pgweb code) seems to just put us in a position where we'll make the same mistake again next time we try to fix these rules.
On 10/05/2015 11:57 AM, Magnus Hagander wrote: > > > On Sat, Oct 3, 2015 at 7:42 PM, Amir Rohan <amir.rohan@zoho.com > <mailto:amir.rohan@zoho.com>> wrote: > > Hi Magnus, > > On 10/03/2015 07:31 PM, Magnus Hagander wrote: > > On Fri, Oct 2, 2015 at 7:29 PM, Amir Rohan <amir.rohan@zoho.com <mailto:amir.rohan@zoho.com> > > <mailto:amir.rohan@zoho.com <mailto:amir.rohan@zoho.com>>> wrote: > > > > In some email-clients, when you copy the Message-ID to the clipboard > > it surrounds in (or perhaps, doesn't strip) angle brackets. > > When pasting this into the mail archive search, search fails to find > > the message > > and this happens often enough so as to be annoying. The fix is trivial. > > > > Discussion of similar problem in the commitfest app: > > > > <CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com > <mailto:CABUevExRGr%2BQ5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com>> > > > <mailto:CABUevExRGr+Q5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com > <mailto:CABUevExRGr%2BQ5gzp-9cDLOqgntsOEcM0zZYg5VkADj3Z_NKz5A@mail.gmail.com>> > > > > patch attached. > > > > > > Interesting - we also had a similar fix in 9b7e9b59 for the archives, > > but that one only deals with the URLs and not the search box. Makes much > > sense to do it in the search box as well, so applied. Thanks! > > > > Actually, before I sent this. > >It's also broken because it redirects to bad URLs if there are spaces > in it. > > What is "It"? > > > Or if there is just one of the two, > e.g. it has < but not >. > > I'm sorry, I lost you. Let's try and sync... > > Prior to this patch: > > "msgid" = works > " msgid " = fails > " msgid" = fails > "msgid " = fails > <msgid>" = fails > <msgid" = fails > "msgid>" = fails > " msgid>" = fails > etc' > > With this patch -- they should all work. > > If there's a search query that currently works and is broken by this > patch, can you give an example? > > > > The search works. Meaning the API returns "this messageid exists", which > causes the search client to send a redirect to an URL that still has the > non-stripped version of the URL (including the < or space). > > The regexp used for parsing URLs only allows either "msgid" or > "<msgid>". Any other of those combinations will result in a 404 at the > receiving end of that redirect. > > > > All those now generate 404s, which is not good. > > > > "Now" meaning before applying this patch? > > > No, after. Today they render a 200 OK with "search returned no hits". > > > > > > This patch needs to be in sync with the callers of that API as well as > > the code that actually views entries in the archives before it can be > > safely applied. > > There is no API change. afaict nothing that currently works and uses > this endpoint should be affected. The only effect should be that > queries that now fail but should succeed, succeed. > > > > That is exactly what I'm saying it doesn't. I deployed it and tested and > it broke. > > So either the API needs to be updated to return the actual URL, and the > consumers (which at this point I believe are just the main website) need > to be updated to actually use that URL, or the consumer also needs to be > updated with the same "cleaning" rules, or the URL parsing in the > archives code need to also deal with all those cases. > > I think either the first, or a combination of first and third, of those > are the way to go. The second one (doing the same changes in the pgweb > code) seems to just put us in a position where we'll make the same > mistake again next time we try to fix these rules. > Ah, thanks for clearing that up, I understand now and I didn't think to test that. I'll send an updated patch. Amir
On 10/05/2015 11:57 AM, Magnus Hagander wrote: > > > On Sat, Oct 3, 2015 at 7:42 PM, Amir Rohan <amir.rohan@zoho.com > <mailto:amir.rohan@zoho.com>> wrote: > > Hi Magnus, > > On 10/03/2015 07:31 PM, Magnus Hagander wrote: > > On Fri, Oct 2, 2015 at 7:29 PM, Amir Rohan <amir.rohan@zoho.com <mailto:amir.rohan@zoho.com> > > <mailto:amir.rohan@zoho.com <mailto:amir.rohan@zoho.com>>> wrote: > > > > In some email-clients, when you copy the Message-ID to the clipboard > > it surrounds in (or perhaps, doesn't strip) angle brackets. > > When pasting this into the mail archive search, search fails to find > > the message > > patch attached. > > > > The search works. Meaning the API returns "this messageid exists", which > causes the search client to send a redirect to an URL that still has the > non-stripped version of the URL (including the < or space). > > The regexp used for parsing URLs only allows either "msgid" or > "<msgid>". Any other of those combinations will result in a 404 at the > receiving end of that redirect. > > <...> > > So either the API needs to be updated to return the actual URL, and the > consumers (which at this point I believe are just the main website) need > to be updated to actually use that URL, or the consumer also needs to be > updated with the same "cleaning" rules, or the URL parsing in the > archives code need to also deal with all those cases. > > I think either the first, or a combination of first and third, of those > are the way to go. The second one (doing the same changes in the pgweb > code) seems to just put us in a position where we'll make the same > mistake again next time we try to fix these rules. > With these patches (now touching pgweb and pgarchives, when a (cleaned) search query gets a hit on a msgid we now include the canonical messageid in the JSON response. I didn't have it return a url as you suggested because archives should know as little as possible about pgweb's link structure. So pgweb still crafts the redirect url, only it uses the returned msgid if its present in the response. A separate patch makes archives notion of where pgweb /search lives coonfigurable, so devs can test. As a side note, the ping-pong between archives and pgweb seems too complicated. The search form should possibly be in pgweb instead of archives, as all it does is server a form submitting to pgweb's own /search endpoint, it has nothing to do with archives directly. I tested as best I could, but there's some stuff missing from my dev setup and no documentation how get thing running. Amir
Вложения
Amir Rohan wrote: > With these patches (now touching pgweb and pgarchives, when a (cleaned) > search query gets a hit on a msgid we now include the canonical > messageid in the JSON response. I didn't have it return a url as you > suggested because archives should know as little as possible > about pgweb's link structure. So pgweb still crafts the redirect url, > only it uses the returned msgid if its present in the response. FWIW the URL structure for archives can never be changed, because many URLs are in use in permanent storage such as commit messages and archived emails. As an example, when we migrated from the old Mhonarc-generated urls (with the mbox/month/message number structure) we, or Magnus actually, created a database with redirects from that structure to the message-id-based URL structure, to avoid breaking links. I'm not saying your approach is bad or shouldn't be used, just pointing out something that the rest of the -www team is probably already aware of and you might not be. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/05/2015 04:59 PM, Alvaro Herrera wrote: > Amir Rohan wrote: > >> With these patches (now touching pgweb and pgarchives, when a (cleaned) >> search query gets a hit on a msgid we now include the canonical >> messageid in the JSON response. I didn't have it return a url as you >> suggested because archives should know as little as possible >> about pgweb's link structure. So pgweb still crafts the redirect url, >> only it uses the returned msgid if its present in the response. > > FWIW the URL structure for archives can never be changed, because many > URLs are in use in permanent storage such as commit messages and > archived emails. As an example, when we migrated from the old > Mhonarc-generated urls (with the mbox/month/message number structure) > we, or Magnus actually, created a database with redirects from that > structure to the message-id-based URL structure, to avoid breaking > links. > > I'm not saying your approach is bad or shouldn't be used, just pointing > out something that the rest of the -www team is probably already aware > of and you might not be. > That's a valid point (two, actually) but it still /logically/ belongs in the the main website instead of the archive "API server", and keeps with how this worked before. But if magnus concurs with you, It's easily changed. Amir
Hi, Related issue to <560EBEF5.6070209@zoho.com> seen here: http://www.postgresql.org/message-id/CACACo5Sz7G0MFauC082iM=XX_hQ7qQ5ndR4JPo+H-O5vp6iCcQ@mail.gmail.com msgid in link gets flagged as email address and is rewritten for spam protection, breaking and the link. Amir
On 10/05/2015 04:01 PM, Amir Rohan wrote: > On 10/05/2015 11:57 AM, Magnus Hagander wrote: >> >> So either the API needs to be updated to return the actual URL, and the >> consumers (which at this point I believe are just the main website) need >> to be updated to actually use that URL, or the consumer also needs to be >> updated with the same "cleaning" rules, or the URL parsing in the >> archives code need to also deal with all those cases. >> >> I think either the first, or a combination of first and third, of those >> are the way to go. The second one (doing the same changes in the pgweb >> code) seems to just put us in a position where we'll make the same >> mistake again next time we try to fix these rules. >> > > With these patches <...> Bump. Magnus, is there something more I can do to get this on live on postgresql.org?
On 10/21/2015 04:49 PM, Amir Rohan wrote: > On 10/05/2015 04:01 PM, Amir Rohan wrote: >> On 10/05/2015 11:57 AM, Magnus Hagander wrote: >>> >>> So either the API needs to be updated to return the actual URL, and the >>> consumers (which at this point I believe are just the main website) need >>> to be updated to actually use that URL, or the consumer also needs to be >>> updated with the same "cleaning" rules, or the URL parsing in the >>> archives code need to also deal with all those cases. >>> >>> I think either the first, or a combination of first and third, of those >>> are the way to go. The second one (doing the same changes in the pgweb >>> code) seems to just put us in a position where we'll make the same >>> mistake again next time we try to fix these rules. >>> >> >> With these patches <...> > > Bump. Magnus, is there something more I can do to get this on live > on postgresql.org? > Bump. Waiting for feedback on the revised patch. Amir