Обсуждение: [PATCH] TODO “Allow LISTEN on patterns”

Поиск
Список
Период
Сортировка

[PATCH] TODO “Allow LISTEN on patterns”

От
Alexander Cheshev
Дата:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.

In [1] Sev Zaslavsky proposed to add a GUC so that we don't break
existing code. The patch adds three additional characters ‘.’, ‘%’ and
‘>’ which are forbidden in existing code. Without these characters the
patch works in the same way as existing code. So there is no need to
add a GUC.

2. Performance improvement of matching

To match a notification channel against listen channels Postgres uses
the function IsListeningOn which iterates over all listen channels and
compares them with the notification channel. The time complexity can
be estimated as O(nm) where n is the number of the listen channels and
m is the size of the notification channel.

To match a notification channel against listen channels the patch
searches in binary trie. The time complexity can be estimated as O(m)
where m is the size of the notification channel. So there is no
dependence on the number of the listen channels.

The patch builds binary trie in O(nm) where n is the number of the
listen channels and m is the maximum length among the listen channels.
The space complexity required to build a binary trie is dominated by
the leaf nodes and can be estimated as O(n) where n is the number of
the listen channels.

I gathered data to compare Postgres with the patch. In the file
benchmark.jpg you can find three graphics for three different amounts
of notifications. Horizontal line represents number of listen channels
and vertical line represents time in nanoseconds. The time measures
pure calls to IsListeningOn and IsMatchingOn. From the graphics you
can deduce the following observations:
 * The time of the trie match doesn’t depend on the number of listen
channels and remains the same. The time of the list search depends
linerary on the number of listen channels. So the practical results
coincide with the theoretical observations.
 * When the number of listen channels is higher than 250 the trie
match outperforms list search around 6 times.
 * When the number of listen channels is lower than 16 the list search
outperforms the trie match around 2 times. I tried to use list match
on a small number of listen channels but it didn’t perform any better
than the trie match. The reason for that is that the list search uses
strcmp under the hood which I couldn’t purely use because I also had
to deal with wildcards.

[1] https://www.postgresql.org/message-id/flat/52693FC5.7070507%40gmail.com

Regards,
Alexander Cheshev

Вложения

Re: [PATCH] TODO “Allow LISTEN on patterns”

От
Emanuel Calvo
Дата:

Hello there,


El vie, 15 mar 2024 a las 9:01, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.


I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

postgres=# LISTEN device1.alerts.%;
LISTEN
postgres=# ;
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.
postgres=# UNLISTEN >;
UNLISTEN
postgres=# ; -- Here I send a notification over the same channel
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.

The same happens with "UNLISTEN %;", although I'm not sure if this should have
the same behavior.

It stops listening correctly if I do explicit UNLISTEN (exact channel matching).

I'll be glad to conduct more tests or checks on this.

Cheers,


--
--
Emanuel Calvo
Database Engineering

Re: [PATCH] TODO “Allow LISTEN on patterns”

От
Alexander Cheshev
Дата:
Hi Emanuel,

I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

TODO “Allow LISTEN on patterns” [1] is a bit vague about that feature. So I didn't implement it in the first version of the patch. Also I see that I made a mistake in the documentation and mentioned that it is actually supported. Sorry for the confusion.

Besides obvious reasons I think that your finding is especially attractive for the following reason. We have an UNLISTEN * command. If we replace > with * in the patch (which I actually did in the new version of the patch) then we have a generalisation of the above command. For example, UNLISTEN a* cancels registration on all channels which start with a.

I attached to the email the new version of the patch which supports the requested feature. Instead of > I use * for the reason which I mentioned above. Also I added test cases, changed documentation, etc.

I appreciate your work, Emanuel! If you have any further findings I will be glad to adjust the patch accordingly.


Regards,
Alexander Cheshev

Regards,
Alexander Cheshev


On Tue, 9 Jul 2024 at 11:01, Emanuel Calvo <3manuek@gmail.com> wrote:

Hello there,


El vie, 15 mar 2024 a las 9:01, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.


I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

postgres=# LISTEN device1.alerts.%;
LISTEN
postgres=# ;
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.
postgres=# UNLISTEN >;
UNLISTEN
postgres=# ; -- Here I send a notification over the same channel
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.

The same happens with "UNLISTEN %;", although I'm not sure if this should have
the same behavior.

It stops listening correctly if I do explicit UNLISTEN (exact channel matching).

I'll be glad to conduct more tests or checks on this.

Cheers,


--
--
Emanuel Calvo
Database Engineering

Вложения

Re: [PATCH] TODO “Allow LISTEN on patterns”

От
Alexander Cheshev
Дата:
Hi Emanuel,

Changed implementation of the function Exec_UnlistenCommit . v2 of the path contained a bug in the function Exec_UnlistenCommit (added a test case for that) and also it was not implemented in natural to C form using pointers. Now it looks fine and works as expected.

In the previous email I forgot to mention that the new implementation of the function Exec_UnlistenCommit has the same space and time complexities as the original implementation (which doesn't support wildcards).

Regards,
Alexander Cheshev


On Sat, 13 Jul 2024 at 13:26, Alexander Cheshev <alex.cheshev@gmail.com> wrote:
Hi Emanuel,

I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

TODO “Allow LISTEN on patterns” [1] is a bit vague about that feature. So I didn't implement it in the first version of the patch. Also I see that I made a mistake in the documentation and mentioned that it is actually supported. Sorry for the confusion.

Besides obvious reasons I think that your finding is especially attractive for the following reason. We have an UNLISTEN * command. If we replace > with * in the patch (which I actually did in the new version of the patch) then we have a generalisation of the above command. For example, UNLISTEN a* cancels registration on all channels which start with a.

I attached to the email the new version of the patch which supports the requested feature. Instead of > I use * for the reason which I mentioned above. Also I added test cases, changed documentation, etc.

I appreciate your work, Emanuel! If you have any further findings I will be glad to adjust the patch accordingly.


Regards,
Alexander Cheshev

Regards,
Alexander Cheshev


On Tue, 9 Jul 2024 at 11:01, Emanuel Calvo <3manuek@gmail.com> wrote:

Hello there,


El vie, 15 mar 2024 a las 9:01, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.


I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

postgres=# LISTEN device1.alerts.%;
LISTEN
postgres=# ;
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.
postgres=# UNLISTEN >;
UNLISTEN
postgres=# ; -- Here I send a notification over the same channel
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.

The same happens with "UNLISTEN %;", although I'm not sure if this should have
the same behavior.

It stops listening correctly if I do explicit UNLISTEN (exact channel matching).

I'll be glad to conduct more tests or checks on this.

Cheers,


--
--
Emanuel Calvo
Database Engineering

Вложения

Re: [PATCH] TODO “Allow LISTEN on patterns”

От
Emanuel Calvo
Дата:

Hi Alexander,

I did a review on the new patch version and I observed that the identifier
passed to the LISTEN command is handled differently between outer and inner 
levels.

When the outer level exceeds the 64 characters limitation, the outer level of the 
channel name is truncated, but leaves the inner levels in the channel name due
that isn't parsed in the same way.
 
Also, even if the outer level isn't truncated, it is allowed to add channels names 
that exceeds the allowed identifier size.

It can be reproduced just by:

      # LISTEN a.a.a.a.a.lot.of.levels..; -- this doesn't fail at LISTEN, but fails in NOTIFY due to channel name too long

In the following, the outer level is truncated, but it doesn't cut out the inner levels. This leaves
listening channels that cannot receive any notifications in the queue:

      # LISTEN notify_async_channel_name_too_long____________________________________.a.a. ...
      NOTICE: identifier .... will be truncated

      # select substring(c.channel,0,66), length(c.channel) from pg_listening_channels() c(channel) where length(c.channel) > 64;
      substring | notify_async_channel_name_too_long_____________________________.a
      length    | 1393


I guess that the expected behavior would be that if the outer level is truncated, the rest of the
channel name should be ignored, as there won't be possible to notify it anyway.

In the case of the inner levels creating a channel name too long, it may probably sane to just 
check the length of the entire identifier, and truncate -- ensuring that channel name doesn't 
end with the level separator.

Another observation, probably not strictly related to this patch itself but the async-notify tests, is that there is no test for 
"payload too long". Probably there is a reason on why isn't in the specs?


Regards,


El lun, 15 jul 2024 a las 12:59, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hi Emanuel,

Changed implementation of the function Exec_UnlistenCommit . v2 of the path contained a bug in the function Exec_UnlistenCommit (added a test case for that) and also it was not implemented in natural to C form using pointers. Now it looks fine and works as expected.

In the previous email I forgot to mention that the new implementation of the function Exec_UnlistenCommit has the same space and time complexities as the original implementation (which doesn't support wildcards).

Regards,
Alexander Cheshev


On Sat, 13 Jul 2024 at 13:26, Alexander Cheshev <alex.cheshev@gmail.com> wrote:
Hi Emanuel,

I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

TODO “Allow LISTEN on patterns” [1] is a bit vague about that feature. So I didn't implement it in the first version of the patch. Also I see that I made a mistake in the documentation and mentioned that it is actually supported. Sorry for the confusion.

Besides obvious reasons I think that your finding is especially attractive for the following reason. We have an UNLISTEN * command. If we replace > with * in the patch (which I actually did in the new version of the patch) then we have a generalisation of the above command. For example, UNLISTEN a* cancels registration on all channels which start with a.

I attached to the email the new version of the patch which supports the requested feature. Instead of > I use * for the reason which I mentioned above. Also I added test cases, changed documentation, etc.

I appreciate your work, Emanuel! If you have any further findings I will be glad to adjust the patch accordingly.


Regards,
Alexander Cheshev

Regards,
Alexander Cheshev


On Tue, 9 Jul 2024 at 11:01, Emanuel Calvo <3manuek@gmail.com> wrote:

Hello there,


El vie, 15 mar 2024 a las 9:01, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.


I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

postgres=# LISTEN device1.alerts.%;
LISTEN
postgres=# ;
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.
postgres=# UNLISTEN >;
UNLISTEN
postgres=# ; -- Here I send a notification over the same channel
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.

The same happens with "UNLISTEN %;", although I'm not sure if this should have
the same behavior.

It stops listening correctly if I do explicit UNLISTEN (exact channel matching).

I'll be glad to conduct more tests or checks on this.

Cheers,


--
--
Emanuel Calvo
Database Engineering



--
--
Emanuel Calvo

Re: [PATCH] TODO “Allow LISTEN on patterns”

От
Alexander Cheshev
Дата:
Hi Emanuel,

I did a review on the new patch version and I observed that the identifier
passed to the LISTEN command is handled differently between outer and inner 
levels.

We have the following grammar:

notify_channel:
                       ColId
                                       { $$ = $1; }
                       | notify_channel '.' ColId
                                       { $$ = psprintf("%s.%s", $1, $3); }

And ColId is truncated in core scanner:

     ident = downcase_truncate_identifier(yytext, yyleng, true);

So each level is truncated independently. For this reason we observe the behaviour which you described above.  

Another observation, probably not strictly related to this patch itself but the async-notify tests, is that there is no test for 
"payload too long". Probably there is a reason on why isn't in the specs?

I believe that simply because not all functionality is covered with tests. But I have noticed a very interesting test "channel name too long":

SELECT pg_notify('notify_async_channel_name_too_long______________________________','sample_message1');
ERROR:  channel name too long

But the behaviour is inconsistent with NOTIFY command:

NOTIFY notify_async_channel_name_too_long______________________________
NOTICE:  identifier "notify_async_channel_name_too_long______________________________" will be truncated to ...

I guess that the expected behavior would be that if the outer level is truncated, the rest of the
channel name should be ignored, as there won't be possible to notify it anyway.

In the case of the inner levels creating a channel name too long, it may probably sane to just 
check the length of the entire identifier, and truncate -- ensuring that channel name doesn't 
end with the level separator.


Well, I believe that we can forbid too long channel names at all. So it provides consistent behaviour among different ways we can send notifications, and I agree with you that "there won't be possible to notify it anyway". I created a patch for that and attached it to the email. In the patch I relocated truncation from core scanner to parser. And as the same core scanner is also used in plsql I added three lines of code to its scanner to basically truncate too long identifiers in there. Here is an example of the new behaviour:

-- Should fail. Too long channel names
NOTIFY notify_async_channel_name_too_long_________._____________________;
ERROR:  channel name too long
LISTEN notify_async_channel_name_too_long_________%._____________________;
ERROR:  channel name too long
UNLISTEN notify_async_channel_name_too_long_________%._____________________;
ERROR:  channel name too long

Regards,
Alexander Cheshev


On Sun, 21 Jul 2024 at 21:36, Emanuel Calvo <3manuek@gmail.com> wrote:

Hi Alexander,

I did a review on the new patch version and I observed that the identifier
passed to the LISTEN command is handled differently between outer and inner 
levels.

When the outer level exceeds the 64 characters limitation, the outer level of the 
channel name is truncated, but leaves the inner levels in the channel name due
that isn't parsed in the same way.
 
Also, even if the outer level isn't truncated, it is allowed to add channels names 
that exceeds the allowed identifier size.

It can be reproduced just by:

      # LISTEN a.a.a.a.a.lot.of.levels..; -- this doesn't fail at LISTEN, but fails in NOTIFY due to channel name too long

In the following, the outer level is truncated, but it doesn't cut out the inner levels. This leaves
listening channels that cannot receive any notifications in the queue:

      # LISTEN notify_async_channel_name_too_long____________________________________.a.a. ...
      NOTICE: identifier .... will be truncated

      # select substring(c.channel,0,66), length(c.channel) from pg_listening_channels() c(channel) where length(c.channel) > 64;
      substring | notify_async_channel_name_too_long_____________________________.a
      length    | 1393


I guess that the expected behavior would be that if the outer level is truncated, the rest of the
channel name should be ignored, as there won't be possible to notify it anyway.

In the case of the inner levels creating a channel name too long, it may probably sane to just 
check the length of the entire identifier, and truncate -- ensuring that channel name doesn't 
end with the level separator.

Another observation, probably not strictly related to this patch itself but the async-notify tests, is that there is no test for 
"payload too long". Probably there is a reason on why isn't in the specs?


Regards,


El lun, 15 jul 2024 a las 12:59, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hi Emanuel,

Changed implementation of the function Exec_UnlistenCommit . v2 of the path contained a bug in the function Exec_UnlistenCommit (added a test case for that) and also it was not implemented in natural to C form using pointers. Now it looks fine and works as expected.

In the previous email I forgot to mention that the new implementation of the function Exec_UnlistenCommit has the same space and time complexities as the original implementation (which doesn't support wildcards).

Regards,
Alexander Cheshev


On Sat, 13 Jul 2024 at 13:26, Alexander Cheshev <alex.cheshev@gmail.com> wrote:
Hi Emanuel,

I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

TODO “Allow LISTEN on patterns” [1] is a bit vague about that feature. So I didn't implement it in the first version of the patch. Also I see that I made a mistake in the documentation and mentioned that it is actually supported. Sorry for the confusion.

Besides obvious reasons I think that your finding is especially attractive for the following reason. We have an UNLISTEN * command. If we replace > with * in the patch (which I actually did in the new version of the patch) then we have a generalisation of the above command. For example, UNLISTEN a* cancels registration on all channels which start with a.

I attached to the email the new version of the patch which supports the requested feature. Instead of > I use * for the reason which I mentioned above. Also I added test cases, changed documentation, etc.

I appreciate your work, Emanuel! If you have any further findings I will be glad to adjust the patch accordingly.


Regards,
Alexander Cheshev

Regards,
Alexander Cheshev


On Tue, 9 Jul 2024 at 11:01, Emanuel Calvo <3manuek@gmail.com> wrote:

Hello there,


El vie, 15 mar 2024 a las 9:01, Alexander Cheshev (<alex.cheshev@gmail.com>) escribió:
Hello Hackers,

I have implemented TODO “Allow LISTEN on patterns” [1] and attached
the patch to the email. The patch basically consists of the following
two parts.

1. Support wildcards in LISTEN command

Notification channels can be composed of multiple levels in the form
‘a.b.c’ where ‘a’, ‘b’ and ‘c’ are identifiers.

Listen channels can be composed of multiple levels in the form ‘a.b.c’
where ‘a’, ‘b’ and ‘c’ are identifiers which can contain the following
wildcards:
 *  ‘%’ matches everything until the end of a level. Can only appear
at the end of a level. For example, the notification channels ‘a.b.c’,
‘a.bc.c’ match against the listen channel ‘a.b%.c’.
 * ‘>’ matches everything to the right. Can only appear at the end of
the last level. For example, the notification channels ‘a.b’, ‘a.bc.d’
match against the listen channel ‘a.b>’.


I did a test over the "UNLISTEN >" behavior, and I'm not sure if this is expected.
This command I assume should free all the listening channels, however, it doesn't
seem to do so:

postgres=# LISTEN device1.alerts.%;
LISTEN
postgres=# ;
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.
postgres=# UNLISTEN >;
UNLISTEN
postgres=# ; -- Here I send a notification over the same channel
Asynchronous notification "device1.alerts.temp" with payload "80" received from server process with PID 237.

The same happens with "UNLISTEN %;", although I'm not sure if this should have
the same behavior.

It stops listening correctly if I do explicit UNLISTEN (exact channel matching).

I'll be glad to conduct more tests or checks on this.

Cheers,


--
--
Emanuel Calvo
Database Engineering



--
--
Emanuel Calvo

Вложения