Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Дата
Msg-id CAD21AoCNNkLPoc+PWLGpZs74NwHxCvvffAjx6Yn3pJ2xJJPWtw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Список pgsql-bugs
On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear Sawada-san,
> > >
> > > > Alternative idea is to lower the constant value when using an
> > > > assertion build. That way, we don't need to rely on injection points
> > > > being enabled.
> > >
> > > Hmm, possible but I prefer current one. Two concerns:
> > >
> > > 1.
> > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > to call debug functions in debug build.
> >
> > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > the test coverages.
> >
> > > 2.
> > > If we add tests which is usable only for debug build, it must be run only when it
> > > is enabled. IIUC such test does not exist yet.
> >
> > I think we need to test cases not to check if we reach a specific code
> > point but to check if we can get the correct results even if we've
> > executed various code paths. As for this bug, it is better to check
> > that it works properly in a variety of cases. That way, we can check
> > overflow cases and non-overflow cases also in test cases added in the
> > future, improving the test coverage more.
> >
>
> This makes sense, but we should see whether some existing tests cover
> this code path after lowering the limit in the overflow code path. One
> minor point to consider is that at the time, the value MT_NRELS_HASH
> was used to cover cases in a debug build, but we didn't have the
> injection_point framework.

True.

After thinking about it more, perhaps my proposal would not be a good
idea for this case. I think that the cases where we selectively
invalidate caches is more complex and error-prone than the cases where
we invalidate a complete cache. If we invalidated all caches after
decoding each transaction, we wouldn't have had the original data-loss
issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
assertio build means that we're going to test the cases using a
simpler invalidation mechanism while productions systems, which has a
higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
complex cases, which is not great. What do you think?

BTW, as for a new test case, it might be worth having a case I
mentioned before[1]:

1)  S1: CREATE TABLE d (data text not null);
2)  S1: INSERT INTO d VALUES ('d1');
3)      S2: BEGIN; INSERT INTO d VALUES ('d2');
4)          S3: BEGIN; INSERT INTO d VALUES ('d3');
5)  S1: ALTER PUBLICATION pb ADD TABLE d;
6)      S2: INSERT INTO d VALUES ('d4');
7)      S2: COMMIT;
8)          S3: COMMIT;
9)    S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');

With this case, we can test if we need to execute the distributed
invalidations as well in the non-error path in
ReorderBufferProcessTXN().

>
> BTW, I noticed that you removed the following comments in your suggestions:
>   /*
>   * Stores cache invalidation messages distributed by other transactions.
> - *
> - * It is acceptable to skip invalidations received from concurrent
> - * transactions during ReorderBufferForget and ReorderBufferInvalidate,
> - * because the transaction being discarded wouldn't have loaded any shared
>
> IIUC, you only mentioned having some comments like this for ease of
> understanding, and now you are suggesting to remove those.

I forgot to mention the reason. I thought we need either a
comprehensive comment in a place about in which case we need to
execute both the current transaction's inval messages and the
distributed inval messages and in which case we need to execute only
inval messages in the current transaction or to put comments where we
need. The v11 added the comprehensive comment to the declaration of
ninvalidations_distributed and invalidations_distributed in
ReoderBufferTXN, but I'm not sure that was the right place to have
such a comment as it's beyond the description of these fields. So in
my suggestion, I tried to clarify that we execute only the inval
message in the current transaction in ReorderBufferForget() and
ReorderBufferAbort() as they seems to already have a enough comment
for the reason why we need to execute the inval message there.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoCeM2nni1P7Z6KXzLM%3D6zCdShC82sOvuvu0_hBuJkm9Qw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-bugs по дате отправления: