Discussion: Re: dsa_allocate() faliure


Re: dsa_allocate() faliure

From:
Justin Pryzby
Date:
Moving to -hackers, hopefully it doesn't confuse the list scripts too much.

On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because if I go there now it won't be reproducible.

Do you have query logging enabled?  If not, could you consider enabling it on at least
one of those servers?  I'm interested to know what ELSE is running at the time
that query failed.

Perhaps you could enable query logging JUST for the interval of time when the
server usually errors?  The CSV logs can be imported to postgres for analysis.
You might do something like:

SELECT left(message, 99), COUNT(1), max(session_id)
FROM postgres_log
WHERE log_time BETWEEN .. AND ..
GROUP BY 1 ORDER BY 2;
 
And just maybe there'd be a query there that only runs once per day, which would
allow reproducing the error at will.  Or a utility command like vacuum..
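
For reference, a minimal sketch of the import step, assuming the postgres_log
table definition from the PostgreSQL docs for csvlog (the pre-v13 column set);
the file path is only an example:

CREATE TABLE postgres_log
(
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text,
    PRIMARY KEY (session_id, session_line_num)
);

-- Load one rotated CSV file (example path).
COPY postgres_log FROM '/var/log/postgresql/postgresql-2019-02-04_000000.csv' WITH csv;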

I think ideally you'd set:

log_statement                = all
log_min_messages             = info
log_destination              = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash messages, too

You should set these to something that works well at your site:

log_rotation_age            = '2min'
log_rotation_size           = '32MB'

I would normally set these, and I don't see any reason why you wouldn't set
them too:

log_checkpoints             = on
log_lock_waits              = on
log_min_error_statement     = notice
log_temp_files              = 0
log_min_duration_statement  = '9s'
log_autovacuum_min_duration = '999s'

And I would set these too but maybe you'd prefer to do something else:

log_directory               = '/var/log/postgresql'
log_file_mode               = 0640
log_filename                = 'postgresql-%Y-%m-%d_%H%M%S.log'

Justin


Re: dsa_allocate() faliure

From:
Jakub Glapa
Date:
> Do you have query logging enabled?  If not, could you consider enabling it on at least
> one of those servers?  I'm interested to know what ELSE is running at the time
> that query failed.

Ok, I have configured that and will enable it in the time window when the errors usually occur.  I'll report as soon as I have something.


--
regards,
pozdrawiam,
Jakub Glapa


On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> [...]

Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Thu, Feb 7, 2019 at 9:10 PM Jakub Glapa <jakub.glapa@gmail.com> wrote:
> > Do you have query logging enabled?  If not, could you consider enabling it on at least
> > one of those servers?  I'm interested to know what ELSE is running at the time
> > that query failed.
>
> Ok, I have configured that and will enable it in the time window when the
> errors usually occur.  I'll report as soon as I have something.

I don't have the answer yet but I have some progress: I finally
reproduced the "could not find %d free pages" error by running lots of
concurrent parallel queries.  Will investigate.

Set up:

create table foo (p int, a int, b int) partition by list (p);
create table foo_1 partition of foo for values in (1);
create table foo_2 partition of foo for values in (2);
create table foo_3 partition of foo for values in (3);
alter table foo_1 set (parallel_workers = 4);
alter table foo_2 set (parallel_workers = 4);
alter table foo_3 set (parallel_workers = 4);
insert into foo
select generate_series(1, 10000000)::int % 3 + 1,
       generate_series(1, 10000000)::int % 50,
       generate_series(1, 10000000)::int % 50;
create index on foo_1(a);
create index on foo_2(a);
create index on foo_3(a);
create index on foo_1(b);
create index on foo_2(b);
create index on foo_3(b);
analyze;

Then I ran three copies of:

#!/bin/sh
(
  echo "set max_parallel_workers_per_gather = 4;"
  for I in `seq 1 100000`; do
    echo "explain analyze select count(*) from foo where a between 5 and 6 or b between 5 and 6;"
  done
) | psql postgres
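
(One way to run the three copies concurrently, assuming the script above is
saved as repro.sh -- the filename is made up:)

./repro.sh & ./repro.sh & ./repro.sh & wait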

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Fri, Feb 8, 2019 at 4:49 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I don't have the answer yet but I have some progress: I finally
> reproduced the "could not find %d free pages" error by running lots of
> concurrent parallel queries.  Will investigate.

Sometimes FreePageManagerPutInternal() returns a
number-of-contiguous-pages-created-by-this-insertion that is too large
by one.  If this happens to be a new max-number-of-contiguous-pages,
it causes trouble some arbitrary time later because the max is wrong
and this FPM cannot satisfy a request that large, and it may not be
recomputed for some time because the incorrect value prevents
recomputation.  Not sure yet if this is due to the lazy computation
logic or a plain old fence-post error in the btree consolidation code
or something else.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Robert Haas
Date:
On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Sometimes FreePageManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one.  If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation.  Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble.  For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert().  And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated spans, so an error in the npages value of one of
the earlier spans would just get fixed up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Sat, Feb 9, 2019 at 9:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > Sometimes FreePageManagerPutInternal() returns a
> > number-of-contiguous-pages-created-by-this-insertion that is too large
> > by one. [...]
>
> I spent a long time thinking about this and staring at code this
> afternoon, but I didn't really come up with much of anything useful.
> It seems like a strange failure mode, because
> FreePageManagerPutInternal() normally just returns its third argument
> unmodified. [...]

Bleugh.  Yeah.  What I said before wasn't quite right.  The value
returned by FreePageManagerPutInternal() is actually correct at the
moment it is returned, but it ceases to be correct immediately
afterwards if the following call to FreePageBtreeCleanup() happens to
reduce the size of that particular span.  The problem is that we
clobber fpm->contiguous_pages with the earlier (and by now incorrect)
value that we were holding in a local variable.
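
To sketch the sequence (illustrative pseudo-C, simplified from what
FreePageManagerPut() does; not the verbatim freepage.c code):

    Size    contiguous_pages;

    contiguous_pages = FreePageManagerPutInternal(fpm, first_page, npages, false);
    /* correct at this instant ... */

    FreePageBtreeCleanup(fpm);
    /* ... but cleanup may shrink that very span, leaving the local stale ... */

    if (contiguous_pages > fpm->contiguous_pages)
        fpm->contiguous_pages = contiguous_pages;   /* clobbers with stale value */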

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Sun, Feb 10, 2019 at 7:24 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Feb 9, 2019 at 9:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > [...]
>
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.

... but why would it do that?  I can reproduce cases where (for
example) FreePageManagerPutInternal() returns 179, and then
FreePageManagerLargestContiguous() returns 179, but then after
FreePageBtreeCleanup() it returns 178.  At that point FreePageManagerDump()
says:

    btree depth 1:
      77@0 l: 27(1) 78(178)
    freelists:
      1: 27
      129: 78(178)

But at first glance it shouldn't be allocating pages, because it just
does consolidation to try to convert to singleton format, and then it
does recycle list cleanup using soft=true so that no allocation of
btree pages should occur.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Robert Haas
Date:
On Sun, Feb 10, 2019 at 1:55 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.  The problem is that we
> clobber fpm->contiguous_pages with the earlier (and by now incorrect)
> value that we were holding in a local variable.

Yeah, I had similar bugs to that during the initial development work I
did on freepage.c, and that's why I got rid of some lazy recomputation
thing that I had tried at some point.  The version that got committed
brought that back again, but possibly it's got the same kind of
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: dsa_allocate() faliure

From:
Robert Haas
Date:
On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> ... but why would it do that?  I can reproduce cases where (for
> example) FreePageManagerPutInternal() returns 179, and then
> FreePageManagerLargestContiguous() returns 179, but then after
> FreePageBtreeCleanup() it returns 178.  At that point FreePageManagerDump()
> says:
>
>     btree depth 1:
>       77@0 l: 27(1) 78(178)
>     freelists:
>       1: 27
>       129: 78(178)
>
> But at first glance it shouldn't be allocating pages, because it just
> does consolidation to try to convert to singleton format, and then it
> does recycle list cleanup using soft=true so that no allocation of
> btree pages should occur.

I think I see what's happening.  At the moment the problem occurs,
there is no btree - there is only a singleton range.  So
FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
then ends up in the section with the comment /* Not contiguous; we
need to initialize the btree. */.  And that section, sadly, does not
respect the 'soft' flag, so kaboom.  Something like the attached might
fix it.
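
(The attachment isn't reproduced here; roughly, the shape of such a fix is to
bail out of the btree-initialization path when 'soft' is set, along these
lines -- a sketch, not the attached patch:)

    if (!relptr_is_null(fpm->btree_recycle))
        root = FreePageBtreeGetRecycled(fpm);
    else if (soft)
        return 0;           /* soft mode must not allocate btree pages */
    else if (FreePageManagerGetInternal(fpm, 1, &root_page))
        root = (FreePageBtree *) fpm_page_to_pointer(base, root_page);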

Boy, I love FreePageManagerDump!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: dsa_allocate() faliure

From:
Justin Pryzby
Date:
On Sun, Feb 10, 2019 at 12:10:52PM +0530, Robert Haas wrote:
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

I ran overnight with this patch, but all parallel processes ended up stuck in
the style of bug#15585.  So that's either not the root cause, or there's a 2nd
issue.

https://www.postgresql.org/message-id/flat/15585-324ff6a93a18da46%40postgresql.org

Justin


Re: dsa_allocate() faliure

From:
Sergei Kornilov
Date:
Hi

> I ran overnight with this patch, but all parallel processes ended up stuck in
> the style of bug#15585. So that's either not the root cause, or there's a 2nd
> issue.

Maybe I missed something in this discussion, but you can reproduce bug#15585?  How?  With this testcase:
https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com

regards, Sergei


Re: dsa_allocate() faliure

From:
Justin Pryzby
Date:
On Sun, Feb 10, 2019 at 07:11:22PM +0300, Sergei Kornilov wrote:
> > I ran overnight with this patch, but all parallel processes ended up stuck in
> > the style of bug#15585. So that's either not the root cause, or there's a 2nd
> > issue.
> 
> Maybe I missed something in this discussion, but you can reproduce bug#15585?  How?  With this testcase:
> https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com

By running the queued_alters query multiple times in a loop:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

I'm able to trigger dsa "ERROR"s with that query on a newly initdb'd cluster with
only that table.  But I think some servers are more likely to hit it than
others.
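
(For concreteness, the driver is just something like the following;
queued_alters.sql stands in for the query from the linked message:)

#!/bin/sh
# Run several concurrent loops of the query (filename is hypothetical).
for i in 1 2 3 4; do
  ( while :; do
      psql -X -q -f queued_alters.sql postgres >/dev/null
    done ) &
done
wait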

I've only tripped on 15585 twice, and only while trying to trigger other DSA
bugs (the working hypothesis is that that bug is a secondary issue which happens
after hitting some other bug).  And not consistently or quickly.

Justin


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
I couldn't reproduce the problem after 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related.  The
comment refers to an output parameter internal_pages_used, which must
have been used to report this exact phenomenon in an earlier
development version.  But there is no such parameter in the committed
version, and instead there is the soft flag to prevent internal
allocation.  I have no view on which approach is best, but yeah, if
we're using a soft flag, it has to work reliably.

This brings us to a difficult choice: we're about to cut a new
release, and this could in theory be included.  Even though the fix is
quite convincing, it doesn't seem wise to change such complicated code
at the last minute, and I know from an off-list chat that that is also
Robert's view.  So I'll wait until after the release, and we'll have
to live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages".  (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah.  And I love reproducible bugs.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: dsa_allocate() faliure

From:
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> This brings us to a difficult choice: we're about to cut a new
> release, and this could in theory be included.  Even though the fix is
> quite convincing, it doesn't seem wise to change such complicated code
> at the last minute, and I know from an off-list chat that that is also
> Robert's view.

Yeah ... at this point we're just too close to the release deadline,
I'm afraid, even though the fix *looks* pretty safe.  Not worth the risk
given that this seems to be a low-probability bug.

I observe from

https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html

that the edge cases in this function aren't too well exercised by
our regression tests, meaning that the buildfarm might not prove
much either way about the correctness of this patch.  That is one
factor pushing me to think we shouldn't risk it.  But, taking a
longer view, is that something that's practical to improve?

> So I'll wait until after the release, and we'll have
> to live with the bug for another 3 months.

Check.  Please hold off committing until you see the release tags
appear, probably late Tuesday my time / Wednesday noonish yours.

            regards, tom lane


Re: dsa_allocate() faliure

From:
Justin Pryzby
Date:
On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> I couldn't reproduce the problem after 90 minutes with a test case
> that otherwise hits it within a couple of minutes.
...
> Note that this patch addresses the error "dsa_allocate could not find
> %zu free pages".  (The error "dsa_area could not attach to segment" is
> something else and apparently rarer.)

"could not attach" is the error reported early this morning while
stress-testing this patch with queued_alters queries in loops, so that's
consistent with your understanding.  And I guess it preceded getting stuck on
lock; although I don't how long between the first happened and the second, I'm
guess not long and perhaps immedidately; since the rest of the processes were
all stuck as in bug#15585 rather than ERRORing once every few minutes.

I mentioned that "could not attach to segment" occurs in leader either/or
parallel worker.  And most of the time causes an ERROR only, and doesn't wedge
all future parallel workers.  Maybe bug#15585 "wedged" state maybe only occurs
after some pattern of leader+worker failures (?)  I've just triggered bug#15585
again, but if there's a pattern, I don't see it.

Please let me know whether you're able to reproduce the "not attach" bug using
simultaneous loops around the queued_alters query; it's easy here.

Justin


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Mon, Feb 11, 2019 at 11:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> [...]
>
> Please let me know whether you're able to reproduce the "not attach" bug using
> simultaneous loops around the queued_alters query; it's easy here.

I haven't ever managed to reproduce that one yet.  It's great you have
a reliable repro...  Let's discuss it on the #15585 thread.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I observe from
>
> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>
> that the edge cases in this function aren't too well exercised by
> our regression tests, meaning that the buildfarm might not prove
> much either way about the correctness of this patch.  That is one
> factor pushing me to think we shouldn't risk it.  But, taking a
> longer view, is that something that's practical to improve?

Yeah.  This is a nice example of code that really deserves unit tests
written in C.  Could be good motivation to build the infrastructure I
mentioned here:

https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() faliure

From:
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I observe from
>> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>> that the edge cases in this function aren't too well exercised by
>> our regression tests, meaning that the buildfarm might not prove
>> much either way about the correctness of this patch.  That is one
>> factor pushing me to think we shouldn't risk it.  But, taking a
>> longer view, is that something that's practical to improve?

> Yeah.  This is a nice example of code that really deserves unit tests
> written in C.  Could be good motivation to build the infrastructure I
> mentioned here:
> https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

Meh.  I think if you hold out for that, you're going to be waiting a
long time.  I was thinking more along the lines of making a test API
in src/test/modules/, akin to what we've got for predtest or rbtree.
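
For illustration, the C half of such a module might look like this (a sketch
with hypothetical names, modeled loosely on test_predtest; the FreePageManager
calls are the freepage.h API):

#include "postgres.h"

#include "fmgr.h"
#include "utils/freepage.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(test_freepage_cycle);

/*
 * Exercise FreePageManager put/get cycles inside a palloc'd region.  The
 * region's start holds the FreePageManager itself; the remaining pages are
 * handed over as free space.
 */
Datum
test_freepage_cycle(PG_FUNCTION_ARGS)
{
    Size        region_pages = 1024;
    char       *base = palloc0(region_pages * FPM_PAGE_SIZE);
    FreePageManager *fpm = (FreePageManager *) base;
    Size        first_page;

    FreePageManagerInitialize(fpm, base);
    FreePageManagerPut(fpm, 1, region_pages - 1);

    if (!FreePageManagerGet(fpm, 64, &first_page))
        elog(ERROR, "could not allocate 64 contiguous pages");
    FreePageManagerPut(fpm, first_page, 64);

    PG_RETURN_VOID();
}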

            regards, tom lane


Re: dsa_allocate() faliure

From:
Thomas Munro
Date:
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > So I'll wait until after the release, and we'll have
> > to live with the bug for another 3 months.
>
> Check.  Please hold off committing until you see the release tags
> appear, probably late Tuesday my time / Wednesday noonish yours.

Pushed.

-- 
Thomas Munro
http://www.enterprisedb.com