Обсуждение: Curious run-away index build on upgrade to 8.1.3

Поиск
Список
Период
Сортировка

Curious run-away index build on upgrade to 8.1.3

От
Jerry Sievers
Дата:
Hello list; We stumbled on a maintenance_work_mem related problem with
index builds.

Recently imported a good sized DB into a new 8.1.3 install on platform
SunOS 5.9.  The DB consists of some 400 tables and about twice as many
indexes.

The Upgrade went very smoothly until we hit an index build over a
table with approx 1.5 million records.  The index field was type
bigint.  We have a great many other tables of similar composition
which had no problem on index creation.

What happened was that for a couple minutes the CPU load would
steadily increase and disk activity decrease at the same time.  Before
long, one CPU is 100% busy and we let this continue for 2 hours, a
100x longer than this index usually takes to build.  Disk IO dropped
to nothing and remained so.

Worse is that the backend that was spinning would not respond to a
cancel nor SIGTERM.  Stopping this activity required a -m immediate
shutdown of Postgres.

maintenance_work_mem parameter was traced down to be the problem.
It's initial setting was the same that we've been using on 8.0 with
apparent success.

And it would seem as if the maintenance_work_mem value could be set a
lot higher.

Nonetheless, I found the tipping point to be somewhere between 32 and
45k of the parameter setting.

A bit of info on the table;

1.5 million records, 1650 distinct values in index column,

Here's a mod 50 walk through the distribution of how many records
match each of the index field values;

select b from foo where a % 50 = 0 order by b;
  b
------
    1
    2
    2
    4
    6
   10
   15
   21
   30
   37
   44
   56
   68
   84
   99
  117
  149
  190
  228
  261
  298
  340
  423
  486
  593
  717
  853
 1077
 1302
 1596
 2470
 3385
 5739
(33 rows)

Any comments on this and/or request for additional diagnostics
welcome.

If it would be of interest to someone that I truss one of the spinning
processes, I can redo this in an R&D setting and submit the results.

Thank you!

--
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile    http://www.JerrySievers.com/

Re: Curious run-away index build on upgrade to 8.1.3

От
Tom Lane
Дата:
Jerry Sievers <jerry@jerrysievers.com> writes:
> What happened was that for a couple minutes the CPU load would
> steadily increase and disk activity decrease at the same time.  Before
> long, one CPU is 100% busy and we let this continue for 2 hours, a
> 100x longer than this index usually takes to build.  Disk IO dropped
> to nothing and remained so.

Hm, we were just doing some work in this area a week or two ago, and
8.2 should be materially faster than current releases ... but offhand
I don't know why 8.1 would be any worse than earlier versions.  Looking
in the CVS history shows that the sort logic didn't change at all
between 8.0 and 8.1.  Are you sure index build on this index really
performs differently than it did in 8.0?  What platform is this on?

> Worse is that the backend that was spinning would not respond to a
> cancel nor SIGTERM.  Stopping this activity required a -m immediate
> shutdown of Postgres.

Yeah, we also noticed last week that some of the major loops in btree
index build were free of any CHECK_FOR_INTERRUPTS calls :-(.  This is
fixed for 8.1.4.

> If it would be of interest to someone that I truss one of the spinning
> processes, I can redo this in an R&D setting and submit the results.

truss probably wouldn't show anything interesting; a gprof or oprofile
profile could be useful though.

            regards, tom lane

Re: Curious run-away index build on upgrade to 8.1.3

От
Jerry Sievers
Дата:
Hi Tom, thanks for responding on this. More below;

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Jerry Sievers <jerry@jerrysievers.com> writes:
> > What happened was that for a couple minutes the CPU load would
> > steadily increase and disk activity decrease at the same time.  Before
> > long, one CPU is 100% busy and we let this continue for 2 hours, a
> > 100x longer than this index usually takes to build.  Disk IO dropped
> > to nothing and remained so.
>
> Hm, we were just doing some work in this area a week or two ago, and
> 8.2 should be materially faster than current releases ... but offhand
> I don't know why 8.1 would be any worse than earlier versions.  Looking
> in the CVS history shows that the sort logic didn't change at all
> between 8.0 and 8.1.  Are you sure index build on this index really
> performs differently than it did in 8.0?  What platform is this on?

Ok, I just did re-check this on an 8.0.3 install and got proper
results.  That is; the index builds in about 1 minute.

Platform is Solaris 2.9

FWIW, there is at least one difference in compile time options and
that is; on our 8.1.3 installs, we are now using
'enable-thread-safety' as Slony asks for this.  Not sure if this may
be related to the problem though.

These are on machines with a Gig of RAM.

On 8.1.3 the index will not build with maintenance_work_mem set to our
v8.0 setting of 64k.  It will however build with it at 32k and even
builds fine with the default value for maintenance_work_mem whcih
which is 16k for this machine.

Were the index to now build in say 10 minutes of actual work, we'd be
looking at a performance problem.

On the contrary, the failure mode is a complete run-away process doing
nothing productive.

>
> > Worse is that the backend that was spinning would not respond to a
> > cancel nor SIGTERM.  Stopping this activity required a -m immediate
> > shutdown of Postgres.
>
> Yeah, we also noticed last week that some of the major loops in btree
> index build were free of any CHECK_FOR_INTERRUPTS calls :-(.  This is
> fixed for 8.1.4.

Great!

The problem we saw suggests the backend being in an endless loop of
some kind.

> > If it would be of interest to someone that I truss one of the spinning
> > processes, I can redo this in an R&D setting and submit the results.
>
> truss probably wouldn't show anything interesting; a gprof or oprofile
> profile could be useful though.

Ok I did a bit of reading on gprof and note where it says the program
has to call exit in order for the profiling data file to be written.

My limited experience with the problem here is that we had to stop it
with "Sig Niner".  Tried standard cancel and also OS level SIGTERM, to
no avail.

Any idea if there was a signal that this might have responded to shy
of -9?

Great big thanks.

>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile    http://www.JerrySievers.com/

Re: Curious run-away index build on upgrade to 8.1.3

От
Tom Lane
Дата:
Jerry Sievers <jerry@jerrysievers.com> writes:
> Any idea if there was a signal that this might have responded to shy
> of -9?

SIGQUIT might work.

Also, I'd be interested to take a look if you can send me a dump of just
the index key column data.  (COPY with a column list can do that.)

            regards, tom lane

Re: Curious run-away index build on upgrade to 8.1.3

От
Tom Lane
Дата:
Jerry Sievers <jerry@jerrysievers.com> writes:
> Platform is Solaris 2.9

I think this might be the key...

I can reproduce an unreasonably long runtime if I use src/port/qsort.c
(as we do on Solaris), but not with glibc's version of qsort.  I think
you are seeing the poor-choice-of-qsort-pivots problem recently
discussed on -hackers:
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php
The fact that the data contains a very large fraction of nulls
probably contributes to the problem (thanks for sending it BTW).

The reason you only see it with maintenance_work_mem >= 64M is that
below that, the problem doesn't fit into work_mem and so we don't use
qsort at all.  What I still don't understand though is why you see it
in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all
between those versions.  Maybe 8.0 isn't taking the qsort code path,
perhaps because it uses a shade more memory or some such.  Could you
try increasing maintenance_work_mem even more, like to 100M,
and see if 8.0 gets slow?

            regards, tom lane

Re: Curious run-away index build on upgrade to 8.1.3

От
Jerry Sievers
Дата:
Hi Tom;

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Jerry Sievers <jerry@jerrysievers.com> writes:
> > Platform is Solaris 2.9
>
> I think this might be the key...
>
> I can reproduce an unreasonably long runtime if I use src/port/qsort.c
> (as we do on Solaris), but not with glibc's version of qsort.  I think
> you are seeing the poor-choice-of-qsort-pivots problem recently
> discussed on -hackers:
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00590.php
> The fact that the data contains a very large fraction of nulls
> probably contributes to the problem (thanks for sending it BTW).

> The reason you only see it with maintenance_work_mem >= 64M is that
> below that, the problem doesn't fit into work_mem and so we don't use
> qsort at all.  What I still don't understand though is why you see it
> in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all
> between those versions.  Maybe 8.0 isn't taking the qsort code path,
> perhaps because it uses a shade more memory or some such.  Could you
> try increasing maintenance_work_mem even more, like to 100M,
> and see if 8.0 gets slow?

I did as you suggest here; taking the MWM setting incrementally up to
nearly a Gig in 100M increments, also raised and lowered the work_mem
setting too.

Tried a dozen or more combos and in every case, the index built
quickly on 8.0.3.

Very interesting!

Thanks for all your help.

--
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile    http://www.JerrySievers.com/

Re: Curious run-away index build on upgrade to 8.1.3

От
Tom Lane
Дата:
Jerry Sievers <jerry@jerrysievers.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> What I still don't understand though is why you see it
>> in 8.1 and not 8.0 ... the src/port/qsort.c code didn't change at all
>> between those versions.  Maybe 8.0 isn't taking the qsort code path,
>> perhaps because it uses a shade more memory or some such.  Could you
>> try increasing maintenance_work_mem even more, like to 100M,
>> and see if 8.0 gets slow?

> I did as you suggest here; taking the MWM setting incrementally up to
> nearly a Gig in 100M increments, also raised and lowered the work_mem
> setting too.
> Tried a dozen or more combos and in every case, the index built
> quickly on 8.0.3.

On further analysis, it seems the problem is dependent on the exact
ordering of the inputs to the qsort function.  So not only do you need
maintenance_work_mem to be large enough that the code will try to use
qsort, but the physical order of the rows in the table matters.
I suspect that you are testing on an 8.0 table with a different physical
row order --- if you drop the table and reload it from the same dump you
loaded into 8.1, does it get slow?

            regards, tom lane

Re: Curious run-away index build on upgrade to 8.1.3

От
Jerry Sievers
Дата:
Well, I'll be dipped!  More below;

Tom Lane <tgl@sss.pgh.pa.us> writes:

> On further analysis, it seems the problem is dependent on the exact
> ordering of the inputs to the qsort function.  So not only do you need
> maintenance_work_mem to be large enough that the code will try to use
> qsort, but the physical order of the rows in the table matters.
> I suspect that you are testing on an 8.0 table with a different physical
> row order --- if you drop the table and reload it from the same dump you
> loaded into 8.1, does it get slow?

Tom, your theory is correct.  If we had to build that index from
scratch on 8.0, with the data ordered as it was when I made the
original dump, it would have stalled all the same.

I just reproduced this on an 8.0.3 box.

FYI

--
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile    http://www.JerrySievers.com/