Discussion: BUG #3752: query yields "could not find block containing chunk", then server crashes

BUG #3752: query yields "could not find block containing chunk", then server crashes

From: "Michael Charnoky"
Date:
The following bug has been logged online:

Bug reference:      3752
Logged by:          Michael Charnoky
Email address:      noky@nextbus.com
PostgreSQL version: 8.3beta2
Operating system:   Linux (Fedora Core 3) 2.6.17
Description:        query yields "could not find block containing chunk",
then server crashes
Details:

I installed PG8.3beta2 and created a db instance using pg_restore.  (The
dump was created using the pg8.3beta2 pg_dump util, from a db on a pg8.1
server.)  The data restored with no errors; later, our app encountered an SQL
error while querying data in the db.  Here's the relevant log snippet:

2007-11-15 15:38:03.880 PST: ERROR:  could not find block containing chunk 0x902fb98
2007-11-15 15:38:03.880 PST: STATEMENT:  SELECT path_tag, dayset_tag, time2secs(ts_endtime), segtimes
         FROM pathtimes where rev=(select rev from projects) ORDER BY time2secs(ts_endtime);
2007-11-15 15:38:29.821 PST: LOG:  server process (PID 17777) was terminated by signal 11: Segmentation fault
2007-11-15 15:38:29.821 PST: LOG:  terminating any other active server processes
2007-11-15 15:38:29.825 PST: LOG:  all server processes terminated; reinitializing
2007-11-15 15:38:29.887 PST: LOG:  database system was interrupted; last known up at 2007-11-15 15:28:27 PST
2007-11-15 15:38:29.887 PST: LOG:  database system was not properly shut down; automatic recovery in progress
2007-11-15 15:38:30.044 PST: FATAL:  the database system is in recovery mode
2007-11-15 15:38:30.285 PST: LOG:  record with zero length at 0/7CB47A08
2007-11-15 15:38:30.286 PST: LOG:  redo is not required
2007-11-15 15:38:30.714 PST: LOG:  autovacuum launcher started
2007-11-15 15:38:30.715 PST: LOG:  database system is ready to accept connections
*** glibc detected *** free(): invalid next size (normal): 0x09045378 ***
2007-11-15 15:38:41.463 PST: LOG:  server process (PID 17811) was terminated by signal 6: Aborted
2007-11-15 15:38:41.463 PST: LOG:  terminating any other active server processes
2007-11-15 15:38:41.464 PST: LOG:  all server processes terminated; reinitializing
2007-11-15 15:38:41.516 PST: LOG:  database system was interrupted; last known up at 2007-11-15 15:38:30 PST
2007-11-15 15:38:41.516 PST: LOG:  database system was not properly shut down; automatic recovery in progress
2007-11-15 15:38:41.683 PST: LOG:  record with zero length at 0/7CB47A48
2007-11-15 15:38:41.683 PST: LOG:  redo is not required
2007-11-15 15:38:41.702 PST: FATAL:  the database system is in recovery mode
2007-11-15 15:38:41.806 PST: LOG:  autovacuum launcher started
2007-11-15 15:38:41.807 PST: LOG:  database system is ready to accept connections

Re: BUG #3752: query yields "could not find block containing chunk", then server crashes

From: Zdenek Kotala
Date:
Do you have a core file? Can you provide stack trace output?

        Thanks



Re: BUG #3752: query yields "could not find block containing chunk", then server crashes

From: Zdenek Kotala
Date:
Michael Charnoky wrote:

<snip>
>
> 2007-11-15 15:38:03.880 PST: ERROR:  could not find block containing chunk 0x902fb98

This message is raised in the AllocSetFree and AllocSetRealloc functions
in the aset.c source. In both functions it means that the given memory
context does not contain the block holding the chunk. In my opinion there
are two likely scenarios:

1) The memory block does not exist. For AllocSetFree this means, e.g., a
double free; for AllocSetRealloc it means that someone wants to realloc
memory that was already freed.

2) The memory is still allocated, but in a different context. However,
palloc and pfree should keep track of that.


In my opinion it is a double-free problem, but without a stack trace or
a reproduction scenario it is difficult to track down.

        Zdenek

Re: BUG #3752: query yields "could not find block containing chunk", then server crashes

From: Tom Lane
Date:
"Michael Charnoky" <noky@nextbus.com> writes:
> 2007-11-15 15:38:03.880 PST: ERROR:  could not find block containing chunk 0x902fb98

We can't do much about this without a self-contained test case.

> 2007-11-15 15:38:03.880 PST: STATEMENT:  SELECT path_tag, dayset_tag, time2secs(ts_endtime), segtimes
>          FROM pathtimes where rev=(select rev from projects) ORDER BY time2secs(ts_endtime);

Is this query using any complex views?  If so, I'd assume the bug is
somehow triggered by that, and try to extract a test case using the
view definition(s).

            regards, tom lane

Re: BUG #3752: query yields "could not find block containing chunk", then server crashes

From: Mike Charnoky
Date:
Just forwarding this info along as Zdenek requested...

It turns out this problem is not a bug in pg8.3; it was a problem with our
custom data type.  I have since dropped the custom data type and am now
using standard pg float4 arrays.  I did the dump and restore, and our app
works just fine, with no crash when the query is run.

BTW- PG8.3 seriously rocks!  We've got some large tables that had very
poor performance in PG8.1... things are really snappy now, HOT usage
really helps our app (as shown by the handy pg_stat_all_tables).


Mike

Zdenek Kotala wrote:
> Mike Charnoky wrote:
>> It seems this problem has to do with a custom data type we are using.  I
>> am working on eliminating this custom data type, as it is becoming too
>> much of a pain to support (it is basically float4[]).  If the problem
>> persists after the data type conversion, I will follow up.  Otherwise, I
>> think this was an error in our custom type code (maybe corruption during
>> dump/reload).
>
> Thanks for the update.
>
>> Would the stack trace still be useful?  Where would I find the dump
>> file?  I didn't see anything...
>
> If you are sure that the problem is in your data type implementation,
> then probably not. You can usually find the core file in the postgres
> data directory, if core file generation is enabled with the ulimit
> command. You can get a stack trace with gdb.
>
>         Zdenek
>
>>
>> Mike