Thread: invalid memory alloc request size

invalid memory alloc request size

From
Ben Chobot
Date:
Yesterday I had a problem on a 64-bit 9.1.1 install:

# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
(1 row)


The logs showed this anomaly:

2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-1] ERROR:  invalid memory alloc request size 18446744073709551613
2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-2] STATEMENT:  SELECT * FROM "asset_user_accesses" WHERE ("asset_user_accesses"."asset_code" = 'assignments:course_141208' AND "asset_user_accesses"."user_id" = 618503) LIMIT 1;


Googling around, it sounds like this is often due to table corruption, which would be unfortunate, but usually seems to
be repeatable. I can re-run that query without issue, and in fact can select * from the entire table without issue. I do
see the row was updated a few minutes after this error, so is it wishful thinking that vacuum came around and
successfully removed the old, corrupted row version?

Re: invalid memory alloc request size

From
Ben Chobot
Date:
On Dec 26, 2011, at 8:08 AM, Ben Chobot wrote:

> Yesterday I had a problem on a 64-bit 9.1.1 install:
>
> # select version();
>                                                    version
> ----------------------------------------------------------------------------------------------------------------
> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
> (1 row)
>
>
> The logs showed this anomaly:
>
> 2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-1] ERROR:  invalid memory alloc request size 18446744073709551613
> 2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-2] STATEMENT:  SELECT * FROM "asset_user_accesses" WHERE ("asset_user_accesses"."asset_code" = 'assignments:course_141208' AND "asset_user_accesses"."user_id" = 618503) LIMIT 1;
>
>
> Googling around, it sounds like this is often due to table corruption, which would be unfortunate, but usually seems
> to be repeatable. I can re-run that query without issue, and in fact can select * from the entire table without issue. I
> do see the row was updated a few minutes after this error, so is it wishful thinking that vacuum came around and
> successfully removed the old, corrupted row version?

It also happens that 18446744073709551613 is what -3 looks like in 64-bit two's complement when it is read as an
unsigned value. Is it possible that -3 was some error return code that got cast and then passed directly to malloc()?
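
The arithmetic behind that observation can be checked directly. This is only a small sketch of the two's-complement conversion; the helper names are made up for illustration and nothing here comes from PostgreSQL itself.

```python
LOGGED_SIZE = 18446744073709551613  # the size reported in the error message


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement one."""
    return value - 2**64 if value >= 2**63 else value


def as_unsigned_64(value: int) -> int:
    """Reinterpret a signed 64-bit value as unsigned (what a cast to size_t does)."""
    return value % 2**64


print(as_signed_64(LOGGED_SIZE))          # the logged size, read as signed
print(as_unsigned_64(-3) == LOGGED_SIZE)  # -3 cast to unsigned gives the logged size
```

So the logged number is exactly 2^64 - 3. That alone does not say where the -3 came from.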


Re: invalid memory alloc request size

From
Tomas Vondra
Date:
On 27.12.2011 18:34, Ben Chobot wrote:
> On Dec 26, 2011, at 8:08 AM, Ben Chobot wrote:
>
>> Yesterday I had a problem on a 64-bit 9.1.1 install:
>>
>> # select version();
>>                                                    version
>> ----------------------------------------------------------------------------------------------------------------
>> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
>> (1 row)
>>
>>
>> The logs showed this anomaly:
>>
>> 2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-1] ERROR:  invalid memory alloc request size 18446744073709551613
>> 2011-12-25T19:33:18+00:00 pgdb2-vpc postgres[27546]: [74474-2] STATEMENT:  SELECT * FROM "asset_user_accesses" WHERE ("asset_user_accesses"."asset_code" = 'assignments:course_141208' AND "asset_user_accesses"."user_id" = 618503) LIMIT 1;
>>
>>
>> Googling around, it sounds like this is often due to table corruption, which would be unfortunate, but usually seems
>> to be repeatable. I can re-run that query without issue, and in fact can select * from the entire table without issue. I
>> do see the row was updated a few minutes after this error, so is it wishful thinking that vacuum came around and
>> successfully removed the old, corrupted row version?
>
> It also happens that 18446744073709551613 is what -3 looks like in 64-bit two's complement when it is read as an
> unsigned value. Is it possible that -3 was some error return code that got cast and then passed directly to malloc()?

That's not likely. Corruption is usually the cause, when it hits a
varlena header - that's where the length info is stored. In that case
PostgreSQL suddenly thinks the varlena field has a negative length (and
malloc takes an unsigned size, so the negative length shows up as a
huge number).
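
To make that mechanism concrete, here is a deliberately simplified model, not PostgreSQL's actual code. It assumes a 4-byte varlena length header (`VARHDRSZ`) and PostgreSQL's allocation limit `MaxAllocSize` of 1 GB - 1; the function names and the corrupted length of 1 are invented purely for illustration. Whether the header in this thread held that value is unknown.

```python
VARHDRSZ = 4                 # bytes taken by a 4-byte varlena length header
MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's MaxAllocSize (1 GB - 1)
SIZE_T_MODULUS = 2**64       # unsigned 64-bit arithmetic, like C's size_t


def payload_request_size(total_length: int) -> int:
    """Payload bytes to allocate, computed the way unsigned C arithmetic would."""
    return (total_length - VARHDRSZ) % SIZE_T_MODULUS


def checked_alloc(size: int) -> bytearray:
    """Reject oversized requests before any memory is actually requested."""
    if size > MAX_ALLOC_SIZE:
        raise MemoryError(f"invalid memory alloc request size {size}")
    return bytearray(size)


# A healthy 20-byte datum needs 16 payload bytes.
print(len(checked_alloc(payload_request_size(20))))

# A header corrupted to claim a total length of 1 wraps around to 2**64 - 3.
try:
    checked_alloc(payload_request_size(1))
except MemoryError as exc:
    print(exc)
```

In this model a length smaller than the header size is enough to produce the number from the log, without any error code being passed to malloc().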

Some time ago I wrote an extension that might help you locate
where the actual issue is (which block / row / field). Heikki did some
review about a month ago, so there's a chance it might work. It's
available here:

  http://github.com/tvondra/pg_check

Let me know in case of any issues.

regards
Tomas

Re: invalid memory alloc request size

From
Merlin Moncure
Date:
On Tue, Dec 27, 2011 at 4:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> Googling around, it sounds like this is often due to table corruption, which would be unfortunate, but usually
>>> seems to be repeatable. I can re-run that query without issue, and in fact can select * from the entire table without
>>> issue. I do see the row was updated a few minutes after this error, so is it wishful thinking that vacuum came around
>>> and successfully removed the old, corrupted row version?
>>
>> It also happens that 18446744073709551613 is what -3 looks like in 64-bit two's complement when it is read as an
>> unsigned value. Is it possible that -3 was some error return code that got cast and then passed directly to malloc()?
>
> That's not likely. The corruption is usually the cause, when it hits
> varlena header - that's where the length info is stored. In that case
> PostgreSQL suddenly thinks the varlena field has a negative value (and
> malloc accepts unsigned integers).

If the problem truly went away, one likely possibility is that the bad
tuple was simply deleted -- occasionally the corruption is limited to
a tuple or two but doesn't spill over into the page itself -- in such
situations some judicious deletion of rows can get you to a point
where you can pull off a dump.
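
When the corruption is still present, finding the few bad tuples to delete is essentially a bisection problem. The sketch below shows only the search logic. `can_read` is a hypothetical callback: in practice it would wrap a query over a range of rows and report whether the query errored. The stand-in used here just simulates two bad positions.

```python
from typing import Callable, List


def find_unreadable(lo: int, hi: int,
                    can_read: Callable[[int, int], bool]) -> List[int]:
    """Return the positions in [lo, hi) that cannot be read, by repeated halving."""
    if lo >= hi or can_read(lo, hi):
        return []            # empty range, or the whole range reads cleanly
    if hi - lo == 1:
        return [lo]          # narrowed down to a single bad position
    mid = (lo + hi) // 2
    return (find_unreadable(lo, mid, can_read)
            + find_unreadable(mid, hi, can_read))


# Stand-in for a real probe: positions 37 and 902 are "corrupted".
BAD_POSITIONS = {37, 902}


def fake_can_read(lo: int, hi: int) -> bool:
    return not any(lo <= pos < hi for pos in BAD_POSITIONS)


print(find_unreadable(0, 1000, fake_can_read))
```

Each clean half is ruled out with a single probe, so isolating a handful of bad rows takes far fewer reads than checking rows one at a time.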

merlin

Re: invalid memory alloc request size

From
Tomas Vondra
Date:
On 27.12.2011 23:23, Merlin Moncure wrote:
> On Tue, Dec 27, 2011 at 4:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> That's not likely. The corruption is usually the cause, when it hits
>> varlena header - that's where the length info is stored. In that case
>> PostgreSQL suddenly thinks the varlena field has a negative value (and
>> malloc accepts unsigned integers).
>
> If the problem truly went away, one likely possibility is that the bad
> tuple was simply deleted -- occasionally the corruption is limited to
> a tuple or two but doesn't spill over into the page itself -- in such
> situations some judicious deletion of rows can get you to a point
> where you can pull off a dump.

Or maybe the record is not read for some other reason ... maybe the
table is accessed in a different way and the corrupted column is not
checked. Or maybe it does not match the WHERE condition or something.

I've seen cases where the table was accessed sequentially and it was
failing (as the column was checked because of the WHERE condition), and
then it switched to index scan and it did not fail anymore (because it
was not necessary to check the column anymore).
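
A toy model of that effect, with no claim to mirror the real executor: a sequential scan has to decode the filter column of every row, while an index lookup jumps straight to the matching rows. The table, the index, and all names below are invented for illustration, and `None` stands in for a field whose stored bytes are corrupted.

```python
class CorruptField(Exception):
    """Raised when a field with corrupted contents is decoded."""


ROWS = [
    {"user_id": 1, "asset_code": "course_a"},
    {"user_id": 2, "asset_code": None},       # None marks a corrupted field
    {"user_id": 3, "asset_code": "course_c"},
]

# Toy index on asset_code, built while the data was still intact:
# it maps a key to row positions, so lookups never decode the heap column.
INDEX = {"course_a": [0], "course_b": [1], "course_c": [2]}


def read_field(row: dict, name: str):
    value = row[name]
    if value is None:
        raise CorruptField(name)
    return value


def seq_scan(rows, asset_code):
    # The filter is evaluated on every row, so the corrupted field gets decoded.
    return [row for row in rows if read_field(row, "asset_code") == asset_code]


def index_scan(rows, index, asset_code):
    # Only rows the index points at are visited; the bad row is never touched.
    return [rows[pos] for pos in index.get(asset_code, [])]


print(index_scan(ROWS, INDEX, "course_c"))
try:
    seq_scan(ROWS, "course_c")
except CorruptField as exc:
    print("sequential scan failed on field:", exc)
```

The same query can therefore succeed or fail depending on the plan, which is one reason a clean re-run does not prove the corruption is gone.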

Tomas