Seg fault when processing large SPI cursor (PG 9.1.3)
I'm working on PostgreSQL 9.1.3 (waiting for the admin to push upgrades next week). In the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.
My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 131K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.
I have watched top, and the system memory usage gets up to 97.6% (from approx. 30 million records onward - then sways up and down), but it ultimately crashes when I try to get past the 48,889,856th record. I have tried various block sizes: anything greater than 2^17 crashes at the fetch that would have it surpass 48,889,856 records; 2^16 hits the same sweet spot; and anything less than 2^16 actually crashes slightly earlier (noted in the comments in the code below).
To me, it appears to be an obvious memory leak; the question is who caused it. I would typically assume I am to blame (and I may be), but the code is so simple (shown below) that I can't see how it could be me - unless I am misusing SPI (which is totally possible).
Here is the code segment that is crashing...
<code>
// Cursor variables
const char *cursor_name = NULL; // Postgres will self-assign a name
const int arg_count = 0; // No arguments will be passed
Oid *arg_types = NULL; // n/a
Datum *arg_values = NULL; // n/a
const char *null_args = NULL; // n/a
bool read_only = true; // read_only allows for optimization
const int cursor_opts = CURSOR_OPT_NO_SCROLL; // default cursor options
bool forward = true;
//const long fetch_count = FETCH_ALL;
//const long fetch_count = 1048576; // 2^20 - last processed = 48,234,496
//const long fetch_count = 524288; // 2^19 - last processed = 48,758,784
//const long fetch_count = 262144; // 2^18 - last processed = 48,758,784
const long fetch_count = 131072; // 2^17 - last processed = 48,889,856
//const long fetch_count = 65536; // 2^16 - last processed = 48,889,856
//const long fetch_count = 32768; // 2^15 - last processed = 48,857,088
//const long fetch_count = 16384; // 2^14 - last processed = 48,791,552
//const long fetch_count = 8192; // 2^13 - last processed = 48,660,480
//const long fetch_count = 4096; // 2^12 - last processed = 48,398,336
//const long fetch_count = 2048; // 2^11
//const long fetch_count = 1024; // 2^10
//const long fetch_count = 512; // 2^9
//const long fetch_count = 256; // 2^8
//const long fetch_count = 128; // 2^7
//const long fetch_count = 64; // 2^6
//const long fetch_count = 32; // 2^5
//const long fetch_count = 16; // 2^4
//const long fetch_count = 8; // 2^3
//const long fetch_count = 4; // 2^2
//const long fetch_count = 2; // 2^1
//const long fetch_count = 1; // 2^0
unsigned int i, j, end, stored;
unsigned int result_counter = 0;
float8 l1_norm;
bool is_null = true;
bool nulls[4];
Datum result_tuple_datum[4];
HeapTuple new_tuple;
MemoryContext function_context;
ResultCandidate *candidate, **candidates, *top, *free_candidate = NULL;
KSieve<ResultCandidate *> sieve(result_cnt_);
/*********************/
/** Init SPI_cursor **/
/*********************/
// Connect to SPI
if ( SPI_connect() != SPI_OK_CONNECT ) { return; }
// Prepare and open SPI cursor
Portal signature_cursor = SPI_cursor_open_with_args(cursor_name, sql_stmt_, arg_count, arg_types, arg_values, null_args, read_only, cursor_opts);
do {
    // Fetch rows for processing (populates SPI_processed and SPI_tuptable)
    SPI_cursor_fetch(signature_cursor, forward, fetch_count);

    /************************/
    /** Process SPI_cursor **/
    /************************/

    // Iterate cursor and perform calculations
    for ( i = 0 ; i < SPI_processed ; ++i ) {
        // Transfer columns to work array
        for ( j = 1 ; j < 4 ; ++j ) {
            result_tuple_datum[j-1] = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, j, &is_null);
            nulls[j-1] = is_null;
        }
        // Special handling for final column
        Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 4, &is_null);
        nulls[3] = is_null;
        if ( is_null ) {
            l1_norm = FLT_MAX;
            result_tuple_datum[3] = PointerGetDatum(NULL);
        } else {
            // Transform binary into double array
            ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
            l1_norm = meanAbsoluteError(signature_, (double *)ARR_DATA_PTR(pg_double_array), (ARR_DIMS(pg_double_array))[0], 0);
            result_tuple_datum[3] = Float8GetDatum(l1_norm);
        }
        // Create and test candidate (recycling one freed slot if available)
        if ( free_candidate ) {
            candidate = free_candidate;
            free_candidate = NULL;
        } else {
            candidate = (ResultCandidate *)palloc(sizeof(ResultCandidate));
        }
        candidate->lat = DatumGetFloat8(result_tuple_datum[0]);
        candidate->null_lat = nulls[0];
        candidate->lon = DatumGetFloat8(result_tuple_datum[1]);
        candidate->null_lon = nulls[1];
        candidate->orientation = DatumGetFloat8(result_tuple_datum[2]);
        candidate->null_orientation = nulls[2];
        candidate->rank = l1_norm;
        candidate->null_rank = nulls[3];
        // Run candidate through sieve
        top = sieve.top();
        if ( !sieve.siftItem(candidate) ) {
            // Sieve rejected the new candidate; recycle its allocation
            free_candidate = candidate;
        } else if ( sieve.size() == result_cnt_ ) {
            // Sieve was full; recycle the displaced former top
            free_candidate = top;
        }
    }
    result_counter += i;
} while ( SPI_processed );
SPI_finish();
</code>
Is there an obvious error I'm overlooking, or is there a known bug in PG 9.1.3 for large fetch sizes?
Thanks,
Zak
P.S. KSieve is a POD type encapsulating an array that has been allocated with palloc().
"Fields, Zachary J. (MU-Student)" <zjfe58@mail.missouri.edu> writes:
> I'm working on PostgreSQL 9.1.3 (waiting for admin to push upgrades next week), in the meanwhile, I was curious if there are any known bugs regarding large cursor fetches, or if I am to blame.
> My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, then the DB seg faults. It should be noted, I have processed tables with 23 million+ records several times and everything appears to work great.
> I have watched top, and the system memory usage gets up to 97.6% (from approx 30 million records onward - then sways up and down), but ultimately crashes when I try to get past the 48,889,856th record. I have tried odd and various block sizes, anything greater than 2^17 crashes at the fetch that would have it surpassed 48,889,856 records, 2^16 hits the same sweet spot, and anything less than 2^16 actually crashes slightly earlier (noted in comments in code below).
> To me, it appears to be an obvious memory leak,
Well, you're leaking the SPITupleTables (you should be doing
SPI_freetuptable when done with each one), so running out of memory is
not exactly surprising. I suspect what is happening is that an
out-of-memory error is getting thrown and recovery from that is messed
up somehow. Have you tried getting a stack trace from the crash?
I note that you're apparently using C++. C++ in the backend is rather
dangerous, and one of the main reasons is that C++ error handling
doesn't play nice with elog/ereport error handling. It's possible to
make it work safely but it takes a lot of attention and extra code,
which you don't seem to have here.
regards, tom lane
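Tom's SPI_freetuptable suggestion amounts to one extra call per fetch. A minimal sketch (assuming the variable names from the original post) of where it would go:
<code>
do {
    // Fetch rows for processing (populates SPI_processed and SPI_tuptable)
    SPI_cursor_fetch(signature_cursor, forward, fetch_count);

    for ( i = 0 ; i < SPI_processed ; ++i ) {
        // ... per-row processing from the original post ...
    }
    result_counter += i;

    // Release this batch's tuple table before fetching the next one;
    // without this, every ~131K-row batch stays allocated until SPI_finish().
    SPI_freetuptable(SPI_tuptable);
} while ( SPI_processed );
</code>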
On Mon, Mar 4, 2013 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, you're leaking the SPITupleTables (you should be doing
> SPI_freetuptable when done with each one), so running out of memory is
> not exactly surprising.
> [...]
> I note that you're apparently using C++. C++ in the backend is rather
> dangerous, and one of the main reasons is that C++ error handling
> doesn't play nice with elog/ereport error handling. It's possible to
> make it work safely but it takes a lot of attention and extra code,
> which you don't seem to have here.
Could be C++ is throwing an exception. If you haven't already, try disabling exception handling completely in the compiler.
merlin
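A middle ground between disabling exceptions outright and leaving things as-is is to make sure no C++ exception ever crosses the extern "C" boundary into the backend. A sketch of that guard (the function name here is hypothetical, not from the thread):
<code>
extern "C" Datum signature_search(PG_FUNCTION_ARGS)   // hypothetical entry point
{
    try {
        // ... SPI loop and C++ processing ...
    }
    catch (const std::exception &e) {
        // Convert the C++ exception into a backend error instead of
        // letting it unwind through PostgreSQL's longjmp-based handling.
        ereport(ERROR, (errmsg("C++ exception: %s", e.what())));
    }
    catch (...) {
        ereport(ERROR, (errmsg("unknown C++ exception")));
    }
    PG_RETURN_NULL();
}
</code>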
"Fields, Zachary J. (MU-Student)" <zjfe58@mail.missouri.edu> writes:
> Thanks for getting back to me! I had already discovered freeing the SPI_tuptable each time, and you are correct, it made a big difference. However, I still was only able to achieve 140+ million before it crashed.
> My current working implementation is to reset the "current" memory context after X number of iterations, which keeps memory in check. This seems like a big hammer for the job, and I'm sure it is not optimal. Speed is very important to my application, so I would prefer to use a scalpel instead of a hatchet. If I am already freeing the SPI_tuptable created by the cursor, where else should I be looking for memory leaks?
There are a lot of places that could be leaking memory; for instance, if
the arrays you're working with are large enough then they could be
toasted, and DatumGetArrayTypeP would involve making a working copy.
I'm not too sure that you're not ever leaking "candidate" structs,
either.
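On the toasting point: DatumGetArrayTypeP detoasts into a freshly palloc'd copy whenever the stored array is compressed or out-of-line, and the posted loop never frees that copy. A sketch of the usual guard, using the variables from the original post (this is the same pointer test the PG_FREE_IF_COPY macro applies to fmgr arguments):
<code>
ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
l1_norm = meanAbsoluteError(signature_, (double *)ARR_DATA_PTR(pg_double_array),
                            (ARR_DIMS(pg_double_array))[0], 0);
result_tuple_datum[3] = Float8GetDatum(l1_norm);
// If detoasting made a copy, release it; if not, the pointer still points
// into the tuple table and must not be pfree'd here.
if ( (Pointer) pg_double_array != DatumGetPointer(raw_double_array) )
    pfree(pg_double_array);
</code>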
The usual theory in Postgres is that a memory context reset is cheaper,
as well as much less leak-prone, than trying to make sure you've pfree'd
each individual allocation. So we tend to work with short-lived
contexts that can be reset at the end of each tuple cycle --- or in this
example, probably once per cursor fetch would be good. The main problem
I'd have with what you're doing is that it's not very safe for a
function to reset the whole SPI Proc context: you might be clobbering
some storage that's still in use, eg related to the cursor you're using.
Instead create a context that's a child of that context, switch into
that to do your processing, and reset it every so often.
regards, tom lane
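The child-context pattern described above could be sketched as follows (assuming the variables from the original post; anything that must outlive a batch, such as candidates retained by the sieve, would need to be allocated in the parent context rather than in batch_ctx):
<code>
// Child of the current context; reset once per fetch, per the advice above.
MemoryContext batch_ctx = AllocSetContextCreate(CurrentMemoryContext,
                                                "cursor batch context",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);
do {
    SPI_cursor_fetch(signature_cursor, forward, fetch_count);

    MemoryContext old_ctx = MemoryContextSwitchTo(batch_ctx);
    for ( i = 0 ; i < SPI_processed ; ++i ) {
        // per-row work: detoasted arrays and other scratch
        // allocations now land in batch_ctx
    }
    MemoryContextSwitchTo(old_ctx);

    SPI_freetuptable(SPI_tuptable);   // the tuple table itself
    MemoryContextReset(batch_ctx);    // everything palloc'd during the batch
} while ( SPI_processed );

MemoryContextDelete(batch_ctx);
</code>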