Discussion: Seg fault when processing large SPI cursor (PG9.13)


Seg fault when processing large SPI cursor (PG9.13)

From:
"Fields, Zachary J. (MU-Student)"
Date:
I'm working on PostgreSQL 9.13 (waiting for the admin to push upgrades next week); in the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.

My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.

I have watched top, and the system memory usage gets up to 97.6% (from approx. 30 million records onward - then sways up and down), but ultimately crashes when I try to get past the 48,889,856th record. I have tried a variety of block sizes: anything greater than 2^17 crashes at the fetch that would have it surpass 48,889,856 records; 2^16 hits the same sweet spot; and anything less than 2^16 actually crashes slightly earlier (noted in comments in the code below).

To me, it appears to be an obvious memory leak; the question is who caused it. I would typically assume I am to blame (and I may be), but the code is so simple (shown below) that I can't see how it could be me - unless I am misusing pg-sql (which is totally possible).

Here is the code segment that is crashing...

<code>
    // Cursor variables
    const char *cursor_name = NULL;  // Postgres will self-assign a name
    const int arg_count = 0;  // No arguments will be passed
    Oid *arg_types = NULL;  // n/a
    Datum *arg_values = NULL;  // n/a
    const char *null_args = NULL;  // n/a
    bool read_only = true;  // read_only allows for optimization
    const int cursor_opts = CURSOR_OPT_NO_SCROLL;  // default cursor options
    bool forward = true;
    //const long fetch_count = FETCH_ALL;
    //const long fetch_count = 1048576;  // 2^20 - last processed = 48,234,496
    //const long fetch_count = 524288;  // 2^19 - last processed = 48,758,784
    //const long fetch_count = 262144;  // 2^18 - last processed = 48,758,784
    const long fetch_count = 131072;  // 2^17 - last processed = 48,889,856
    //const long fetch_count = 65536;  // 2^16 - last processed = 48,889,856
    //const long fetch_count = 32768;  // 2^15 - last processed = 48,857,088
    //const long fetch_count = 16384;  // 2^14 - last processed = 48,791,552
    //const long fetch_count = 8192;  // 2^13 - last processed = 48,660,480
    //const long fetch_count = 4096;  // 2^12 - last processed = 48,398,336
    //const long fetch_count = 2048;  // 2^11
    //const long fetch_count = 1024;  // 2^10
    //const long fetch_count = 512;  // 2^9
    //const long fetch_count = 256;  // 2^8
    //const long fetch_count = 128;  // 2^7
    //const long fetch_count = 64;  // 2^6
    //const long fetch_count = 32;  // 2^5
    //const long fetch_count = 16;  // 2^4
    //const long fetch_count = 8;  // 2^3
    //const long fetch_count = 4;  // 2^2
    //const long fetch_count = 2;  // 2^1
    //const long fetch_count = 1;  // 2^0

    unsigned int i, j, end, stored;
    unsigned int result_counter = 0;
    float8 l1_norm;
    bool is_null = true;
    bool nulls[4];
    Datum result_tuple_datum[4];
    HeapTuple new_tuple;
    MemoryContext function_context;

    ResultCandidate *candidate, **candidates, *top, *free_candidate = NULL;
    KSieve<ResultCandidate *> sieve(result_cnt_);

      /*********************/
     /** Init SPI_cursor **/
    /*********************/

    // Connect to SPI
    if ( SPI_connect() != SPI_OK_CONNECT ) { return; }

    // Prepare and open SPI cursor
    Portal signature_cursor = SPI_cursor_open_with_args(cursor_name, sql_stmt_, arg_count, arg_types, arg_values, null_args, read_only, cursor_opts);

    do {
        // Fetch rows for processing (Populates SPI_processed and SPI_tuptable)
        SPI_cursor_fetch(signature_cursor, forward, fetch_count);

          /************************/
         /** Process SPI_cursor **/
        /************************/

        // Iterate cursor and perform calculations
        for (i = 0 ; i < SPI_processed ; ++i) {
            // Transfer columns to work array
            for ( j = 1 ; j < 4 ; ++j ) {
                result_tuple_datum[j-1] = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, j, &is_null);
                nulls[j-1] = is_null;
            }

            // Special Handling for final column
            Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 4, &is_null);
            nulls[3] = is_null;
            if ( is_null ) {
                l1_norm = FLT_MAX;
                result_tuple_datum[3] = PointerGetDatum(NULL);
            } else {
                // Transform binary into double array
                ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
                l1_norm = meanAbsoluteError(signature_, (double *)ARR_DATA_PTR(pg_double_array), (ARR_DIMS(pg_double_array))[0], 0);
                result_tuple_datum[3] = Float8GetDatum(l1_norm);
            }

            // Create and test candidate
            if ( free_candidate ) {
                candidate = free_candidate;
                free_candidate = NULL;
            } else {
                candidate = (ResultCandidate *)palloc(sizeof(ResultCandidate));
            }
            (*candidate).lat = DatumGetFloat8(result_tuple_datum[0]);
            (*candidate).null_lat = nulls[0];
            (*candidate).lon = DatumGetFloat8(result_tuple_datum[1]);
            (*candidate).null_lon = nulls[1];
            (*candidate).orientation = DatumGetFloat8(result_tuple_datum[2]);
            (*candidate).null_orientation = nulls[2];
            (*candidate).rank = l1_norm;
            (*candidate).null_rank = nulls[3];

            // Run candidate through sieve
            top = sieve.top();
            if ( !sieve.siftItem(candidate) ) {
                // Free non-filtered candidates
                free_candidate = candidate;
            } else if ( sieve.size() == result_cnt_ ) {
                // Free non-filtered candidates
                free_candidate = top;
            }
        }
        result_counter += i;
    } while ( SPI_processed );

    SPI_finish();
</code>

Is there an obvious error I'm overlooking, or is there a known bug (PG9.13) for large fetch sizes?

Thanks,
Zak

P.S. KSieve is a POD type encapsulating an array that has been allocated with palloc().

Re: Seg fault when processing large SPI cursor (PG9.13)

From:
Tom Lane
Date:
"Fields, Zachary J. (MU-Student)" <zjfe58@mail.missouri.edu> writes:
> I'm working on PostgreSQL 9.13 (waiting for the admin to push upgrades next week); in the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.

> My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.

> I have watched top, and the system memory usage gets up to 97.6% (from approx. 30 million records onward - then sways up and down), but ultimately crashes when I try to get past the 48,889,856th record. I have tried a variety of block sizes: anything greater than 2^17 crashes at the fetch that would have it surpass 48,889,856 records; 2^16 hits the same sweet spot; and anything less than 2^16 actually crashes slightly earlier (noted in comments in the code below).

> To me, it appears to be an obvious memory leak,

Well, you're leaking the SPITupleTables (you should be doing
SPI_freetuptable when done with each one), so running out of memory is
not exactly surprising.  I suspect what is happening is that an
out-of-memory error is getting thrown and recovery from that is messed
up somehow.  Have you tried getting a stack trace from the crash?

I note that you're apparently using C++.  C++ in the backend is rather
dangerous, and one of the main reasons is that C++ error handling
doesn't play nice with elog/ereport error handling.  It's possible to
make it work safely but it takes a lot of attention and extra code,
which you don't seem to have here.
        regards, tom lane
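
For illustration, a minimal sketch of the loop shape this implies, keeping the variable names from the code quoted above: each batch's SPITupleTable is released before the next fetch, so the tables no longer accumulate for the life of the call.

<code>
    // Sketch only: same fetch loop as above, but releasing each batch's tuple table.
    do {
        // Populates SPI_processed and SPI_tuptable
        SPI_cursor_fetch(signature_cursor, forward, fetch_count);

        for (i = 0; i < SPI_processed; ++i) {
            // ... per-row processing exactly as in the original code ...
        }
        result_counter += i;

        // Free this batch's tuple table before fetching the next one,
        // so the SPITupleTables don't pile up in the SPI memory context.
        SPI_freetuptable(SPI_tuptable);
    } while ( SPI_processed );
</code>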



Re: Seg fault when processing large SPI cursor (PG9.13)

From:
Merlin Moncure
Date:
On Mon, Mar 4, 2013 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Fields, Zachary J. (MU-Student)" <zjfe58@mail.missouri.edu> writes:
>> I'm working on PostgreSQL 9.13 (waiting for the admin to push upgrades next week); in the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.
>
>> My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.
>
>> I have watched top, and the system memory usage gets up to 97.6% (from approx. 30 million records onward - then sways up and down), but ultimately crashes when I try to get past the 48,889,856th record. I have tried a variety of block sizes: anything greater than 2^17 crashes at the fetch that would have it surpass 48,889,856 records; 2^16 hits the same sweet spot; and anything less than 2^16 actually crashes slightly earlier (noted in comments in the code below).
>
>> To me, it appears to be an obvious memory leak,
>
> Well, you're leaking the SPITupleTables (you should be doing
> SPI_freetuptable when done with each one), so running out of memory is
> not exactly surprising.  I suspect what is happening is that an
> out-of-memory error is getting thrown and recovery from that is messed
> up somehow.  Have you tried getting a stack trace from the crash?
>
> I note that you're apparently using C++.  C++ in the backend is rather
> dangerous, and one of the main reasons is that C++ error handling
> doesn't play nice with elog/ereport error handling.  It's possible to
> make it work safely but it takes a lot of attention and extra code,
> which you don't seem to have here.

Could be C++ is throwing an exception. If you haven't already, try
disabling exception handling completely in the compiler.

merlin
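
One common precaution along those lines, shown here only as a sketch (process_batch() is a hypothetical helper holding the C++ logic): catch everything at the C/C++ boundary and re-raise it as a PostgreSQL error, so a stray C++ exception can never unwind through backend code. The other route, as suggested above, is to build with exception handling disabled entirely (e.g. -fno-exceptions with GCC/Clang).

<code>
    // Sketch: translate C++ exceptions into ereport() at the boundary.
    // Assumes "postgres.h" and <exception> are included.
    try {
        process_batch();  // hypothetical helper containing the C++ work
    } catch (const std::exception &e) {
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("unexpected C++ exception: %s", e.what())));
    } catch (...) {
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("unexpected C++ exception")));
    }
</code>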



Re: Seg fault when processing large SPI cursor (PG9.13)

From:
Tom Lane
Date:
"Fields, Zachary J. (MU-Student)" <zjfe58@mail.missouri.edu> writes:
> Thanks for getting back to me! I had already discovered freeing the SPI_tuptable each time, and you are correct, it made a big difference. However, I was still only able to reach 140+ million records before it crashed.

> My current working implementation is to reset the "current" memory context after X number of iterations, which keeps memory in check. This seems like a big hammer for the job, and I'm sure it is not optimal. Speed is very important to my application, so I would prefer to use a scalpel instead of a hatchet. If I am already freeing the SPI_tuptable created by the cursor, where else should I be looking for memory leaks?


There are a lot of places that could be leaking memory; for instance, if
the arrays you're working with are large enough, they could be
toasted, and DatumGetArrayTypeP would involve making a working copy.
I'm not too sure that you're not ever leaking "candidate" structs,
either.
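
As an example of the first point, a sketch of the non-null branch from the code above with the detoasted copy freed explicitly: DatumGetArrayTypeP returns a freshly palloc'd array only when the stored value was toasted or compressed, so the usual idiom is to pfree it when it differs from the original datum.

<code>
    // Sketch: free the working copy made by DatumGetArrayTypeP, if any.
    ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
    l1_norm = meanAbsoluteError(signature_, (double *)ARR_DATA_PTR(pg_double_array),
                                (ARR_DIMS(pg_double_array))[0], 0);
    result_tuple_datum[3] = Float8GetDatum(l1_norm);

    // A detoasted copy is a separate palloc'd object; release it once the
    // distance has been computed, but only if it is not the original datum.
    if ((Pointer) pg_double_array != DatumGetPointer(raw_double_array))
        pfree(pg_double_array);
</code>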

The usual theory in Postgres is that a memory context reset is cheaper,
as well as much less leak-prone, than trying to make sure you've pfree'd
each individual allocation.  So we tend to work with short-lived
contexts that can be reset at the end of each tuple cycle --- or in this
example, probably once per cursor fetch would be good.  The main problem
I'd have with what you're doing is that it's not very safe for a
function to reset the whole SPI Proc context: you might be clobbering
some storage that's still in use, e.g. related to the cursor you're using.
Instead create a context that's a child of that context, switch into
that to do your processing, and reset it every so often.
        regards, tom lane
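
A minimal sketch of that arrangement, using the 9.x AllocSetContextCreate signature; names are illustrative, and anything that must outlive a batch (for example, candidates retained by the sieve) would still have to be allocated in a longer-lived context:

<code>
    // Sketch: do per-batch work in a child context and reset it after each
    // fetch, rather than resetting the SPI Proc context itself.
    MemoryContext batch_context =
        AllocSetContextCreate(CurrentMemoryContext,
                              "cursor batch working data",
                              ALLOCSET_DEFAULT_MINSIZE,
                              ALLOCSET_DEFAULT_INITSIZE,
                              ALLOCSET_DEFAULT_MAXSIZE);

    do {
        SPI_cursor_fetch(signature_cursor, forward, fetch_count);

        MemoryContext old_context = MemoryContextSwitchTo(batch_context);
        // ... per-row processing (detoasting, scratch palloc calls) goes here;
        // allocations that must survive the reset belong in old_context ...
        MemoryContextSwitchTo(old_context);

        SPI_freetuptable(SPI_tuptable);
        MemoryContextReset(batch_context);  // drop everything this batch allocated
    } while ( SPI_processed );

    MemoryContextDelete(batch_context);
</code>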