Thread: What is the posix_memalign() equivalent for the PostgreSQL?

What is the posix_memalign() equivalent for the PostgreSQL?

From
Anderson Carniel
Date:
Dear all,

I am developing a PostgreSQL extension that writes and reads external files from within PostgreSQL. For this I/O I open the files with the O_DIRECT flag and allocate the buffers with posix_memalign. I would like to know whether PostgreSQL's internal library provides an equivalent of posix_memalign, since I am getting unexpected errors. All my allocations are in TopMemoryContext, since I am working with several buffers that must stay alive for as long as the PostgreSQL server is running.

Thanks in advance,
Anderson

Re: What is the posix_memalign() equivalent for the PostgreSQL?

From
Craig Ringer
Date:
On 2 September 2016 at 01:12, Anderson Carniel <accarniel@gmail.com> wrote:
> Dear all,
>
> I am developing an extension for the PostgreSQL that write/read some
> external files from the PostgreSQL. In order to write/read, I am using the
> O_DIRECT flag and using the posix_memalign to allocate memory. I would like
> to know if the postgresql internal library provides an equivalent function
> for the posix_memalign since I am getting unexpected errors.

"unexpected errors". Details please?

If you're trying to allocate aligned memory, I believe PostgreSQL
typically uses the TYPEALIGN macros (see c.h) but I'm painfully
clueless in the area, so ... yeah. Don't trust me.

I was a bit surprised not to see a MemoryContextAlloc or palloc
variant that returns memory aligned to a given boundary.

> All my
> allocations are in the TopMemoryContext since I am working with several
> buffers that must be alive while the PostgreSQL Server is activated.

You can't posix_memalign into TopMemoryContext. Such memory is outside
the memory context system, like memory directly malloc()'d.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
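To illustrate the point about memory that lives outside the memory context system, here is a minimal standalone C sketch. The helper names direct_io_buffer_alloc and direct_io_buffer_free are hypothetical and are not PostgreSQL APIs. The sketch only shows that a posix_memalign() buffer has to be paired with free(), and that no MemoryContext (TopMemoryContext included) will track or release it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not a PostgreSQL API: returns a zeroed buffer aligned
 * to `alignment` bytes, or NULL on failure.  The memory comes straight from
 * the C library, so no MemoryContext knows about it.
 */
static void *
direct_io_buffer_alloc(size_t alignment, size_t size)
{
    void *buf = NULL;

    /*
     * posix_memalign() needs a power-of-two multiple of sizeof(void *) and
     * returns an error number instead of setting errno.
     */
    if (posix_memalign(&buf, alignment, size) != 0)
        return NULL;

    assert((uintptr_t) buf % alignment == 0);
    memset(buf, 0, size);
    return buf;
}

/*
 * Must be released with free(), never pfree(): pfree() expects a chunk
 * header that only palloc-family allocations have.
 */
static void
direct_io_buffer_free(void *buf)
{
    free(buf);
}
```

A buffer obtained this way stays valid until direct_io_buffer_free() is called, regardless of which memory context is current or reset in the meantime.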



Re: What is the posix_memalign() equivalent for the PostgreSQL?

From
Anderson Carniel
Date:
The unexpected errors occur when freeing memory that was previously allocated with posix_memalign. After a number of allocation and free calls, the next free call crashes the system.

On the other hand, if I replace the posix_memalign and free calls with the following aligned malloc/free implementations, it works.


#include "postgres.h"    /* palloc() and pfree() */

/*
 * Allocate `bytes` bytes aligned to `alignment`, which must be a nonzero
 * power of 2. Returns NULL if the alignment is invalid.
 */
void *malloc_aligned(size_t alignment, size_t bytes) {
    void *mallocPtr;        /* Initial pointer returned from palloc */
    void *newMallocPtr;     /* Pointer past the space reserved for the offset */
    void *alignedPtr;       /* Aligned pointer handed to the caller */
    uintptr_t alignMask;    /* Mask that clears the low bits of an address */
    size_t totalBytes;

    /* Make sure alignment is a power of 2 and not zero,
     * because zero is not a power of 2 */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Allocate extra memory so that an aligned address always fits, plus
     * sizeof(size_t) bytes for storing the offset back to mallocPtr. */
    totalBytes = bytes + alignment + sizeof(size_t);

    /* palloc reports an error on failure and never returns NULL */
    mallocPtr = palloc(totalBytes);

    newMallocPtr = (void *) ((char *) mallocPtr + sizeof(size_t));

    alignMask = ~((uintptr_t) alignment - 1);

    /* alignedPtr is the next multiple of alignment after newMallocPtr */
    alignedPtr = (void *) (((uintptr_t) newMallocPtr + alignment) & alignMask);

    /* Store the offset right before alignedPtr */
    *((size_t *) alignedPtr - 1) =
        (size_t) ((uintptr_t) alignedPtr - (uintptr_t) mallocPtr);

    return alignedPtr;
}

void free_aligned(void *raw_data) {
    void *mallocPtr;        /* Initial palloc pointer */
    size_t extraBytes;

    if (NULL == raw_data)
        return;

    /* Retrieve the stored offset */
    extraBytes = *((size_t *) raw_data - 1);

    /* Recover the initial palloc pointer */
    mallocPtr = (void *) ((uintptr_t) raw_data - extraBytes);

    pfree(mallocPtr);
}

Please note that these functions use palloc and pfree instead of malloc and free, respectively. The problem is that free_aligned does not actually free the allocated memory. Thus, I would like to know whether PostgreSQL provides a function that allocates aligned memory. If not, in your experience, is there a significant performance difference between using O_DIRECT and not using it?

Thank you,
Anderson

Re: What is the posix_memalign() equivalent for the PostgreSQL?

From
Tom Lane
Date:
Anderson Carniel <accarniel@gmail.com> writes:
> I would like to know if the PostgreSQL provides a memory function that
> allocates aligned memory.

Not beyond MAXALIGN (although the shared-memory alloc functions do align
to cacheline boundaries IIRC).

> If not, according to your experience, is there a
> significance difference between the performance of the O_DIRECT or not?

AFAIK, nobody's really bothered to measure whether that would be useful
for Postgres.  The results would probably be quite platform-specific
anyway.
        regards, tom lane
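Since palloc only guarantees MAXALIGN (typically 8 bytes on 64-bit platforms), one common workaround is to over-allocate and round the address up, which is the same arithmetic as the TYPEALIGN macro in c.h. The standalone sketch below assumes a power-of-two alignment; align_up and carve_aligned are illustrative names, not PostgreSQL functions. In an extension the raw region might come from MemoryContextAlloc(TopMemoryContext, size + alignment), with the raw pointer kept around for the later pfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in that mirrors the arithmetic of PostgreSQL's TYPEALIGN macro:
 * round `value` up to the next multiple of `alignment`, which must be a
 * power of two.  The name align_up is ours, not PostgreSQL's.
 */
static uintptr_t
align_up(uintptr_t alignment, uintptr_t value)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + (alignment - 1)) & ~(alignment - 1);
}

/*
 * Given a raw region of raw_size bytes, return the first address inside it
 * that is aligned to `alignment` and still leaves room for `size` bytes,
 * or NULL if the region is too small.
 */
static void *
carve_aligned(void *raw, size_t raw_size, size_t alignment, size_t size)
{
    uintptr_t start = (uintptr_t) raw;
    uintptr_t aligned = align_up(alignment, start);
    size_t    skipped = (size_t) (aligned - start);

    if (skipped > raw_size || raw_size - skipped < size)
        return NULL;
    return (void *) aligned;
}
```

Over-allocating by a full `alignment` bytes is always enough, because rounding up skips at most alignment - 1 bytes.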



Re: What is the posix_memalign() equivalent for the PostgreSQL?

From
Andres Freund
Date:
On 2016-09-02 13:05:37 -0400, Tom Lane wrote:
> Anderson Carniel <accarniel@gmail.com> writes:
> > If not, according to your experience, is there a
> > significance difference between the performance of the O_DIRECT or not?
> 
> AFAIK, nobody's really bothered to measure whether that would be useful
> for Postgres.  The results would probably be quite platform-specific
> anyway.

I've played with patches to make postgres use O_DIRECT. On linux, it's
rather beneficial for some workloads (fits into memory), but it also
works really badly for some others, because our IO code isn't
intelligent enough.  We pretty much rely on write() being nearly
instantaneous when done by normal backends (during buffer replacement),
we rely on readahead, we rely on the kernel to stopgap some bad
replacement decisions we're making.

Andres



Re: What is the posix_memalign() equivalent for the PostgreSQL?

From
Robert Haas
Date:
On Fri, Sep 2, 2016 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-02 13:05:37 -0400, Tom Lane wrote:
>> Anderson Carniel <accarniel@gmail.com> writes:
>> > If not, according to your experience, is there a
>> > significance difference between the performance of the O_DIRECT or not?
>>
>> AFAIK, nobody's really bothered to measure whether that would be useful
>> for Postgres.  The results would probably be quite platform-specific
>> anyway.
>
> I've played with patches to make postgres use O_DIRECT. On linux, it's
> rather beneficial for some workloads (fits into memory), but it also
> works really badly for some others, because our IO code isn't
> intelligent enough.  We pretty much rely on write() being nearly
> instantaneous when done by normal backends (during buffer replacement),
> we rely on readahead, we rely on the kernel to stopgap some bad
> replacement decisions we're making.

So, suppose we changed the world so that backends don't write dirty
buffers, or at least not normally.  If they need to perform a buffer
eviction, they first check the freelist, then run the clock sweep.
The clock sweep puts clean buffers on the freelist and dirty buffers
on a to-be-cleaned list.  A background process writes buffers on the
to-be-cleaned list and then adds them to the freelist afterward if the
usage count hasn't been bumped meanwhile.  As in Amit's bgreclaimer
patch, we have a target size for the freelist, with a low watermark
and a high watermark.  When we drop below the low watermark, the
background processes run the clock sweep and write from the
to-be-cleaned list to try to populate it; when we surge above the high
watermark, they go back to sleep.
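The scheme in the paragraph above can be sketched as a small single-process simulation. Everything here is an assumption made for illustration: the pool size, the watermark values, the MAX_USAGE cap, and all type and function names are hypothetical, and nothing is taken from PostgreSQL's buffer manager or from the bgreclaimer patch. Locking, pinning and real I/O are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS        8
#define LOW_WATERMARK   2   /* wake the background process below this */
#define HIGH_WATERMARK  4   /* let it sleep again once this is reached */
#define MAX_USAGE       5   /* assumed cap on the usage count */

typedef enum { IN_USE, ON_FREELIST, TO_BE_CLEANED } SlotState;

typedef struct { SlotState state; bool dirty; int usage_count; } Slot;

typedef struct { Slot slot[NBUFFERS]; int clock_hand; int free_count; } Pool;

/* Start with every buffer used once; even-numbered buffers are dirty. */
static void pool_init(Pool *p)
{
    for (int i = 0; i < NBUFFERS; i++)
        p->slot[i] = (Slot){ IN_USE, i % 2 == 0, 1 };
    p->clock_hand = 0;
    p->free_count = 0;
}

/* One tick of the clock: age the buffer under the hand, or retire it. */
static void clock_sweep_step(Pool *p)
{
    Slot *s = &p->slot[p->clock_hand];

    p->clock_hand = (p->clock_hand + 1) % NBUFFERS;
    if (s->state != IN_USE)
        return;
    if (s->usage_count > 0)
        s->usage_count--;
    else if (s->dirty)
        s->state = TO_BE_CLEANED;   /* left for the background process */
    else {
        s->state = ON_FREELIST;
        p->free_count++;
    }
}

/* "Write" queued buffers, then free those nobody touched in the meantime. */
static void clean_pending(Pool *p)
{
    for (int i = 0; i < NBUFFERS; i++) {
        Slot *s = &p->slot[i];

        if (s->state != TO_BE_CLEANED)
            continue;
        s->dirty = false;           /* stands in for the actual write */
        if (s->usage_count == 0) {
            s->state = ON_FREELIST;
            p->free_count++;
        } else
            s->state = IN_USE;      /* usage count was bumped: keep it */
    }
}

/* Background process: refill the freelist from low to high watermark. */
static void background_reclaim(Pool *p)
{
    assert(p->free_count >= 0 && p->free_count <= NBUFFERS);
    if (p->free_count >= LOW_WATERMARK)
        return;                     /* enough free buffers: stay asleep */
    for (int i = 0; i < NBUFFERS * (MAX_USAGE + 1)
                    && p->free_count < HIGH_WATERMARK; i++) {
        clock_sweep_step(p);
        clean_pending(p);
    }
}

/* Backend: check the freelist first, then sweep; never writes a buffer. */
static int backend_get_buffer(Pool *p)
{
    for (int tries = 0; tries <= NBUFFERS * (MAX_USAGE + 1); tries++) {
        for (int i = 0; i < NBUFFERS; i++) {
            if (p->slot[i].state == ON_FREELIST) {
                p->slot[i].state = IN_USE;
                p->slot[i].usage_count = 1;
                p->free_count--;
                return i;
            }
        }
        clock_sweep_step(p);        /* freelist empty: fall back to sweep */
    }
    return -1;                      /* only dirty buffers left to evict */
}
```

In this sketch a backend that finds only dirty victims gets -1 back; a real design would have to decide whether it waits for the background process or writes the buffer itself.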

Further, suppose we also create a prefetch system, maybe based on the
synchronous scan machinery.  It preemptively pulls data into
shared_buffers if an ongoing scan will need it soon.  Or maybe don't
base it on the synchronous scan machinery, but instead just have a
queue that lets backends throw prefetch requests over the wall; when
the queue wraps, old requests are discarded.  A background process -
or perhaps one per tablespace or something like that - pulls the data
in.
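The queue variant could look roughly like the following single-threaded sketch. The names PrefetchRequest, PrefetchQueue, queue_push and queue_pop are invented for illustration, as is the tiny slot count. A real version would live in shared memory and need locking or atomics, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 4

typedef struct {
    uint32_t relation;   /* hypothetical relation identifier */
    uint32_t block;      /* block number to pull into shared_buffers */
} PrefetchRequest;

typedef struct {
    PrefetchRequest slot[QUEUE_SLOTS];
    uint64_t head;       /* total number of requests ever enqueued */
    uint64_t tail;       /* index of the next request the worker reads */
} PrefetchQueue;

static void queue_init(PrefetchQueue *q)
{
    q->head = 0;
    q->tail = 0;
}

/*
 * Backend side: never blocks.  When the queue wraps, the oldest pending
 * request is overwritten and the tail is moved past it.
 */
static void queue_push(PrefetchQueue *q, PrefetchRequest r)
{
    q->slot[q->head % QUEUE_SLOTS] = r;
    q->head++;
    if (q->head - q->tail > QUEUE_SLOTS)
        q->tail = q->head - QUEUE_SLOTS;   /* old request discarded */
}

/* Worker side: fetch the oldest surviving request, if there is one. */
static bool queue_pop(PrefetchQueue *q, PrefetchRequest *out)
{
    assert(q->head - q->tail <= QUEUE_SLOTS);
    if (q->tail == q->head)
        return false;
    *out = q->slot[q->tail % QUEUE_SLOTS];
    q->tail++;
    return true;
}
```

Dropping old requests is harmless in this design because a prefetch is only a hint: a discarded request costs a later synchronous read, not correctness.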

Neither of those things seems that hard.  And if we could do those
things and make them work, then maybe we could offer direct I/O as an
option.  We'd still lose heavily in the case where our buffer eviction
decisions are poor, but that'd probably spur some improvement in that
area, which IMHO would be a good thing.

I personally think direct I/O would be a really good thing, not least
because O_ATOMIC is designed to allow MySQL to avoid doublewrite buffering,
their alternative to full page writes.  But we can't use it because it
requires O_DIRECT.  The savings are probably massive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company