Re: [patch] libpq one-row-at-a-time API

From: Merlin Moncure
Subject: Re: [patch] libpq one-row-at-a-time API
Date:
Msg-id: CAHyXU0w5G25FshZtHe4DS1QY3GCUWW6mG-SBebmq3scV2CgyAA@mail.gmail.com
In reply to: Re: [patch] libpq one-row-at-a-time API  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [patch] libpq one-row-at-a-time API  (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-hackers
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> But, the faster rowbuf method is a generally incompatible way of
>> dealing with data vs current libpq -- this is bad.  If it's truly
>> impossible to get those benefits without bypassing the result API,
>> then I remove my objection on the grounds it's optional behavior (my
>> gut tells me it is possible though).
>
> Um, please clarify what are you talking about here?
>
> What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with
PQgetvalue.  I think you're suspecting that I incorrectly believe
you're forced to use the rowbuf API -- I don't (although I wasn't clear
on that earlier).  Basically I'm saying that we should only buy into
that if all other alternative routes to getting the faster performance
are exhausted.
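
For clarity, here's a minimal sketch of the usage pattern I mean --
rows still come back as ordinary PGresults and are read with the
standard accessors.  (PQsetSingleRowMode and PGRES_SINGLE_TUPLE are
just the names floated for the patch; treat them as placeholders for
whatever gets settled on.)

/*
 * Sketch only: rows arrive as normal PGresult objects, one tuple
 * apiece, and are read with the usual accessors.
 */
#include <stdio.h>
#include <libpq-fe.h>

static void
consume_rows(PGconn *conn)
{
    PGresult   *res;

    if (!PQsendQuery(conn, "SELECT id, payload FROM big_table"))
        return;
    PQsetSingleRowMode(conn);   /* ask for one row per PQgetResult */

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* ordinary PGresult accessors, nothing rowbuf-specific */
            printf("%s: %s\n",
                   PQgetvalue(res, 0, 0),
                   PQgetvalue(res, 0, 1));
        }
        /* the final PGRES_TUPLES_OK (or an error) ends the stream */
        PQclear(res);
    }
}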

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I think the dummy copy of PGresult is plausible (if by that you mean
>> optimizing PQgetResult when in single row mode).  That would be even
>> better: you'd remove the need for the rowbuf mode.
>
> I haven't spent any time looking at this, but my gut tells me that a big
> chunk of the expense is copying the PGresult's metadata (the column
> names, types, etc).  It has to be possible to make that cheaper.
>
> One idea is to rearrange the internal storage so that that part reduces
> to one memcpy().  Another thought is to allow PGresults to share
> metadata by treating the metadata as a separate reference-counted
> object.  The second could be a bit hazardous though, as we advertise
> that PGresults are independent objects that can be manipulated by
> separate threads.  I don't want to introduce mutexes into PGresults,
> but I'm not sure reference-counted metadata can be safe without them.
> So maybe the memcpy idea is the only workable one.

Yeah -- we had a very similar problem in libpqtypes and we solved it
exactly as you're thinking.  libpqtypes potentially has to create a
result with each row iteration (we expose rows and composites as
result objects created on the fly) and stores some extra non-trivial
data with the result.  We solved it with the optimized-memcpy method
(look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h
and you'll see that all the important structs, like PGtypeHandler, are
somewhat haphazardly designed to be run through a memcpy).  We
couldn't do anything about internal libpq issues, though; some
micro-optimization of PQsetResultAttrs (which is called via
PQcopyResult) might fit the bill.
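
To illustrate the flattening trick (this is a toy example of the idea,
not the real PGtypeHandler layout): keep the per-column metadata in
fixed-size inline fields, with no pointers into separately allocated
storage, and duplicating it collapses to a single memcpy.

/*
 * Toy illustration of the memcpy-friendly layout -- names invented.
 * Everything lives in flat, fixed-size fields, so cloning the metadata
 * needs no per-field fixups.
 */
#include <string.h>

#define MAX_ATTR_NAME 64

typedef struct FlatAttDesc
{
    char        name[MAX_ATTR_NAME];    /* column name stored inline */
    unsigned int typid;                 /* type OID */
    int         typlen;
    int         atttypmod;
    int         format;                 /* 0 = text, 1 = binary */
} FlatAttDesc;

static void
clone_attrs(FlatAttDesc *dst, const FlatAttDesc *src, int nattrs)
{
    /* the whole point: metadata duplication is one memcpy */
    memcpy(dst, src, sizeof(FlatAttDesc) * (size_t) nattrs);
}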

The 'source' result (or source data that would be copied into the
destination result) would be stored in the PGconn, right?  So the idea
is that when you set up single row mode, the connection generates a
template PGresult which is then copied out repeatedly during
row-by-row processing.  I like it, but only if we're reasonably
confident the PGresult can be sufficiently optimized like that.
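
Roughly, I'm picturing something like the following (all names made
up; this is not actual libpq internals): the connection holds onto the
metadata-bearing template, and each DataRow gets a cheap struct copy
of it plus the one tuple.

/*
 * Hypothetical sketch only.  The connection keeps a metadata-only
 * template result; each DataRow in single-row mode becomes a result by
 * copying the template and attaching the one tuple -- no rebuilding of
 * column descriptors per row.
 */
#include <stdlib.h>

typedef struct HypoResult
{
    int         nattrs;
    void       *attdescs;   /* flattened column metadata, memcpy-able */
    int         ntuples;    /* 0 in the template, 1 in each emitted result */
    void       *tuple;      /* the single row's values */
} HypoResult;

typedef struct HypoConn
{
    HypoResult  row_template;   /* built once, when RowDescription arrives */
} HypoConn;

/* called for each DataRow message while in single-row mode */
static HypoResult *
emit_single_row(HypoConn *conn, void *rowdata)
{
    HypoResult *res = malloc(sizeof(HypoResult));

    if (res == NULL)
        return NULL;
    *res = conn->row_template;  /* cheap: one struct copy of the metadata */
    res->ntuples = 1;
    res->tuple = rowdata;
    return res;
}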

merlin

