On Sun, Dec 13, 2015 at 11:29:23AM +0300, Konstantin Knizhnik wrote:
> Hi,
>
> I am trying to create a version of the COPY command which can scatter/replicate data to different nodes based on some distribution method.
> There is a master process, which has the information about data distribution and to which all clients are connected.
> This master process should receive the copied data from the client and scatter tuples to the nodes.
> Maybe somebody can recommend the best way of implementing such a COPY agent?
>
> The obvious plan is the following:
>
> 1. Register utility callback
> 2. Handle T_CopyStmt in this callback
> 3. Use BeginCopyFrom/NextCopyFrom to receive tuples from client
> 4. Calculate distribution function for the received tuple
> 5. Establish a connection with the corresponding node (if not yet established) and start the same COPY command on this node (if not started yet).
> 6. Send data to this node using PQputCopyData.
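
Step 4 leaves the distribution function open. As a minimal sketch only, assuming the distribution key is available as a byte string and that nodes are numbered 0..nnodes-1, one option is to hash the key and take the result modulo the node count. The names key_hash and pick_node are hypothetical, and FNV-1a is just one possible hash, not something the proposal specifies:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the bytes of the distribution key (an assumed choice). */
static uint32_t key_hash(const char *key, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) key[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Map a key to a node index in [0, nnodes); nnodes must be > 0. */
static int pick_node(const char *key, size_t len, int nnodes)
{
    return (int) (key_hash(key, len) % (uint32_t) nnodes);
}
```

The same key always maps to the same node, so the per-node connections and COPY streams from step 5 can be opened lazily and cached by the returned index.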
>
> The problem is with step 6: I do not see any way to copy received data to the destination node.
> NextCopyFrom returns an array of values (Datums) of the tuple columns. But there are no public methods to send a tuple to the copy stream.
> All this logic is implemented in src/backend/commands/copy.c and is not available outside this module.
>
> It is more or less clear how to do it using text or CSV mode: I can use NextCopyFromRawFields and then construct a line with a comma-separated list of values.
> But how to handle binary mode? Also, I suspect that copy in text mode is significantly slower than in binary mode, isn't it?
>
> The dirty solution is just to cut and paste the copy.c code. But maybe there is some more elegant way?
A slightly cleaner solution is to make public methods to send tuples
to the copy stream and have COPY call those.
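
For the text/CSV route mentioned in the quoted message, here is a minimal sketch of building one CSV row from raw fields. It assumes the default CSV options (comma delimiter, double-quote quoting, NULL written as an unquoted empty string) and does not try to reproduce every corner case handled in copy.c. The function names put and csv_build_row are hypothetical, not existing backend or libpq functions:

```c
#include <stddef.h>
#include <string.h>

/* Append one byte; returns 0 on success, -1 if the buffer is full. */
static int put(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 >= cap)        /* keep room for the terminating NUL */
        return -1;
    out[(*pos)++] = c;
    return 0;
}

/*
 * Build one CSV row (terminated by '\n') from nfields raw fields.
 * A NULL pointer stands for SQL NULL. Returns the row length in bytes,
 * or -1 if the row does not fit into out.
 */
long csv_build_row(char *out, size_t cap,
                   const char *const *fields, int nfields)
{
    size_t pos = 0;

    for (int i = 0; i < nfields; i++) {
        const char *f = fields[i];

        if (i > 0 && put(out, cap, &pos, ',') < 0)
            return -1;
        if (f == NULL)
            continue;           /* NULL: unquoted empty string */

        /* Quote empty strings (to tell them from NULL) and special chars. */
        int quote = (*f == '\0') || strpbrk(f, ",\"\r\n") != NULL;

        if (quote && put(out, cap, &pos, '"') < 0)
            return -1;
        for (; *f; f++) {
            if (*f == '"' && put(out, cap, &pos, '"') < 0)
                return -1;      /* double any embedded quote */
            if (put(out, cap, &pos, *f) < 0)
                return -1;
        }
        if (quote && put(out, cap, &pos, '"') < 0)
            return -1;
    }
    if (put(out, cap, &pos, '\n') < 0)
        return -1;
    out[pos] = '\0';
    return (long) pos;
}
```

The resulting buffer and length are what would be handed to PQputCopyData on the connection to the target node, with the COPY there started in CSV format. This covers only the text side; binary mode would still need the public tuple-sending methods suggested above.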
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate