Thread: Tee for COPY

Tee for COPY

From
Konstantin Knizhnik
Date:
Hi,

I am trying to create a version of the COPY command that can scatter/replicate data to different nodes based on some distribution method.
 
There is a master process that has the information about data distribution and to which all clients are connected.
This master process should receive the copied data from the client and scatter tuples to the nodes.
Maybe somebody can recommend the best way of implementing such a COPY agent?

The obvious plan is the following:

1. Register a utility callback
2. Handle T_CopyStmt in this callback
3. Use BeginCopyFrom/NextCopyFrom to receive tuples from the client
4. Calculate the distribution function for the received tuple
5. Establish a connection with the corresponding node (if not yet established) and start the same COPY command on this node (if not started yet)
6. Send data to this node using PQputCopyData
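
Step 4 of the plan above can be sketched as a hash of the distribution key taken modulo the number of nodes. This is only an illustration under assumptions not stated in the thread: the key is taken as a NUL-terminated string, FNV-1a is swapped in as a stand-in hash, and `fnv1a` and `pick_node` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, 32-bit: a simple non-cryptographic hash, used here as a stand-in. */
uint32_t fnv1a(const char *data, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) data[i];
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical helper: map a distribution key to a node index in
 * [0, n_nodes). A NULL key is sent to node 0 (an arbitrary choice).
 */
int pick_node(const char *key, int n_nodes)
{
    assert(n_nodes > 0);
    if (key == NULL)
        return 0;
    return (int) (fnv1a(key, strlen(key)) % (uint32_t) n_nodes);
}
```

The returned index would then select the connection used in steps 5 and 6. Plain modulo hashing remaps most keys when the number of nodes changes, so a real distribution method would likely need something more stable.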

The problem is with step 6: I do not see any way to copy the received data to the destination node.
NextCopyFrom returns an array of values (Datums) of the tuple's columns, but there are no public methods to send a tuple to the copy stream.
All this logic is implemented in src/backend/commands/copy.c and is not available outside this module.

It is more or less clear how to do it using text or CSV mode: I can use NextCopyFromRawFields and then construct a line with a comma-separated list of values.
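
A minimal sketch of that line construction for CSV mode follows. It assumes the default delimiter (comma) and quote character (double quote), and that a NULL pointer in the field array marks an SQL NULL. `csv_build_line` and `csv_put` are hypothetical names. PostgreSQL's CSV format writes NULL as an unquoted empty string and an empty string as `""`, which the sketch follows; it does not handle the special case of a lone `\.` value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one byte, always keeping room for the terminating NUL. */
int csv_put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap)
        return -1;
    out[(*len)++] = c;
    return 0;
}

/*
 * Build one CSV line (terminated by '\n') from raw field strings.
 * Returns the line length, or -1 if the buffer is too small.
 */
long csv_build_line(const char *const *fields, int nfields,
                    char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (int i = 0; i < nfields; i++) {
        const char *f = fields[i];
        int quote;

        if (i > 0 && csv_put(out, cap, &len, ',') < 0)
            return -1;
        if (f == NULL)
            continue;               /* SQL NULL: unquoted empty field */

        /* Quote empty strings and anything containing special bytes. */
        quote = (*f == '\0') || strpbrk(f, ",\"\r\n") != NULL;
        if (quote && csv_put(out, cap, &len, '"') < 0)
            return -1;
        for (; *f != '\0'; f++) {
            /* An embedded quote is escaped by doubling it. */
            if (*f == '"' && csv_put(out, cap, &len, '"') < 0)
                return -1;
            if (csv_put(out, cap, &len, *f) < 0)
                return -1;
        }
        if (quote && csv_put(out, cap, &len, '"') < 0)
            return -1;
    }
    if (csv_put(out, cap, &len, '\n') < 0)
        return -1;
    out[len] = '\0';
    return (long) len;
}
```

The resulting buffer and length are what would be handed to PQputCopyData for the chosen node.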
 
But how to handle binary mode? Also, I suspect that copy in text mode is significantly slower than in binary mode, isn't it?
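
For reference, the binary COPY framing itself is simple and documented: an 11-byte signature, a 32-bit flags field, a 32-bit header extension length, then for each tuple a 16-bit field count followed by a 32-bit length (-1 for NULL) and the value bytes for each field, and finally a 16-bit -1 trailer, all in network byte order. The sketch below only does this framing. It assumes each value has already been serialized by the type's binary send function, and `BinField` and the `pgcopy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One already-serialized field; len == -1 marks an SQL NULL. */
typedef struct {
    const void *data;
    int32_t     len;
} BinField;

/* Store integers in network (big-endian) byte order. */
void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) v;
}

void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t) (v >> 24);
    p[1] = (uint8_t) (v >> 16);
    p[2] = (uint8_t) (v >> 8);
    p[3] = (uint8_t) v;
}

/* Write the 19-byte file header; out must have room for 19 bytes. */
size_t pgcopy_write_header(uint8_t *out)
{
    memcpy(out, "PGCOPY\n\377\r\n\0", 11);  /* signature */
    put_u32(out + 11, 0);                   /* flags */
    put_u32(out + 15, 0);                   /* header extension length */
    return 19;
}

/* Write one tuple; returns bytes written, or 0 if out is too small. */
size_t pgcopy_write_tuple(uint8_t *out, size_t cap,
                          const BinField *fields, int16_t nfields)
{
    size_t   need = 2;
    uint8_t *p = out;

    assert(nfields >= 0);
    for (int i = 0; i < nfields; i++) {
        assert(fields[i].len >= -1);
        need += 4 + (fields[i].len > 0 ? (size_t) fields[i].len : 0);
    }
    if (need > cap)
        return 0;

    put_u16(p, (uint16_t) nfields);
    p += 2;
    for (int i = 0; i < nfields; i++) {
        put_u32(p, (uint32_t) fields[i].len);   /* -1 becomes 0xFFFFFFFF */
        p += 4;
        if (fields[i].len > 0) {
            memcpy(p, fields[i].data, (size_t) fields[i].len);
            p += fields[i].len;
        }
    }
    return (size_t) (p - out);
}

/* Write the file trailer: a 16-bit -1 in place of a field count. */
size_t pgcopy_write_trailer(uint8_t *out)
{
    put_u16(out, 0xFFFF);
    return 2;
}
```

The header would be sent once per target connection before the first tuple and the trailer once at the end. The hard part raised in the question, turning Datums into per-type binary values outside copy.c, is not covered by this sketch.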
 

The dirty solution is just to cut and paste the copy.c code. But maybe there is some more elegant way?

Thanks in advance,
Konstantin







Re: Tee for COPY

From
David Fetter
Date:
On Sun, Dec 13, 2015 at 11:29:23AM +0300, Konstantin Knizhnik wrote:
> Hi,
> 
> I am trying to create a version of the COPY command that can scatter/replicate data to different nodes based on some distribution method.
> There is a master process that has the information about data distribution and to which all clients are connected.
> This master process should receive the copied data from the client and scatter tuples to the nodes.
> Maybe somebody can recommend the best way of implementing such a COPY agent?
> 
> The obvious plan is the following:
> 
> 1. Register a utility callback
> 2. Handle T_CopyStmt in this callback
> 3. Use BeginCopyFrom/NextCopyFrom to receive tuples from the client
> 4. Calculate the distribution function for the received tuple
> 5. Establish a connection with the corresponding node (if not yet established) and start the same COPY command on this node (if not started yet)
> 6. Send data to this node using PQputCopyData
> 
> The problem is with step 6: I do not see any way to copy the received data to the destination node.
> NextCopyFrom returns an array of values (Datums) of the tuple's columns, but there are no public methods to send a tuple to the copy stream.
> All this logic is implemented in src/backend/commands/copy.c and is not available outside this module.
> 
> It is more or less clear how to do it using text or CSV mode: I can use NextCopyFromRawFields and then construct a line with a comma-separated list of values.
> But how to handle binary mode? Also, I suspect that copy in text mode is significantly slower than in binary mode, isn't it?
> 
> The dirty solution is just to cut and paste the copy.c code. But maybe there is some more elegant way?

A slightly cleaner solution is to make public methods to send tuples
to the copy stream and have COPY call those.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate