Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

From: Andres Freund
Subject: Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
Msg-id: 201206142338.33897.andres@2ndquadrant.com
In reply to: Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Replies: Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> > 
> > Reusing the ReadRecord infrastructure in situations where the code that
> > wants to do so is not tightly integrated into xlog.c is rather hard and
> > would require changes to rather integral parts of the recovery code
> > which doesn't seem to be a good idea.
> It would be nice to refactor ReadRecord and its subroutines out of xlog.c.
> That file has grown over the years to be really huge, and separating the
> code to read WAL sounds like it should be a pretty natural split. I
> don't want to duplicate all the WAL reading code, so we really should
> find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
> wrapper that just calls the new xlogreader code.
I agree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it just yet; we can replace
ReadRecord later if we want. As long as the code is in as much flux as it is
right now, I don't really see the point in investing energy in that.
Also, I am not sure how a callback-oriented API would fit into the xlog.c
workflow.

> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
> 
> - comments.
> At a quick glance, I couldn't figure out how this works. There seems to
> be some callback functions? If you want to read an xlog stream using
> this facility, what do you do?
You currently have to fill in four callbacks:

XLogReaderStateInterestingCB is_record_interesting;
XLogReaderStateWriteoutCB writeout_data;
XLogReaderStateFinishedRecordCB finished_record;
XLogReaderStateReadPageCB read_page;

As an example of how to use it (from the walsender support for
START_LOGICAL_REPLICATION):

    if (!xlogreader_state)
    {
        /* first call: allocate the reader and register the callbacks */
        xlogreader_state = XLogReaderAllocate();
        xlogreader_state->is_record_interesting =
            RecordRelevantForLogicalReplication;
        xlogreader_state->finished_record = ProcessRecord;
        xlogreader_state->writeout_data = WriteoutData;
        xlogreader_state->read_page = XLogReadPage;

        /* startptr is the current XLog position */
        xlogreader_state->startptr = startptr;
        XLogReaderReset(xlogreader_state);
    }

    /* how far valid data goes */
    xlogreader_state->endptr = endptr;

    XLogReaderRead(xlogreader_state);

The last call then invokes the above callbacks until it reaches endptr. That
is, it first reads a page with "read_page", then checks whether a record is
interesting for the use case ("is_record_interesting"). If it is, the record
gets reassembled and passed to the "finished_record" callback. Then the byte
stream is written out again with "writeout_data".

In this case it is written to the buffer the walsender has allocated. In
other use cases it might simply be thrown away.
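To make the call order described above concrete, here is a toy model of such a callback-driven loop. It is a sketch under stated assumptions, not the patch's code: ToyReader, toy_reader_read(), the demo_* callbacks, the one-byte length/type record format and the 8-byte page size are all hypothetical. Only the roles of the four callbacks and of startptr/endptr follow the description. A record that crosses a page boundary is reassembled before finished_record sees it, and the raw bytes of each page are streamed to writeout_data as they are consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy WAL format, not PostgreSQL's: byte 0 = total record length
 * (>= 2, header included), byte 1 = record type, then payload.  Records may
 * cross page boundaries.  Every name below is made up for this sketch.
 */
#define TOY_PAGE_SIZE 8

typedef struct ToyReader ToyReader;
struct ToyReader {
    bool (*is_record_interesting)(ToyReader *r, uint8_t type);
    void (*writeout_data)(ToyReader *r, const uint8_t *data, size_t len);
    void (*finished_record)(ToyReader *r, const uint8_t *rec, size_t len);
    /* copies at most TOY_PAGE_SIZE bytes of the page at pageptr */
    size_t (*read_page)(ToyReader *r, size_t pageptr, uint8_t *page);
    size_t startptr;    /* where reading starts */
    size_t endptr;      /* how far valid data goes */
    void *private_data; /* caller state for the callbacks */
};

/* Run the callbacks up to endptr; false on a bad header or short read. */
static bool toy_reader_read(ToyReader *r) {
    uint8_t page[TOY_PAGE_SIZE], rec[256]; /* rec: reassembly buffer */
    size_t rec_len = 0, pos = r->startptr;
    bool interesting = false;

    while (pos < r->endptr) {
        size_t pageptr = pos - pos % TOY_PAGE_SIZE;
        size_t got = r->read_page(r, pageptr, page);
        size_t off = pos - pageptr, chunk_start = off;

        if (got <= off)
            return false;                 /* nothing new on this page */
        while (off < got && pos < r->endptr) {
            rec[rec_len++] = page[off++];
            pos++;
            if (rec_len == 1 && rec[0] < 2)
                return false;             /* invalid length header */
            if (rec_len == 2)             /* type known: do we care? */
                interesting = r->is_record_interesting(r, rec[1]);
            if (rec_len >= 2 && rec_len == rec[0]) { /* record complete */
                if (interesting)
                    r->finished_record(r, rec, rec_len);
                rec_len = 0;
            }
        }
        /* stream the raw bytes consumed from this page back out */
        r->writeout_data(r, page + chunk_start, off - chunk_start);
    }
    return true;
}

/* Example callbacks: keep type-1 records and copy the stream to out[]. */
typedef struct {
    const uint8_t *wal;
    size_t wal_len, out_len;
    int n_finished;
    uint8_t out[64];
} DemoState;

static bool demo_interesting(ToyReader *r, uint8_t type) {
    (void)r;
    return type == 1;
}

static void demo_finished(ToyReader *r, const uint8_t *rec, size_t len) {
    (void)rec;
    (void)len;
    ((DemoState *)r->private_data)->n_finished++;
}

static void demo_writeout(ToyReader *r, const uint8_t *data, size_t len) {
    DemoState *s = r->private_data;
    assert(s->out_len + len <= sizeof s->out);
    memcpy(s->out + s->out_len, data, len);
    s->out_len += len;
}

/* Assumes pageptr < wal_len, which holds whenever endptr <= wal_len. */
static size_t demo_read_page(ToyReader *r, size_t pageptr, uint8_t *page) {
    DemoState *s = r->private_data;
    size_t n = s->wal_len - pageptr;
    if (n > TOY_PAGE_SIZE)
        n = TOY_PAGE_SIZE;
    memcpy(page, s->wal + pageptr, n);
    return n;
}
```

Fed the 12-byte stream {3,1,AA, 4,2,BB,CC, 5,1,DD,EE,FF}, whose third record starts on the last byte of the first 8-byte page, the loop reassembles that record across the boundary, reports two type-1 records to finished_record, and writes all 12 bytes back out unchanged. Since writeout_data receives raw page bytes, the sketch would not strictly need to buffer uninteresting records; it does so only to keep the loop short.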

> Can this be used for writing WAL, as well as reading? If so, what do you
> need the write support for?
It can currently replace records which are not interesting (e.g. index
changes in the case of logical replication). Filtered records are currently
replaced with XLOG_NOOP records of the correct length. In future the actual
amount of data should really be reduced, but I don't yet know how to map the
LSNs of the uncompressed and compressed streams onto each other...
The filtered data is then passed to a writeout callback (in a streaming
fashion).
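Replacing a filtered record with an XLOG_NOOP record of the same length keeps the byte position, and therefore the LSN, of every following record valid. A minimal sketch of that idea, assuming a hypothetical toy format (one length byte, one type byte, payload) and a made-up TOY_NOOP type code standing in for XLOG_NOOP; toy_filter_to_noop() and keep_type_1() are illustrative names, and this is not how the patch lays out real WAL records:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy format, not PostgreSQL's: byte 0 = total record length
 * (>= 2, header included), byte 1 = type, then payload.  TOY_NOOP is a
 * made-up type code standing in for XLOG_NOOP.
 */
#define TOY_NOOP 0

/*
 * Overwrite every record the filter rejects with a NOOP record of the same
 * length.  No bytes are inserted or removed, so each later record keeps its
 * original offset (its "LSN").  Returns the number of records replaced, or
 * -1 if a length header is invalid or runs past the end of the buffer
 * (earlier records may already have been rewritten in that case).
 */
static int toy_filter_to_noop(uint8_t *buf, size_t len,
                              bool (*interesting)(uint8_t type)) {
    size_t pos = 0;
    int replaced = 0;

    assert(interesting != NULL);
    while (pos < len) {
        if (len - pos < 2)
            return -1;                          /* truncated header */
        size_t rec_len = buf[pos];
        if (rec_len < 2 || rec_len > len - pos)
            return -1;                          /* bad or overlong length */
        if (!interesting(buf[pos + 1])) {
            buf[pos + 1] = TOY_NOOP;
            memset(buf + pos + 2, 0, rec_len - 2); /* blank the payload */
            replaced++;
        }
        pos += rec_len;
    }
    return replaced;
}

/* Example filter for a logical-replication-like consumer: keep type 1 only. */
static bool keep_type_1(uint8_t type) {
    return type == 1;
}
```

For example, {3,1,AA, 4,2,BB,CC, 3,1,DD} becomes {3,1,AA, 4,0,00,00, 3,1,DD}: the type-2 record turns into a 4-byte NOOP and the last record still starts at offset 7. Actually shrinking the stream would break that offset identity, which is exactly the LSN-mapping problem mentioned above.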

The whole writing-out part is pretty ugly at the moment; I just bolted it on
top because it was convenient. I am not yet sure what the API for it should
look like...

Andres

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

