Hello,
While investigating an issue, I found that pg_xlogdump fails to dump contents from a WAL file if the file starts with continuation data from a previous WAL record and that data spans more than one page. In such cases, XLogFindNextRecord() fails to take into account that there will be more than one xlog page header (long and/or short) and thus tries to read from an offset where no valid record exists. That results in pg_xlogdump throwing an error such as:
pg_xlogdump: FATAL: could not find a valid record after 0/46000000
The attached WAL file, generated on the master branch using synthetic data, demonstrates the issue. The attached patch fixes it for me.
While we could have deduced the number of short and long headers and skipped directly to the final offset, I found it much cleaner to read one page at a time and use XLogPageHeaderSize() to compute each page's header size separately. Moreover, the continuation data is not going to span many pages, so I don't see any harm in doing it that way.
I encountered this on 9.3, but the patch applies to both 9.3 and master. I haven't tested it on other branches, but I have no reason to believe it won't apply or work. I believe we should back-patch it to all supported branches.
Thanks,
Pavan
--