Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
| От | Tom Lane | 
|---|---|
| Тема | Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument" | 
| Дата | |
| Msg-id | 13903.1475185710@sss.pgh.pa.us обсуждение исходный текст | 
| Ответ на | Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument" (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Ответы | Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with
 "invalid argument" | 
| Список | pgsql-hackers | 
I wrote:
> But what gets my attention in this connection is that it doesn't
> seem to be taking the trouble to open the files in binary mode.
> Could that lead to the reported failure?  Not sure, but it seems
> like at the least it could result in corrupted VM files.
On further thought, it definitely would lead to some sort of
hard-to-diagnose error.  Assuming there's some bytes that look
like \n, Windows read() would expand them to \r\n, which not
only produces garbage vismap data but throws off the page boundaries
in the file.  rewriteVisibilityMap() would not notice that, since
it contains exactly no sanity checking on the page headers,
but what it would notice is getting a short read on what it'll
think is a partial last page.  Then it does this:
       if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)       {           close(dst_fd);
close(src_fd);          return getErrorText();       }
 
The read() won't have set errno, since by its lights there's nothing
wrong.  So while we know that getErrorText saw errno == EINVAL, there's
no way to guess what call that might've been left over from.
BTW, just to add to the already long list of mortal sins in this
bit of code, I believe it does 100% the wrong thing on big-endian
hardware.
               uint16        new_vmbits = 0;               ...               /* Copy new visibility map bit to new
formatpage */               memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);
 
That's going to drop the two new bytes down in exactly the wrong
order if big-endian.
Oh, and for even more fun, this bit will dump core on alignment-picky
machines, if the vmbuf local array starts on an odd boundary:
               ((PageHeader) new_vmbuf)->pd_checksum =                   pg_checksum_page(new_vmbuf, new_blkno);
We might've caught these things earlier if the buildfarm testing
included cross-version upgrades, but of course that requires a
lot of test infrastructure that's not there ...
        regards, tom lane
		
	В списке pgsql-hackers по дате отправления: