I'd been looking over the COPY FROM code tonight when I saw something
pretty scary looking:
/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}
while it might not look too scary by itself, it gets a bit more so
when you learn that the cur_lineno is only 32 bits wide. This will
result in skipping a tuple every time the 32-bit variable wraps back
around to 0 again.
Maybe when this code was written copying > 4 billion rows was just a
far-off dream, but with today's hardware, it really didn't take that
long to see this actually happen for real.
postgres=# create unlogged table t(a int);
CREATE TABLE
Time: 1.339 ms
postgres=# insert into t select 0 from generate_series(1, 4300000000);
INSERT 0 4300000000
Time: 2128367.019 ms (35:28.367)
postgres=# copy t to '/home/ubuntu/t.csv' with (format csv, header);
COPY 4300000000
Time: 2294331.494 ms (38:14.331)
postgres=# truncate t;
TRUNCATE TABLE
Time: 30.367 ms
postgres=# copy t from '/home/ubuntu/t.csv' with (format csv, header);
COPY 4299999999
Time: 2693186.475 ms (44:53.186)
A patch to fix is attached.
(I just made the variable 64-bit)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services