Обсуждение: [ADMIN] pg_dump bug?

Поиск
Список
Период
Сортировка

[ADMIN] pg_dump bug?

От
Andy Nercessian
Дата:
I wonder if anyone has experienced the problem I have with bloated pg_dump files. This problem exists in both standard pg_dumps and --insert-columns type pg_dumps. I am using postgresql 9.5 on Windows 7.

Each table appears to have a 'copy table_name .... from stdin..' in multiple places (hundreds in some cases). The first copy statement appears genuine, but the second and subsequent statements appear to be in the middle of another copy clause between records. The outer copy clause is sometimes from the same table, and sometimes from a different table. Furthermore, from the existence of the number of backslashes \t, \\t, \\\\t which are variable, it would appear as though nesting can be several levels deep.

I'm not copying the actual data because apart from anything else, it's a giant text file. But this is roughly what it looks like:

------------------------------------
copy bills from .... 
Record 1 Column 1 \t Record 1 column 2 \t ....
Record 2 Column 2 \t Record 2 column 2 \t ...
......
copy adverts from .... < this is a nested copy statement unrelated to bills, although occasionally it's another bills copy statement>

Record 88 Column 1 \t Record 88 Column 2 \t ...

-------------------------------------

So in the middle of one copy statement - in between the records being copied, another one mysteriously appears and the records are duplicated. Each record can therefore be found in each of the copy statements, and the total number of such duplications is in the hundreds for some tables.

What could be causing this? I've made only the following observations:
1. The nested loops always occur in the same spot within the outer copy statements (if records have not changed between dumps) and are preceded by the exact same column of the same record in each dump.
2. It is clear that these copy statements are NOT part of any record. 
3. The same behaviour exists for insert style dumps (where there is no copy command).
4. Each copy statement includes the entire table, NOT a selection of records from it.

In years of using pg_dump I have never come across this issue. 


--
Andy 

Re: [ADMIN] pg_dump bug?

От
Scott Mead
Дата:

> On Apr 11, 2017, at 05:20, Andy Nercessian <ahnercessian@gmail.com> wrote:
>
> I wonder if anyone has experienced the problem I have with bloated pg_dump files. This problem exists in both
standardpg_dumps and --insert-columns type pg_dumps. I am using postgresql 9.5 on Windows 7. 
>
> Each table appears to have a 'copy table_name .... from stdin..' in multiple places (hundreds in some cases). The
firstcopy statement appears genuine, but the second and subsequent statements appear to be in the middle of another
copyclause between records. The outer copy clause is sometimes from the same table, and sometimes from a different
table.Furthermore, from the existence of the number of backslashes \t, \\t, \\\\t which are variable, it would appear
asthough nesting can be several levels deep. 
>
> I'm not copying the actual data because apart from anything else, it's a giant text file. But this is roughly what it
lookslike: 
>
> ------------------------------------
> copy bills from ....
> Record 1 Column 1 \t Record 1 column 2 \t ....
> Record 2 Column 2 \t Record 2 column 2 \t ...
> ......
> copy adverts from .... < this is a nested copy statement unrelated to bills, although occasionally it's another bills
copystatement> 
>
> Record 88 Column 1 \t Record 88 Column 2 \t ...
>
> -------------------------------------
>
> So in the middle of one copy statement - in between the records being copied, another one mysteriously appears and
therecords are duplicated. Each record can therefore be found in each of the copy statements, and the total number of
suchduplications is in the hundreds for some tables. 
>
> What could be causing this? I've made only the following observations:
> 1. The nested loops always occur in the same spot within the outer copy statements (if records have not changed
betweendumps) and are preceded by the exact same column of the same record in each dump. 
> 2. It is clear that these copy statements are NOT part of any record.
> 3. The same behaviour exists for insert style dumps (where there is no copy command).
> 4. Each copy statement includes the entire table, NOT a selection of records from it.
>
> In years of using pg_dump I have never come across this issue.

Is it possible that you have more than 1 pg_dump running simultaneously against the same output file?



>
>
> --
> Andy


Re: [ADMIN] pg_dump bug?

От
Andy Nercessian
Дата:
Thank you, Scott. 
That was my first instinct too, so I explored it by carrying out a manually typed pg_dump into psql. The result was identical. The reason I eventually ruled it out is that the nested copy statements are all identically positioned in each pg_dump. It would require some sort of asynchronous writing on the file for this to happen, and in that event, it seems highly improbable that the locations of all these nested statements would be identical.
There is something even stranger. The restore works fine- i.e. there is no evidence that the additional copy statements are actually carried out. Given this fact, I wouldn't bother with this at all, were it not for the fact that the pg_dump files are many many times larger than they need to be.

On Tue, Apr 11, 2017 at 1:33 PM, Scott Mead <scottm@openscg.com> wrote:


> On Apr 11, 2017, at 05:20, Andy Nercessian <ahnercessian@gmail.com> wrote:
>
> I wonder if anyone has experienced the problem I have with bloated pg_dump files. This problem exists in both standard pg_dumps and --insert-columns type pg_dumps. I am using postgresql 9.5 on Windows 7.
>
> Each table appears to have a 'copy table_name .... from stdin..' in multiple places (hundreds in some cases). The first copy statement appears genuine, but the second and subsequent statements appear to be in the middle of another copy clause between records. The outer copy clause is sometimes from the same table, and sometimes from a different table. Furthermore, from the existence of the number of backslashes \t, \\t, \\\\t which are variable, it would appear as though nesting can be several levels deep.
>
> I'm not copying the actual data because apart from anything else, it's a giant text file. But this is roughly what it looks like:
>
> ------------------------------------
> copy bills from ....
> Record 1 Column 1 \t Record 1 column 2 \t ....
> Record 2 Column 2 \t Record 2 column 2 \t ...
> ......
> copy adverts from .... < this is a nested copy statement unrelated to bills, although occasionally it's another bills copy statement>
>
> Record 88 Column 1 \t Record 88 Column 2 \t ...
>
> -------------------------------------
>
> So in the middle of one copy statement - in between the records being copied, another one mysteriously appears and the records are duplicated. Each record can therefore be found in each of the copy statements, and the total number of such duplications is in the hundreds for some tables.
>
> What could be causing this? I've made only the following observations:
> 1. The nested loops always occur in the same spot within the outer copy statements (if records have not changed between dumps) and are preceded by the exact same column of the same record in each dump.
> 2. It is clear that these copy statements are NOT part of any record.
> 3. The same behaviour exists for insert style dumps (where there is no copy command).
> 4. Each copy statement includes the entire table, NOT a selection of records from it.
>
> In years of using pg_dump I have never come across this issue.

Is it possible that you have more than 1 pg_dump running simultaneously against the same output file?



>
>
> --
> Andy



--
Andy Nercessian
+302109427400