Обсуждение: help getting a backtrace from 9.2 on Ubuntu 13.04?
I have a production server running PG 8.4 on RHEL6. I have a development server running PG 9.2 on Ubuntu 13.04. Periodically,I like to take a dump of the production server and load it on the development machine. But I'm having troubles with the 9.2 server crashing when I'm restoring the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a custom-format dump with pg_restore, and I've tried restoring a text-format dump with pqsl,and both of them are crashing on me. The data is too sensitive for me to submit a database dump to the community, but I'd like to submit a stack trace, in thehopes that someone might be able to figure out what's going on. But I'm having some trouble getting this done. I'm looking at the instructions at http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD Which points me to https://wiki.edubuntu.org/DebuggingProgramCrash Which tells me to install the “dbg” version of my package, so I have installed postgresql-9.2-dbg The PG wiki page tells me to download and run list-dbgsym-packages-v2.sh, but the Edubuntu page says that that script hasbeen supplanted by list-symbols-packages-v2.sh. Okay, so I run chris@mu:~/Downloads$ sudo bash list-symbols-packages-v2.sh -p $(pidof -s postgres) and get warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffcbae5000 any idea where I go from here? Disclaimer THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential,intended only for the use of the individual or entity named above. If the reader of this message is not theintended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notifiedthat any dissemination, distribution or copying of this communication is strictly prohibited. If you have receivedthis communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
On 9/9/2013 7:00 AM, Chris Curvey wrote: > any idea where I go from here? don't develop on a newer version of the database than you are deploying on. *maybe* you can use the pgdump from 8.4 to connect to and dump the 9.2 database, but the 9.2 dump is NOT guaranteed to generate 8.4 compatible SQL, in fact its highly likely it won't, as you can already attest to. -- john r pierce 37N 122W somewhere on the middle of the left coast
On 9/9/2013 7:00 AM, Chris Curvey wrote:any idea where I go from here?
don't develop on a newer version of the database than you are deploying on.
*maybe* you can use the pgdump from 8.4 to connect to and dump the 9.2 database, but the 9.2 dump is NOT guaranteed to generate 8.4 compatible SQL, in fact its highly likely it won't, as you can already attest to.
oops, I re-read yur request, you're going the other way.
use the pg_dump from 9.2 to dump the 8.4 database, via either a direct network connection, or an ssh tunnel, or whatever works best. this is guaranteed to work
-- john r pierce 37N 122W somewhere on the middle of the left coast
But I'm having troubles with the 9.2 server crashing when I'm restoring the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a custom-format dump with pg_restore, and I've tried restoring a text-format dump with pqsl, and both of them are crashing on me.
The data is too sensitive for me to submit a database dump to the community, but I'd like to submit a stack trace, in the hopes that someone might be able to figure out what's going on. But I'm having some trouble getting this done.
From: Marcin Mańk [mailto:marcin.mank@gmail.com]
Sent: Monday, September 09, 2013 8:30 PM
To: Chris Curvey
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04?
On Mon, Sep 9, 2013 at 4:00 PM, Chris Curvey <ccurvey@zuckergoldberg.com> wrote:
But I'm having troubles with the 9.2 server crashing when I'm restoring the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a custom-format dump with pg_restore, and I've tried restoring a text-format dump with pqsl, and both of them are crashing on me.
The data is too sensitive for me to submit a database dump to the community, but I'd like to submit a stack trace, in the hopes that someone might be able to figure out what's going on. But I'm having some trouble getting this done.
Is it crashing on a specific database object? pg_restore -v will tell you how far it went. Then try to restore only that object. Is it perhaps crashing on a specific row?
Try producing a self contained test case (like only the culprit table, anonymized).
Regards
Marcin Mańk
Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT user_id_refs_id_7ceef80f
pg_restore: setting owner and privileges for FK CONSTRAINT user_id_refs_id_dfbab7d
pg_restore: [archiver (db)] could not execute query: no connection to the server
Command was: -- Completed on 2013-09-09 11:35:16 EDT
pg_restore: [archiver (db)] could not execute query: no connection to the server
Command was: --
-- PostgreSQL database dump complete
–
Which I find really odd, because I specified –no-owner –no-privileges –no-tablespace
chris@mu:/sdb$ pg_restore --dbname=certified_mail_ccc2 --format=c --verbose --clean --no-owner --no-privileges --no-tablespaces -h mu -p 5434 cm_Mon.backup
So now I’m up to three questions. (Why the crash? How to get backtrace? Why are we applying permissions when I said not to?) I guess that’s the nature of the universe. Let me see if I can figure out which table that is and try to create a test case.
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
On 09/10/2013 06:57 AM, Chris Curvey wrote: > *From:*Marcin Mańk [mailto:marcin.mank@gmail.com] > *Sent:* Monday, September 09, 2013 8:30 PM > *To:* Chris Curvey > *Cc:* pgsql-general@postgresql.org > *Subject:* Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04? > > On Mon, Sep 9, 2013 at 4:00 PM, Chris Curvey <ccurvey@zuckergoldberg.com > <mailto:ccurvey@zuckergoldberg.com>> wrote: > > But I'm having troubles with the 9.2 server crashing when I'm > restoring the dump. I'm using the 9.2 version of pg_dump. I've > tried restoring a custom-format dump with pg_restore, and I've tried > restoring a text-format dump with pqsl, and both of them are > crashing on me. > > The data is too sensitive for me to submit a database dump to the > community, but I'd like to submit a stack trace, in the hopes that > someone might be able to figure out what's going on. But I'm having > some trouble getting this done. > > Is it crashing on a specific database object? pg_restore -v will tell > you how far it went. Then try to restore only that object. Is it perhaps > crashing on a specific row? > > Try producing a self contained test case (like only the culprit table, > anonymized). > > Regards > > Marcin Mańk > > Good advice. I turned on –verbose, and got a ton of output, ending with: > > pg_restore: setting owner and privileges for FK CONSTRAINT > user_id_refs_id_7ceef80f > > pg_restore: setting owner and privileges for FK CONSTRAINT > user_id_refs_id_dfbab7d > > pg_restore: [archiver (db)] could not execute query: no connection to > the server > > Command was: -- Completed on 2013-09-09 11:35:16 EDT > > pg_restore: [archiver (db)] could not execute query: no connection to > the server At this point I would be more worried about the above, 'no connection to server'. > > Command was: -- > > -- PostgreSQL database dump complete > > – > > Which I find really odd, because I specified –no-owner –no-privileges > –no-tablespace --no-owner does not mean that ownership is not set, just that the ownership from the source database is not carried over. http://www.postgresql.org/docs/9.2/interactive/app-pgrestore.html -O --no-owner Do not output commands to set ownership of objects to match the original database. By default, pg_restore issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser (or the same user that owns all of the objects in the script). With -O, any user name can be used for the initial connection, and this user will own all the created objects. > > chris@mu:/sdb$ pg_restore --dbname=certified_mail_ccc2 --format=c > --verbose --clean --no-owner --no-privileges --no-tablespaces -h mu -p > 5434 cm_Mon.backup > > So now I’m up to three questions. (Why the crash? How to get > backtrace? Why are we applying permissions when I said not to?) I > guess that’s the nature of the universe. Let me see if I can figure out > which table that is and try to create a test case. > > -- Adrian Klaver adrian.klaver@gmail.com
> > > > Good advice. I turned on –verbose, and got a ton of output, ending with: > > > > pg_restore: setting owner and privileges for FK CONSTRAINT > > user_id_refs_id_7ceef80f > > > > pg_restore: setting owner and privileges for FK CONSTRAINT > > user_id_refs_id_dfbab7d > > > > pg_restore: [archiver (db)] could not execute query: no connection to > > the server > > > > Command was: -- Completed on 2013-09-09 11:35:16 EDT > > > > pg_restore: [archiver (db)] could not execute query: no connection to > > the server This was a red herring. That last constraint that was listed is the last thing in the dump/restore (at least according to--list). Perhaps at the end of pg_restore, it's trying to send the comment to the database ("completed on...") and that'sconfusing things? It does appear that the entire database has been restored, but the cluster restart is disconcerting. > > > At this point I would be more worried about the above, 'no connection to > server'. Yep, that would be the point where the cluster is restarting. > --no-owner does not mean that ownership is not set, just that the ownership > from the source database is not carried over. > > http://www.postgresql.org/docs/9.2/interactive/app-pgrestore.html > > -O > --no-owner > Do not output commands to set ownership of objects to match the original > database. By default, pg_restore issues ALTER OWNER or SET SESSION > AUTHORIZATION statements to set ownership of created schema elements. > These statements will fail unless the initial connection to the database is > made by a superuser (or the same user that owns all of the objects in the > script). With -O, any user name can be used for the initial connection, and > this user will own all the created objects. Does this mean that the --no-owner is a command flag for pg_dump, but is ignored by pg_restore? Let me run a test and see... Disclaimer THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential,intended only for the use of the individual or entity named above. If the reader of this message is not theintended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notifiedthat any dissemination, distribution or copying of this communication is strictly prohibited. If you have receivedthis communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
> From: Marcin Mańk [mailto:marcin.mank@gmail.com]
>
>
> > Is it crashing on a specific database object? pg_restore -v will tell you
> > how far it went. Then try to restore only that object. Is it perhaps
> > crashing on a specific row?
> >
>
>
>
> Good advice. I turned on –verbose, and got a ton of output, ending with:
>
>
>
> pg_restore: setting owner and privileges for FK CONSTRAINT
> user_id_refs_id_7ceef80f
>
> pg_restore: setting owner and privileges for FK CONSTRAINT
> user_id_refs_id_dfbab7d
>
> pg_restore: [archiver (db)] could not execute query: no connection to the
> server
>
> Command was: -- Completed on 2013-09-09 11:35:16 EDT
>
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: Tuesday, September 10, 2013 1:26 PM
To: Chris Curvey
Cc: Marcin Mańk; pgsql-general@postgresql.org
Subject: Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04?
On Tue, Sep 10, 2013 at 6:57 AM, Chris Curvey <ccurvey@zuckergoldberg.com> wrote:
> From: Marcin Mańk [mailto:marcin.mank@gmail.com]
>
>
> > Is it crashing on a specific database object? pg_restore -v will tell you
> > how far it went. Then try to restore only that object. Is it perhaps
> > crashing on a specific row?
> >
>
>
>
> Good advice. I turned on –verbose, and got a ton of output, ending with:
>
>
>
> pg_restore: setting owner and privileges for FK CONSTRAINT
> user_id_refs_id_7ceef80f
>
> pg_restore: setting owner and privileges for FK CONSTRAINT
> user_id_refs_id_dfbab7d
>
> pg_restore: [archiver (db)] could not execute query: no connection to the
> server
>
> Command was: -- Completed on 2013-09-09 11:35:16 EDT
>
What does the server log say? It should tell you why the server is restarting.
Cheers,
Jeff
Great thought. Looking through the logs, it appears that all my failures are on a CREATE INDEX. Usually on my biggest table, but often on another table.
2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table "certified_mail_ccc2.public.cm_status_history"
2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated by signal 11: Segmentation fault
2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);
2013-09-10 10:15:13 EDT LOG: terminating any other active server processes
2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash of another server process
2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I cannot square this with the fact that when I echo the commands, the last echoed command is about setting privileges.
Another development (possibly unrelated): I tried *dumping* with –no-privileges –no-tablespace –no-owner, and the restore went fine.
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
On 09/10/2013 10:37 AM, Chris Curvey wrote: > > Another development (possibly unrelated): I tried **dumping** with > –no-privileges –no-tablespace –no-owner, and the restore went fine. > Probably has to do with whether you are dumping plain text or custom format: http://www.postgresql.org/docs/9.2/interactive/app-pgdump.html -O --no-owner Do not output commands to set ownership of objects to match the original database. By default, pg_dump issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created database objects. These statements will fail when the script is run unless it is started by a superuser (or the same user that owns all of the objects in the script). To make a script that can be restored by any user, but will give that user ownership of all the objects, specify -O. This option is only meaningful for the plain-text format. For the archive formats, you can specify the option when you call pg_restore. -- Adrian Klaver adrian.klaver@gmail.com
Great thought. Looking through the logs, it appears that all my failures are on a CREATE INDEX. Usually on my biggest table, but often on another table.
2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table "certified_mail_ccc2.public.cm_status_history"
2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated by signal 11: Segmentation fault
2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);
2013-09-10 10:15:13 EDT LOG: terminating any other active server processes
2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash of another server process
2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I cannot square this with the fact that when I echo the commands, the last echoed command is about setting privileges.
On Tue, Sep 10, 2013 at 10:37 AM, Chris Curvey <ccurvey@zuckergoldberg.com> wrote:
Great thought. Looking through the logs, it appears that all my failures are on a CREATE INDEX. Usually on my biggest table, but often on another table.
2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table "certified_mail_ccc2.public.cm_status_history"
2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated by signal 11: Segmentation fault
2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);
2013-09-10 10:15:13 EDT LOG: terminating any other active server processes
2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash of another server process
2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I cannot square this with the fact that when I echo the commands, the last echoed command is about setting privileges.
A backend is crashing, and taking down the entire PostgreSQL system. The commands you see being echoed are from a different process from the one that triggered the crash, so it is just an innocent bystander which has no useful information. Are you using parallel restore? (If not, why is there someone indexing your biggest table during the restore?)
You will want to get the backtrace of the coredump generated by the crashed backend, not of the running process. Have you tried taking a bt with gdb? You said you couldn't find the symbols, but have you tried it anyway? On CentOS and openSuse I often get warnings about some symbols not being found, but all the symbols I actually need to interpret the backtrace end up being there.
Cheers,Jeff
Many thanks!