Обсуждение: BUG #15299: relation does not exist errors

Поиск
Список
Период
Сортировка

BUG #15299: relation does not exist errors

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15299
Logged by:          Jeff Frost
Email address:      jeff@pgexperts.com
PostgreSQL version: 9.5.13
Operating system:   Ubuntu 14.04
Description:

We recently upgraded some production servers to 9.5.13 on Saturday
(7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
seeing some strange issues where postgres was reporting tons of "relation"
does not exist errors. Here are some snippets from the logs:

2018-07-25 14:33:53.243
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"
2018-07-25 14:33:53.251
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90364,"PARSE",2018-07-25
13:38:03 GMT,287/3021107,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.252
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90365,"PARSE",2018-07-25
13:38:03 GMT,287/3021109,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.258
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90366,"PARSE",2018-07-25
13:38:03 GMT,287/3021110,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
[0 of 5 busy]"
2018-07-25 14:33:53.295
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90367,"PARSE",2018-07-25
13:38:03 GMT,287/3021111,0,ERROR,42P01,"relation ""login_informations"" does
not exist",,,
,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
""login_informations"".""id"" = 735211 LIMIT 1 ",37,,"sidekiq
4.1.2 procore [0 of 5 busy]"
2018-07-25 14:33:53.296
GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90368,"PARSE",2018-07-25
13:38:03 GMT,287/3021112,0,ERROR,42P01,"relation ""permission_templates""
does not exist",
,,,,,"SELECT  ""permission_templates"".* FROM ""permission_templates"" WHERE
""permission_templates"".""id"" = 137880 LIMIT 1 ",39,,
"sidekiq 4.1.2 procore [0 of 5 busy]"

Connecting to the DB via psql and issuing a \d login_informations showed
that the table was there. Running one of the queries from the logs by hand
worked fine.

Since we use pgbouncer as a connection pooler, we then connected using psql
through pgbouncer on one of the affected hosts and were able to reproduce
the issue, so the theory was that some postgresql backends had lost track of
relations. We stopped/started pgbouncer to allow those backends to exit and
the issue was resolved.

A little about the DB in case this is helpful:

DB size: 12TB
DB load: 25k xacts/sec
Characterization: 80% read / 20% write
Replication role: primary with 1 direct replica and a few cascaded
replicas

So far haven't been able to reproduce synthetically, but let me know if
there is any more info we can pull from the server if this happens again.


Re: BUG #15299: relation does not exist errors

От
Tom Lane
Дата:
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> We recently upgraded some production servers to 9.5.13 on Saturday
> (7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
> seeing some strange issues where postgres was reporting tons of "relation"
> does not exist errors. Here are some snippets from the logs:

> 2018-07-25 14:33:53.243
> GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
> 13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
> not exist",,,
> ,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
> ""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
> [0 of 5 busy]"

> Connecting to the DB via psql and issuing a \d login_informations showed
> that the table was there. Running one of the queries from the logs by hand
> worked fine.

My first thought would be something to do with a restrictive search_path.

            regards, tom lane


Re: BUG #15299: relation does not exist errors

От
Jeff Frost
Дата:

> On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> =?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
>> We recently upgraded some production servers to 9.5.13 on Saturday
>> (7/21/2018) and they ran fine for 2 days. On Wednesday morning, we started
>> seeing some strange issues where postgres was reporting tons of "relation"
>> does not exist errors. Here are some snippets from the logs:
>
>> 2018-07-25 14:33:53.243
>> GMT,"app_db","webapp",34887,"127.0.0.1:56740",5b587d3b.8847,90363,"PARSE",2018-07-25
>> 13:38:03 GMT,287/3021104,0,ERROR,42P01,"relation ""login_informations"" does
>> not exist",,,
>> ,,,"SELECT  ""login_informations"".* FROM ""login_informations"" WHERE
>> ""login_informations"".""id"" = 735211 LIMIT 1",37,,"sidekiq 4.1.2 procore
>> [0 of 5 busy]"
>
>> Connecting to the DB via psql and issuing a \d login_informations showed
>> that the table was there. Running one of the queries from the logs by hand
>> worked fine.
>
> My first thought would be something to do with a restrictive search_path.

Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public
schema.

Re: BUG #15299: relation does not exist errors

От
Tom Lane
Дата:
Jeff Frost <jeff@pgexperts.com> writes:
> On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My first thought would be something to do with a restrictive search_path.

> Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public
schema.

Right, but what I'm half-theorizing is that the patches for CVE-2018-1058
affected you somehow.  Maybe something "transiently" installed pg_catalog
(only) as search_path and forgot to remove it again.  It would be easy
to explain it that way if the errors had been coming out of autovacuum
worker processes, for instance, though your log didn't match that exactly.

If you see it again it would be a good idea to look very closely at what
happened right before the errors started.

            regards, tom lane


Re: BUG #15299: relation does not exist errors

От
Jeff Frost
Дата:
On Jul 26, 2018, at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeff Frost <jeff@pgexperts.com> writes:
>> On Jul 26, 2018, at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My first thought would be something to do with a restrictive search_path.
>
>> Good thought. As far as I can tell we never change it from the default and everything in that DB is in the public
schema.
>
> Right, but what I'm half-theorizing is that the patches for CVE-2018-1058
> affected you somehow.  Maybe something "transiently" installed pg_catalog
> (only) as search_path and forgot to remove it again.  It would be easy
> to explain it that way if the errors had been coming out of autovacuum
> worker processes, for instance, though your log didn't match that exactly.
>
> If you see it again it would be a good idea to look very closely at what
> happened right before the errors started.

Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after
they'recomplete though? 

I have the logs around, so if you know what I should look for, I'd be happy to dig.

Re: BUG #15299: relation does not exist errors

От
Andres Freund
Дата:
On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:
> Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after
they'recomplete though?
 

Yes, that's right.  Creating an extension or calling a function that
internally does a SET search_path seem like a more likely cause.

- Andres


Re: BUG #15299: relation does not exist errors

От
Jeff Frost
Дата:
On Jul 26, 2018, at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:
>> Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after
they'recomplete though? 
>
> Yes, that's right.  Creating an extension or calling a function that
> internally does a SET search_path seem like a more likely cause.

I grepped our code and didn't find anywhere that we set search_path, though it's possible some ruby gem we use could be
doingit. I'll go spelunking there next! 

Re: BUG #15299: relation does not exist errors

От
Jeff Frost
Дата:

> On Jul 26, 2018, at 1:53 PM, Jeff Frost <jeff@pgexperts.com> wrote:
>
> On Jul 26, 2018, at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2018-07-26 13:18:33 -0700, Jeff Frost wrote:
>>> Oh, I see what you're saying. Don't the autovacuum processes connect to their own backends which then exit after
they'recomplete though? 
>>
>> Yes, that's right.  Creating an extension or calling a function that
>> internally does a SET search_path seem like a more likely cause.
>
> I grepped our code and didn't find anywhere that we set search_path, though it's possible some ruby gem we use could
bedoing it. I'll go spelunking there next! 

Wanted to close the loop on this. As usual, Tom was right.

We have a rake task in the application used to pull a production copy of structure.sql which uses pg_dump under the
covers.Because it uses the same conninfo as the app, it happily goes through pgbouncer.  So, when that rake task is
run,pg_dump connects through pgbouncer and happily sets that restrictive search_path. It then creates the schema-only
dumpand upon completion, that connection is returned to the pool as a big fat boobytrap for the next application
connection.