Обсуждение: [BUGS] BUG #14471: PostgreSQL 9.6 "NOT IN" in select causes crash with"ERROR: unknown error"
[BUGS] BUG #14471: PostgreSQL 9.6 "NOT IN" in select causes crash with"ERROR: unknown error"
От
josef.machytka@gmail.com
Дата:
The following bug has been logged on the website: Bug reference: 14471 Logged by: Josef Machytka Email address: josef.machytka@gmail.com PostgreSQL version: 9.6.1 Operating system: Debian jessie Description: I checked stackoverflow and I found several questions complaining about problems with "NOT IN" causing problems in selects. Looks like PostgreSQL 9.6 now very probably has some bug connected with "NOT IN" command. We have select which actually crashes after several hours (!) of run with message "ERROR: unknown error". Although according to explain plan it should be done in ~40 minutes. It is also possible that problem happens only when working with partitioned tables. We have several sets of partitioned tables. Partitions are by months - data are used for billing calculations. Our select which crashes pg 9.6 is like this: select <some columns> from table_1_partition t1 inner join table_2 t2 --also partitioned table on .... where (t1.col1, t2.col1, t1.col2) NOT IN (select col1, col2, col3 from table_3 --also partitioned table group by col1, col2, col3) When I remove NOT IN part everything works OK. Of course I replaced this problematic part with NOT EXISTS construct. But "unknown error" crash with "NOT IN" command is very unfortunate. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
josef.machytka@gmail.com writes: > Looks like PostgreSQL 9.6 now very probably has some bug connected with "NOT > IN" command. We have select which actually crashes after several hours (!) > of run with message "ERROR: unknown error". Although according to explain > plan it should be done in ~40 minutes. This is an interesting report, but without enough information to replicate the problem, we're unlikely to be able to help you. A fragment of a query, with zero information about the underlying tables, is far from enough. The only occurrences of the string "unknown error" that I can find in the source code are in dblink and postgres_fdw (both reflecting cases where the remote server did not return an error message, itself a "shouldn't happen" situation). If you were using either, you didn't say so; but if you were, maybe taking a look in the remote server's log would be useful. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #14471: PostgreSQL 9.6 "NOT IN" in select causes crashwith "ERROR: unknown error"
От
Josef Machytka
Дата:
Yes, I am sorry, dblink is involved - I just did not see it as significant.
We start several processes in parallel to speed up whole billing calculation otherwise it would take 10+ hours to calculate everything in serial.
Ok, so at least "unknown error" is explained. We start several processes in parallel to speed up whole billing calculation otherwise it would take 10+ hours to calculate everything in serial.
But problem with "NOT IN" remains.
When I replaced "NOT IN" with "NOT EXISTS" query ended after ~3 hours without any problems. Even over dblink.
So it is not small task but also not something really big (we work with much more data when we calculate statistics from web metrics).
On 21 December 2016 at 16:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
josef.machytka@gmail.com writes:
> Looks like PostgreSQL 9.6 now very probably has some bug connected with "NOT
> IN" command. We have select which actually crashes after several hours (!)
> of run with message "ERROR: unknown error". Although according to explain
> plan it should be done in ~40 minutes.
This is an interesting report, but without enough information to replicate
the problem, we're unlikely to be able to help you. A fragment of a
query, with zero information about the underlying tables, is far from
enough.
The only occurrences of the string "unknown error" that I can find in the
source code are in dblink and postgres_fdw (both reflecting cases where
the remote server did not return an error message, itself a "shouldn't
happen" situation). If you were using either, you didn't say so; but if
you were, maybe taking a look in the remote server's log would be useful.
regards, tom lane
Josef Machytka <josef.machytka@gmail.com> writes: > Yes, I am sorry, dblink is involved - I just did not see it as significant. > We start several processes in parallel to speed up whole billing > calculation otherwise it would take 10+ hours to calculate everything in > serial. > Ok, so at least "unknown error" is explained. Well, we have a theory about where it came from, but still not enough information to improve the behavior. Did you look to see what happened on the remote server? > But problem with "NOT IN" remains. > When I replaced "NOT IN" with "NOT EXISTS" query ended after ~3 hours > without any problems. Even over dblink. You do know that NOT IN and NOT EXISTS behave quite differently with respect to nulls? I'm suspicious that the real problem here is that your query is just wrong when written with NOT IN, and it specifies some unreasonable amount of computation. Possibly something is running out of memory and not dealing with the case very well, leading to the unhelpful error message. FWIW, just about every bug report I've ever seen about NOT IN boiled down to the complainant's subquery returning one or more nulls and the complainant not understanding what will happen if it does. Unfortunately, that's not a bug, it's the behavior required by the SQL standard. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #14471: PostgreSQL 9.6 "NOT IN" in select causes crashwith "ERROR: unknown error"
От
Josef Machytka
Дата:
OK, this sounds very interesting.
We already know about many different problems with NULL values because we use heavily different GROUP BYs and WINDOW functions so we replace NULL's everywhere with 'unknown' or similar.
But maybe there is some problem in data import. I will check data.We already know about many different problems with NULL values because we use heavily different GROUP BYs and WINDOW functions so we replace NULL's everywhere with 'unknown' or similar.
On 21 December 2016 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josef Machytka <josef.machytka@gmail.com> writes:
> Yes, I am sorry, dblink is involved - I just did not see it as significant.
> We start several processes in parallel to speed up whole billing
> calculation otherwise it would take 10+ hours to calculate everything in
> serial.
> Ok, so at least "unknown error" is explained.
Well, we have a theory about where it came from, but still not enough
information to improve the behavior. Did you look to see what happened
on the remote server?
> But problem with "NOT IN" remains.
> When I replaced "NOT IN" with "NOT EXISTS" query ended after ~3 hours
> without any problems. Even over dblink.
You do know that NOT IN and NOT EXISTS behave quite differently with
respect to nulls? I'm suspicious that the real problem here is that
your query is just wrong when written with NOT IN, and it specifies
some unreasonable amount of computation. Possibly something is running
out of memory and not dealing with the case very well, leading to the
unhelpful error message.
FWIW, just about every bug report I've ever seen about NOT IN boiled
down to the complainant's subquery returning one or more nulls and
the complainant not understanding what will happen if it does.
Unfortunately, that's not a bug, it's the behavior required by the
SQL standard.
regards, tom lane
Re: [BUGS] BUG #14471: PostgreSQL 9.6 "NOT IN" in select causes crashwith "ERROR: unknown error"
От
Josef Machytka
Дата:
Hi again.
So I checked all logs and it turned out one of databases involved vent into recovery mode because some of its connections was killed.Which looked quite horrible...
On 21 December 2016 at 17:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josef Machytka <josef.machytka@gmail.com> writes:
> Yes, I am sorry, dblink is involved - I just did not see it as significant.
> We start several processes in parallel to speed up whole billing
> calculation otherwise it would take 10+ hours to calculate everything in
> serial.
> Ok, so at least "unknown error" is explained.
Well, we have a theory about where it came from, but still not enough
information to improve the behavior. Did you look to see what happened
on the remote server?
> But problem with "NOT IN" remains.
> When I replaced "NOT IN" with "NOT EXISTS" query ended after ~3 hours
> without any problems. Even over dblink.
You do know that NOT IN and NOT EXISTS behave quite differently with
respect to nulls? I'm suspicious that the real problem here is that
your query is just wrong when written with NOT IN, and it specifies
some unreasonable amount of computation. Possibly something is running
out of memory and not dealing with the case very well, leading to the
unhelpful error message.
FWIW, just about every bug report I've ever seen about NOT IN boiled
down to the complainant's subquery returning one or more nulls and
the complainant not understanding what will happen if it does.
Unfortunately, that's not a bug, it's the behavior required by the
SQL standard.
regards, tom lane
Josef Machytka <josef.machytka@gmail.com> writes: > So I checked all logs and it turned out one of databases involved vent into > recovery mode because some of its connections was killed. > Probably out of memory killer did it. So it caused chain reaction. And > billing query was canceled "due to administration command" and it was shown > through dblink as "unknown error". OK. In that case I'm betting that what dblink saw was a network error on its next attempt to send a command. I've not dug into the code to verify this positively, but I think the behavior will be improved by Joe Conway's recent patch, https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2f802d95b4904dbed3dfdca1b3a607cd085d2e20 regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On 12/23/2016 01:08 AM, Josef Machytka wrote: > Hi again. > So I checked all logs and it turned out one of databases involved vent > into recovery mode because some of its connections was killed. > Probably out of memory killer did it. So it caused chain reaction. And > billing query was canceled "due to administration command" and it was > shown through dblink as "unknown error". > Which looked quite horrible... FYI, I committed a fix for this since the initial report. Before: ---------- test=# -- kill the remote side manually test=# select * from dblink('dbname=test','select pg_sleep(30)'); ERROR: unknown error CONTEXT: Error occurred on dblink connection named "unnamed": could not execute query. After: ---------- test=# -- kill the remote side manually test=# select * from dblink('dbname=test','select pg_sleep(30)'); ERROR: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. CONTEXT: Error occurred on dblink connection named "unnamed": could not execute query. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development