Обсуждение: Parallel Scan Bug: invalid attnum: 0
running pg 9.6.1 on AWS max_parallel_workers_per_gather = 4 This only occurs in some schemas, not all. The affected schemas have approx 1 million rows in the table. Much larger schemas (65 million rows) and smaller schemas(100K rows) do not have the problem. table definition: CREATE TABLE ro_summaries ( day date NOT NULL, entry_firm varchar(15) NOT NULL, route_firm varchar(15) NOT NULL, order_type varchar(3) NOT NULL, symbol varchar(15) NOT NULL, bucket_id int4 NOT NULL, entry_status varchar(2) NOT NULL, covered_order bool NOT NULL, orders_sent int4 NOT NULL DEFAULT 0, shares_sent int4 NOT NULL DEFAULT 0, orders_pos int4 NOT NULL DEFAULT 0, orders_zero int4 NOT NULL DEFAULT 0, orders_neg int4 NOT NULL DEFAULT 0, orders_canceled int4 NOT NULL DEFAULT 0, orders_null int4 NOT NULL DEFAULT 0, shares_pos int4 NOT NULL DEFAULT 0, shares_zero int4 NOT NULL DEFAULT 0, shares_neg int4 NOT NULL DEFAULT 0, shares_canceled int4 NOT NULL DEFAULT 0, shares_null int4 NOT NULL DEFAULT 0, pi_pos numeric NOT NULL DEFAULT 0, pi_neg numeric NOT NULL DEFAULT 0, efq numeric NULL, effective_spread numeric NULL, quoted_spread numeric NULL, realized_spread numeric NULL, speed numeric NULL, part1_shares int4 NULL, trades_total int4 NULL, orders_filled int4 NULL DEFAULT 0, efq_shares int4 NULL, exchange varchar(10) NULL, notional_value numeric NULL ) WITHOUT OIDS TABLESPACE pg_default GO CREATE UNIQUE INDEX rosum_ux01 ON ro_summaries USING btree (day date_ops, order_type text_ops, bucket_id int4_ops, route_firm text_ops, entry_firm text_ops, symbol text_ops, entry_status text_ops, covered_order bool_ops) view definition: CREATE OR REPLACE VIEW ro_summaries_v AS SELECT ro_summaries.day, 'S3'::character varying AS source_code, ro_summaries.entry_firm, ro_summaries.route_firm, ro_summaries.order_type, ro_summaries.symbol, ro_summaries.bucket_id, ro_summaries.entry_status, ro_summaries.covered_order, ro_summaries.orders_sent, ro_summaries.shares_sent, (((ro_summaries.shares_pos + ro_summaries.shares_neg) + ro_summaries.shares_zero) + ro_summaries.shares_null) AS shares_total, (ro_summaries.shares_pos + ro_summaries.shares_zero) AS shares_atorbetter, ro_summaries.shares_pos, ro_summaries.shares_neg, ro_summaries.trades_total, ro_summaries.pi_pos, ro_summaries.pi_neg, ro_summaries.efq, ro_summaries.effective_spread, ro_summaries.quoted_spread, ro_summaries.realized_spread, ro_summaries.speed, ro_summaries.part1_shares, ro_summaries.orders_filled, ro_summaries.efq_shares, ro_summaries.exchange, ro_summaries.notional_value FROM msco_data.ro_summaries; offending query: select count(*) OVER() AS full_count, source_code, sum(shares_atorbetter) shares_atorbetter, 'All ' day_display, 'All ' entry_firm, 'All ' route_firm, 'All ' symbol, 'All ' order_type, 0 bucket_id, case when sum(shares_total) > 0 then (sum(shares_atorbetter)::numeric/sum(shares_total)::numeric)*100 end at_or_better, case WHEN sum((shares_total)) > 0 THEN (sum(shares_pos)::numeric/sum((shares_total))::numeric)::numeric*100 END pi_percent, coalesce(sum(pi_pos),0) pi_amount, case WHEN sum(shares_pos) > 0 THEN (sum(pi_pos)::numeric / sum(shares_pos)::numeric)::numeric ELSE 0 END pi_per_unit, case WHEN sum((shares_total)) > 0 THEN (sum(shares_neg)::numeric/sum((shares_total))::numeric)::numeric*100 END slippage_percent, coalesce(sum(pi_neg),0) slippage_amount, case WHEN sum((efq_shares)) > 0 THEN sum((efq_shares)*effective_spread)::numeric/sum((efq_shares))::numeric END effective_spread, case WHEN sum((efq_shares)) > 0 THEN sum((efq_shares)*quoted_spread)::numeric/sum((efq_shares))::numeric END quoted_spread, case WHEN sum((efq_shares)) > 0 THEN sum((efq_shares)*realized_spread)::numeric/sum((efq_shares))::numeric END realized_spread, case when sum((efq_shares)*quoted_spread) > 0 THEN (sum((efq_shares)*effective_spread)::numeric/sum((efq_shares)*quoted_spread)::numeric) * 100 END EFQ, sum(trades_total) trades, sum((shares_total)) shares, sum(shares_pos) shares_pos, sum(shares_neg) shares_neg, sum(orders_sent) orders, sum(shares_sent) ordered_shares, sum(pi_pos + pi_neg) variance, case when sum((shares_total)) > 0 THEN sum(pi_pos + pi_neg) / sum((shares_total))::numeric END avg_variance, case when sum(orders_sent) > 0 THEN sum(shares_sent) / sum(orders_sent) END avg_shares_order, case when sum(shares_sent) > 0 then least((sum((shares_total))::numeric/sum(shares_sent)::numeric)::numeric*100,100) END pct_shares_executed, sum(notional_value) notional_value, sum(orders_filled) orders_filled, case when sum(part1_shares) > 0 then sum(part1_shares * speed)::numeric / sum(part1_shares)::numeric end speed from ro_summaries_v where day = '2016-11-07' group by source_code ORDER BY day_display desc NULLS LAST LIMIT 25 OFFSET 0; Commenting out source_code (and the group by) allows the query to run. Alternately, commenting out several of the case statements also allows it to run. In the meantime I've set max_parallel_workers_per_gather = 0 and the query works fine. Please let me know if you need any more information.
Steve Randall <srandall@s3.com> writes: > running pg 9.6.1 on AWS > max_parallel_workers_per_gather = 4 > This only occurs in some schemas, not all. The affected schemas have > approx 1 million rows in the table. Much larger schemas (65 million rows) > and smaller schemas(100K rows) do not have the problem. FWIW, I failed to duplicate this from the information provided. Can you get us EXPLAIN output for the troublesome case, or does it fail before printing the EXPLAIN output? How about a stack trace from the errfinish call? regards, tom lane
I have been able to reproduce the error in my development environment. EXPLAIN output is attached. EXPLAIN ANALYZE fails with the error mentioned.
I have to do a little configuring to get the stack trace logs, but it will be a couple of weeks before they are ready. Tomorrow I head to San Francisco for a few days off prior to pgConf.
On Wed, Nov 9, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steve Randall <srandall@s3.com> writes:
> running pg 9.6.1 on AWS
> max_parallel_workers_per_gather = 4
> This only occurs in some schemas, not all. The affected schemas have
> approx 1 million rows in the table. Much larger schemas (65 million rows)
> and smaller schemas(100K rows) do not have the problem.
FWIW, I failed to duplicate this from the information provided.
Can you get us EXPLAIN output for the troublesome case, or does it
fail before printing the EXPLAIN output? How about a stack trace
from the errfinish call?
regards, tom lane
Вложения
Steve Randall <srandall@s3.com> writes: > I have been able to reproduce the error in my development environment. > EXPLAIN output is attached. EXPLAIN ANALYZE fails with the error > mentioned. Ah, I've duplicated it. My previous attempt to reverse-engineer your test case was not selecting a partial-aggregation plan. Immediate impression is that the logic for planning partial grouped aggregation did not get the zero-sort-keys case right. In my hands it produces TRAP: BadArgument("!(nkeys > 0)", File: "tuplesort.c", Line: 763) 2016-11-10 10:44:26.955 EST [14415] LOG: server process (PID 15726) was t= erminated by signal 6: Aborted but in a non-assert build of course you'd get some other misbehavior. regards, tom lane
I wrote: > Immediate impression is that the logic for planning partial grouped > aggregation did not get the zero-sort-keys case right. Yeah: after flattening your view, the planner was left with GROUP BY 'S3'::character varying, which is a no-op, but it mistakenly inserted a no-op Sort into the plan to sort by that. One-liner fix is here if you need it right away: https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommitdiff;h=3D7= defc3b97a31537547053946808a83e7234d1b61 regards, tom lane
Thank you. On Thu, Nov 10, 2016 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Immediate impression is that the logic for planning partial grouped > > aggregation did not get the zero-sort-keys case right. > > Yeah: after flattening your view, the planner was left with GROUP BY > 'S3'::character varying, which is a no-op, but it mistakenly inserted > a no-op Sort into the plan to sort by that. One-liner fix is here > if you need it right away: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h= > 7defc3b97a31537547053946808a83e7234d1b61 > > > regards, tom lane >