Обсуждение: BUG #15032: Segmentation fault when running a particular query

Поиск
Список
Период
Сортировка

BUG #15032: Segmentation fault when running a particular query

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15032
Logged by:          Guo Xiang Tan
Email address:      gxtan1990@gmail.com
PostgreSQL version: 10.1
Operating system:   Ubuntu 14.04.4 LTS
Description:

## `SELECT version()`

`PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit`

## Query that results in segmentation fault

```
UPDATE posts
SET avg_time = (x.gmean / 1000)
FROM (SELECT post_timings.topic_id,
             post_timings.post_number,
             round(exp(avg(CASE WHEN msecs > 0 THEN ln(msecs) ELSE 0 END)))
AS gmean
      FROM post_timings
      INNER JOIN posts AS p2
        ON p2.post_number = post_timings.post_number
          AND p2.topic_id = post_timings.topic_id
          AND p2.user_id <> post_timings.user_id
      GROUP BY post_timings.topic_id, post_timings.post_number) AS x
WHERE (x.topic_id = posts.topic_id
  AND x.post_number = posts.post_number
  AND (posts.avg_time <> (x.gmean / 1000)::int OR posts.avg_time IS
NULL));
```

## Error in PostgreSQL log file

```
[163]: [15-1] LOG:  server process (PID 1983) was terminated by signal 11:
Segmentation fault
[163]: [16-1] DETAIL:  Failed process was running: UPDATE posts
    SET avg_time = (x.gmean / 1000)
    FROM (SELECT post_timings.topic_id,
                 post_timings.post_number,
                 round(exp(avg(CASE WHEN msecs > 0 THEN ln(msecs) ELSE 0 END)))
AS gmean
          FROM post_timings
          INNER JOIN posts AS p2
            ON p2.post_number = post_timings.post_number
              AND p2.topic_id = post_timings.topic_id
              AND p2.user_id <> post_timings.user_id
          GROUP BY post_timings.topic_id, post_timings.post_number) AS x
    WHERE (x.topic_id = posts.topic_id
      AND x.post_number = posts.post_number
      AND (posts.avg_time <> (x.gmean / 1000)::int OR posts.avg_time IS
NULL))
    ;
[163]: [17-1] LOG:  terminating any other active server processes
[2003]: [4-1] WARNING:  terminating connection because of crash of another
server process
[2003]: [5-1] DETAIL:  The postmaster has commanded this server process to
roll back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
[2003]: [6-1] HINT:  In a moment you should be able to reconnect to the
database and repeat your command.
[2015]: [4-1] WARNING:  terminating connection because of crash of another
server process
[2015]: [5-1] DETAIL:  The postmaster has commanded this server process to
roll back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
```

## pg_dump for relevant tables


```
--
-- PostgreSQL database dump
--

-- Dumped from database version 10.1
-- Dumped by pg_dump version 10.1

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: posts; Type: TABLE; Schema: public; Owner: -
--

CREATE TABLE posts (
    id integer NOT NULL,
    user_id integer,
    topic_id integer NOT NULL,
    post_number integer NOT NULL,
    raw text NOT NULL,
    cooked text NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    reply_to_post_number integer,
    reply_count integer DEFAULT 0 NOT NULL,
    quote_count integer DEFAULT 0 NOT NULL,
    deleted_at timestamp without time zone,
    off_topic_count integer DEFAULT 0 NOT NULL,
    like_count integer DEFAULT 0 NOT NULL,
    incoming_link_count integer DEFAULT 0 NOT NULL,
    bookmark_count integer DEFAULT 0 NOT NULL,
    avg_time integer,
    score double precision,
    reads integer DEFAULT 0 NOT NULL,
    post_type integer DEFAULT 1 NOT NULL,
    vote_count integer DEFAULT 0 NOT NULL,
    sort_order integer,
    last_editor_id integer,
    hidden boolean DEFAULT false NOT NULL,
    hidden_reason_id integer,
    notify_moderators_count integer DEFAULT 0 NOT NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    illegal_count integer DEFAULT 0 NOT NULL,
    inappropriate_count integer DEFAULT 0 NOT NULL,
    last_version_at timestamp without time zone NOT NULL,
    user_deleted boolean DEFAULT false NOT NULL,
    reply_to_user_id integer,
    percent_rank double precision DEFAULT 1.0,
    notify_user_count integer DEFAULT 0 NOT NULL,
    like_score integer DEFAULT 0 NOT NULL,
    deleted_by_id integer,
    edit_reason character varying(255),
    word_count integer,
    version integer DEFAULT 1 NOT NULL,
    cook_method integer DEFAULT 1 NOT NULL,
    wiki boolean DEFAULT false NOT NULL,
    baked_at timestamp without time zone,
    baked_version integer,
    hidden_at timestamp without time zone,
    self_edits integer DEFAULT 0 NOT NULL,
    reply_quoted boolean DEFAULT false NOT NULL,
    via_email boolean DEFAULT false NOT NULL,
    raw_email text,
    public_version integer DEFAULT 1 NOT NULL,
    action_code character varying(255),
    image_url character varying
);


--
-- Name: TABLE posts; Type: COMMENT; Schema: public; Owner: -
--

COMMENT ON TABLE posts IS 'If you want to query public posts only, use the
badge_posts view.';


--
-- Name: COLUMN posts.post_number; Type: COMMENT; Schema: public; Owner: -
--

COMMENT ON COLUMN posts.post_number IS 'The position of this post in the
topic. The pair (topic_id, post_number) forms a natural key on the posts
table.';


--
-- Name: COLUMN posts.raw; Type: COMMENT; Schema: public; Owner: -
--

COMMENT ON COLUMN posts.raw IS 'The raw Markdown that the user entered into
the composer.';


--
-- Name: COLUMN posts.cooked; Type: COMMENT; Schema: public; Owner: -
--

COMMENT ON COLUMN posts.cooked IS 'The processed HTML that is presented in a
topic.';


--
-- Name: COLUMN posts.reply_to_post_number; Type: COMMENT; Schema: public;
Owner: -
--

COMMENT ON COLUMN posts.reply_to_post_number IS 'If this post is a reply to
another, this column is the post_number of the post it''s replying to. [FKEY
posts.topic_id, posts.post_number]';


--
-- Name: COLUMN posts.reply_quoted; Type: COMMENT; Schema: public; Owner:
-
--

COMMENT ON COLUMN posts.reply_quoted IS 'This column is true if the post
contains a quote-reply, which causes the in-reply-to indicator to be
absent.';


--
-- Name: topics; Type: TABLE; Schema: public; Owner: -
--

CREATE TABLE topics (
    id integer NOT NULL,
    title character varying(255) NOT NULL,
    last_posted_at timestamp without time zone,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    views integer DEFAULT 0 NOT NULL,
    posts_count integer DEFAULT 0 NOT NULL,
    user_id integer,
    last_post_user_id integer NOT NULL,
    reply_count integer DEFAULT 0 NOT NULL,
    featured_user1_id integer,
    featured_user2_id integer,
    featured_user3_id integer,
    avg_time integer,
    deleted_at timestamp without time zone,
    highest_post_number integer DEFAULT 0 NOT NULL,
    image_url character varying(255),
    like_count integer DEFAULT 0 NOT NULL,
    incoming_link_count integer DEFAULT 0 NOT NULL,
    category_id integer,
    visible boolean DEFAULT true NOT NULL,
    moderator_posts_count integer DEFAULT 0 NOT NULL,
    closed boolean DEFAULT false NOT NULL,
    archived boolean DEFAULT false NOT NULL,
    bumped_at timestamp without time zone NOT NULL,
    has_summary boolean DEFAULT false NOT NULL,
    vote_count integer DEFAULT 0 NOT NULL,
    archetype character varying(255) DEFAULT 'regular'::character varying
NOT NULL,
    featured_user4_id integer,
    notify_moderators_count integer DEFAULT 0 NOT NULL,
    spam_count integer DEFAULT 0 NOT NULL,
    pinned_at timestamp without time zone,
    score double precision,
    percent_rank double precision DEFAULT 1.0 NOT NULL,
    subtype character varying(255),
    slug character varying(255),
    deleted_by_id integer,
    participant_count integer DEFAULT 1,
    word_count integer,
    excerpt character varying(1000),
    pinned_globally boolean DEFAULT false NOT NULL,
    pinned_until timestamp without time zone,
    fancy_title character varying(400),
    highest_staff_post_number integer DEFAULT 0 NOT NULL,
    featured_link character varying,
    CONSTRAINT has_category_id CHECK (((category_id IS NOT NULL) OR
((archetype)::text <> 'regular'::text))),
    CONSTRAINT pm_has_no_category CHECK (((category_id IS NULL) OR
((archetype)::text <> 'private_message'::text)))
);


--
-- Name: TABLE topics; Type: COMMENT; Schema: public; Owner: -
--

COMMENT ON TABLE topics IS 'To query public topics only: SELECT ... FROM
topics t LEFT INNER JOIN categories c ON (t.category_id = c.id AND
c.read_restricted = false)';


--
-- Name: post_timings; Type: TABLE; Schema: public; Owner: -
--

CREATE TABLE post_timings (
    topic_id integer NOT NULL,
    post_number integer NOT NULL,
    user_id integer NOT NULL,
    msecs integer NOT NULL
);


--
-- Name: posts_id_seq; Type: SEQUENCE; Schema: public; Owner: -
--

CREATE SEQUENCE posts_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;


--
-- Name: posts_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
--

ALTER SEQUENCE posts_id_seq OWNED BY posts.id;


--
-- Name: topics_id_seq; Type: SEQUENCE; Schema: public; Owner: -
--

CREATE SEQUENCE topics_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;


--
-- Name: topics_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
--

ALTER SEQUENCE topics_id_seq OWNED BY topics.id;


--
-- Name: posts id; Type: DEFAULT; Schema: public; Owner: -
--

ALTER TABLE ONLY posts ALTER COLUMN id SET DEFAULT
nextval('posts_id_seq'::regclass);


--
-- Name: topics id; Type: DEFAULT; Schema: public; Owner: -
--

ALTER TABLE ONLY topics ALTER COLUMN id SET DEFAULT
nextval('topics_id_seq'::regclass);


--
-- Name: topics forum_threads_pkey; Type: CONSTRAINT; Schema: public; Owner:
-
--

ALTER TABLE ONLY topics
    ADD CONSTRAINT forum_threads_pkey PRIMARY KEY (id);


--
-- Name: posts posts_pkey; Type: CONSTRAINT; Schema: public; Owner: -
--

ALTER TABLE ONLY posts
    ADD CONSTRAINT posts_pkey PRIMARY KEY (id);


--
-- Name: idx_posts_created_at_topic_id; Type: INDEX; Schema: public; Owner:
-
--

CREATE INDEX idx_posts_created_at_topic_id ON posts USING btree (created_at,
topic_id) WHERE (deleted_at IS NULL);


--
-- Name: idx_posts_deleted_posts; Type: INDEX; Schema: public; Owner: -
--

CREATE INDEX idx_posts_deleted_posts ON posts USING btree (topic_id,
post_number) WHERE (deleted_at IS NOT NULL);


--
-- Name: idx_posts_user_id_deleted_at; Type: INDEX; Schema: public; Owner:
-
--

CREATE INDEX idx_posts_user_id_deleted_at ON posts USING btree (user_id)
WHERE (deleted_at IS NULL);


--
-- Name: idx_topics_front_page; Type: INDEX; Schema: public; Owner: -
--

CREATE INDEX idx_topics_front_page ON topics USING btree (deleted_at,
visible, archetype, category_id, id);


--
-- Name: idx_topics_user_id_deleted_at; Type: INDEX; Schema: public; Owner:
-
--

CREATE INDEX idx_topics_user_id_deleted_at ON topics USING btree (user_id)
WHERE (deleted_at IS NULL);


--
-- Name: idxtopicslug; Type: INDEX; Schema: public; Owner: -
--

CREATE INDEX idxtopicslug ON topics USING btree (slug) WHERE ((deleted_at IS
NULL) AND (slug IS NOT NULL));


--
-- Name: index_forum_threads_on_bumped_at; Type: INDEX; Schema: public;
Owner: -
--

CREATE INDEX index_forum_threads_on_bumped_at ON topics USING btree
(bumped_at DESC);


--
-- Name: index_post_timings_on_user_id; Type: INDEX; Schema: public; Owner:
-
--

CREATE INDEX index_post_timings_on_user_id ON post_timings USING btree
(user_id);


--
-- Name: index_posts_on_reply_to_post_number; Type: INDEX; Schema: public;
Owner: -
--

CREATE INDEX index_posts_on_reply_to_post_number ON posts USING btree
(reply_to_post_number);


--
-- Name: index_posts_on_topic_id_and_post_number; Type: INDEX; Schema:
public; Owner: -
--

CREATE UNIQUE INDEX index_posts_on_topic_id_and_post_number ON posts USING
btree (topic_id, post_number);


--
-- Name: index_posts_on_user_id_and_created_at; Type: INDEX; Schema: public;
Owner: -
--

CREATE INDEX index_posts_on_user_id_and_created_at ON posts USING btree
(user_id, created_at);


--
-- Name: index_topics_on_created_at_and_visible; Type: INDEX; Schema:
public; Owner: -
--

CREATE INDEX index_topics_on_created_at_and_visible ON topics USING btree
(created_at, visible) WHERE ((deleted_at IS NULL) AND ((archetype)::text <>
'private_message'::text));


--
-- Name: index_topics_on_id_and_deleted_at; Type: INDEX; Schema: public;
Owner: -
--

CREATE INDEX index_topics_on_id_and_deleted_at ON topics USING btree (id,
deleted_at);


--
-- Name: index_topics_on_lower_title; Type: INDEX; Schema: public; Owner:
-
--

CREATE INDEX index_topics_on_lower_title ON topics USING btree
(lower((title)::text));


--
-- Name: index_topics_on_pinned_at; Type: INDEX; Schema: public; Owner: -
--

CREATE INDEX index_topics_on_pinned_at ON topics USING btree (pinned_at)
WHERE (pinned_at IS NOT NULL);


--
-- Name: index_topics_on_pinned_globally; Type: INDEX; Schema: public;
Owner: -
--

CREATE INDEX index_topics_on_pinned_globally ON topics USING btree
(pinned_globally) WHERE pinned_globally;


--
-- Name: post_timings_summary; Type: INDEX; Schema: public; Owner: -
--

CREATE INDEX post_timings_summary ON post_timings USING btree (topic_id,
post_number);


--
-- Name: post_timings_unique; Type: INDEX; Schema: public; Owner: -
--

CREATE UNIQUE INDEX post_timings_unique ON post_timings USING btree
(topic_id, post_number, user_id);


--
-- PostgreSQL database dump complete
--
```

## Stack trace after following
https://wiki.edubuntu.org/DebuggingProgramCrash

```
Program received signal SIGSEGV, Segmentation fault.
index_markpos (scan=0x0) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
373    /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:
No such file or directory.

(gdb) bt
#0  index_markpos (scan=0x0) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
#1  0x000055a812746c68 in ExecMergeJoin (pstate=0x55a8131bc778) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeMergejoin.c:1188
#2  0x000055a81272cf3f in ExecProcNode (node=0x55a8131bc778) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
#3  EvalPlanQualNext (epqstate=epqstate@entry=0x55a81318c518) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:3005
#4  0x000055a81272d342 in EvalPlanQual (estate=estate@entry=0x55a81318c018,
epqstate=epqstate@entry=0x55a81318c518,
relation=relation@entry=0x7f4e4e25ab68, rti=1, lockmode=<optimized out>,
tid=tid@entry=0x7ffd54492330, priorXmax=8959603) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:2521
#5  0x000055a812747af7 in ExecUpdate (mtstate=mtstate@entry=0x55a81318c468,
tupleid=tupleid@entry=0x7ffd54492450, oldtuple=oldtuple@entry=0x0,
slot=<optimized out>, slot@entry=0x55a8131a2f08,
planSlot=planSlot@entry=0x55a81319db60,
epqstate=epqstate@entry=0x55a81318c518, estate=0x55a81318c018, canSetTag=1
'\001') at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1113
#6  0x000055a8127480c5 in ExecModifyTable (pstate=0x55a81318c468) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1770
#7  0x000055a81272a28b in ExecProcNode (node=0x55a81318c468) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
#8  ExecutePlan (execute_once=<optimized out>, dest=0x7f4e4e18d7a8,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_UPDATE, use_parallel_mode=<optimized out>,
planstate=0x55a81318c468, estate=0x55a81318c018) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:1722
#9  standard_ExecutorRun (queryDesc=0x55a813156838, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:363
#10 0x000055a812864f05 in ProcessQuery (plan=<optimized out>,
sourceText=0x55a8130f3b08 "UPDATE posts\n", ' ' <repeats 24 times>, "SET
avg_time = (x.gmean / 1000)\n", ' ' <repeats 24 times>, "FROM (SELECT
post_timings.topic_id,\n", ' ' <repeats 37 times>,
"post_timings.post_number,\n        "..., params=0x0, queryEnv=0x0,
dest=0x7f4e4e18d7a8, completionTag=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:161
#11 0x000055a81286515b in PortalRunMulti
(portal=portal@entry=0x55a81318a008, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x7f4e4e18d7a8, altdest=altdest@entry=0x7f4e4e18d7a8,
completionTag=completionTag@entry=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:1286
#12 0x000055a812865e04 in PortalRun (portal=portal@entry=0x55a81318a008,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
run_once=run_once@entry=1 '\001', dest=dest@entry=0x7f4e4e18d7a8,
altdest=altdest@entry=0x7f4e4e18d7a8, completionTag=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:799
#13 0x000055a8128618dc in exec_simple_query (query_string=0x55a8130f3b08
"UPDATE posts\n", ' ' <repeats 24 times>, "SET avg_time = (x.gmean /
1000)\n", ' ' <repeats 24 times>, "FROM (SELECT post_timings.topic_id,\n", '
' <repeats 37 times>, "post_timings.post_number,\n        "...) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/postgres.c:1099
#14 0x000055a812862df6 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55a81309b738, dbname=0x55a813053b78
"boingboing_discourse", username=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/postgres.c:4088
#15 0x000055a81259cfe9 in BackendRun (port=0x55a813092c70) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:4357
#16 BackendStartup (port=0x55a813092c70) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:4029
#17 ServerLoop () at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:1753
#18 0x000055a8127f236b in PostmasterMain (argc=3, argv=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:1361
#19 0x000055a81259e3e5 in main (argc=3, argv=0x55a813051a90) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/main/main.c:228

(gdb) bt full
#0  index_markpos (scan=0x0) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
        __func__ = "index_markpos"
#1  0x000055a812746c68 in ExecMergeJoin (pstate=0x55a8131bc778) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeMergejoin.c:1188
        node = 0x55a8131bc778
        joinqual = 0x55a8131bcac8
        otherqual = 0x0
        qualResult = <optimized out>
        compareResult = <optimized out>
        innerPlan = 0x55a8131c8958
        innerTupleSlot = <optimized out>
        outerPlan = 0x55a8131bdd60
        outerTupleSlot = <optimized out>
        econtext = 0x55a8131bc888
        doFillOuter = 0 '\000'
        doFillInner = 0 '\000'
        __func__ = "ExecMergeJoin"
#2  0x000055a81272cf3f in ExecProcNode (node=0x55a8131bc778) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
No locals.
#3  EvalPlanQualNext (epqstate=epqstate@entry=0x55a81318c518) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:3005
        oldcontext = 0x55a813146198
#4  0x000055a81272d342 in EvalPlanQual (estate=estate@entry=0x55a81318c018,
epqstate=epqstate@entry=0x55a81318c518,
relation=relation@entry=0x7f4e4e25ab68, rti=1, lockmode=<optimized out>,
tid=tid@entry=0x7ffd54492330, priorXmax=8959603) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:2521
        slot = <optimized out>
        copyTuple = 0x7f4e4e167d68
#5  0x000055a812747af7 in ExecUpdate (mtstate=mtstate@entry=0x55a81318c468,
tupleid=tupleid@entry=0x7ffd54492450, oldtuple=oldtuple@entry=0x0,
slot=<optimized out>, slot@entry=0x55a8131a2f08,
planSlot=planSlot@entry=0x55a81319db60,
epqstate=epqstate@entry=0x55a81318c518, estate=0x55a81318c018, canSetTag=1
'\001') at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1113
        epqslot = <optimized out>
        lockmode = LockTupleNoKeyExclusive
        tuple = <optimized out>
        resultRelInfo = 0x55a81318c2b8
        resultRelationDesc = 0x7f4e4e25ab68
        result = <optimized out>
        hufd = {ctid = {ip_blkid = {bi_hi = 2, bi_lo = 29090}, ip_posid =
7}, xmax = 8959603, cmax = 4294967295}
        recheckIndexes = 0x0
        __func__ = "ExecUpdate"
#6  0x000055a8127480c5 in ExecModifyTable (pstate=0x55a81318c468) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1770
        node = 0x55a81318c468
        estate = 0x55a81318c018
        operation = CMD_UPDATE
        saved_resultRelInfo = 0x0
        resultRelInfo = 0x55a81318c2b8
        subplanstate = 0x55a81318d238
        junkfilter = 0x55a8131a50a8
        slot = 0x55a8131a2f08
        tupleid = 0x7ffd54492450
        tuple_ctid = {ip_blkid = {bi_hi = 2, bi_lo = 29090}, ip_posid =
11}
        oldtupdata = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0, bi_lo =
0}, ip_posid = 26680}, t_tableOid = 21928, t_data = 0x55a81318c018}
        oldtuple = 0x0
        __func__ = "ExecModifyTable"
#7  0x000055a81272a28b in ExecProcNode (node=0x55a81318c468) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
No locals.
#8  ExecutePlan (execute_once=<optimized out>, dest=0x7f4e4e18d7a8,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_UPDATE, use_parallel_mode=<optimized out>,
planstate=0x55a81318c468, estate=0x55a81318c018) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:1722
        slot = <optimized out>
        current_tuple_count = 0
#9  standard_ExecutorRun (queryDesc=0x55a813156838, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:363
        estate = 0x55a81318c018
        operation = CMD_UPDATE
        dest = 0x7f4e4e18d7a8
        sendTuples = <optimized out>
        __func__ = "standard_ExecutorRun"
#10 0x000055a812864f05 in ProcessQuery (plan=<optimized out>,
sourceText=0x55a8130f3b08 "UPDATE posts\n", ' ' <repeats 24 times>, "SET
avg_time = (x.gmean / 1000)\n", ' ' <repeats 24 times>, "FROM (SELECT
post_timings.topic_id,\n", ' ' <repeats 37 times>,
"post_timings.post_number,\n        "..., params=0x0, queryEnv=0x0,
dest=0x7f4e4e18d7a8, completionTag=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:161
        queryDesc = 0x55a813156838
#11 0x000055a81286515b in PortalRunMulti
(portal=portal@entry=0x55a81318a008, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x7f4e4e18d7a8, altdest=altdest@entry=0x7f4e4e18d7a8,
completionTag=completionTag@entry=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:1286
        pstmt = 0x7f4e4e18d6c8
        active_snapshot_set = 1 '\001'
        stmtlist_item = 0x7f4e4e18d758
#12 0x000055a812865e04 in PortalRun (portal=portal@entry=0x55a81318a008,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
run_once=run_once@entry=1 '\001', dest=dest@entry=0x7f4e4e18d7a8,
altdest=altdest@entry=0x7f4e4e18d7a8, completionTag=0x7ffd544927f0 "") at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/pquery.c:799
        save_exception_stack = 0x7ffd54492a20
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {1, -2410647209184299319,
94180362802640, 140726017533936, 94180363247624, 94180362802688,
-2410647209169619255, -8493467030827796791}, __mask_was_saved = 0,
__saved_mask = {__val = {8192, 0, 0, 12099560782854432544, 0,
94180356443286, 64, 140726017533776, 94180363247624, 88, 94180362802640, 2,
139974294427512, 140726017533808, 94180354932849, 2}}}}
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0x55a8130a1938
        saveTopTransactionContext = 0x55a8130a1bd8
        saveActivePortal = 0x0
        saveResourceOwner = 0x55a8130a1938
        savePortalContext = 0x0
        saveMemoryContext = 0x55a8130a1bd8
        __func__ = "PortalRun"
#13 0x000055a8128618dc in exec_simple_query (query_string=0x55a8130f3b08
"UPDATE posts\n", ' ' <repeats 24 times>, "SET avg_time = (x.gmean /
1000)\n", ' ' <repeats 24 times>, "FROM (SELECT post_timings.topic_id,\n", '
' <repeats 37 times>, "post_timings.post_number,\n        "...) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/postgres.c:1099
        parsetree = 0x55a81311d5d0
        portal = 0x55a81318a008
        snapshot_set = <optimized out>
        commandTag = <optimized out>
        completionTag =

"\000(IT\375\177\000\000\001u\230\022\250U\000\000\060(IT\375\177\000\000(\022\346\022\001\000\000\000\234$\336\022\250U\000\000\234$\336\022\250U\000\000\b;\017\023\250U\000\000\001\000\000\000\000\000\000"
        querytree_list = <optimized out>
        plantree_list = <optimized out>
        receiver = 0x7f4e4e18d7a8
        format = 0
        dest = DestRemote
        parsetree_list = 0x55a81311d620
        parsetree_item = 0x55a81311d600
        save_log_statement_stats = 0 '\000'
        was_logged = 0 '\000'
        isTopLevel = 1 '\001'
        msec_str =
"\000(IT\375\177\000\000\001u\230\022\250U\000\000\060(IT\375\177\000\000(\022\346\022\001\000\000"
        __func__ = "exec_simple_query"
#14 0x000055a812862df6 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55a81309b738, dbname=0x55a813053b78
"boingboing_discourse", username=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/tcop/postgres.c:4088
        query_string = 0x55a8130f3b08 "UPDATE posts\n", ' ' <repeats 24
times>, "SET avg_time = (x.gmean / 1000)\n", ' ' <repeats 24 times>, "FROM
(SELECT post_timings.topic_id,\n", ' ' <repeats 37 times>,
"post_timings.post_number,\n        "...
        firstchar = 81
        input_message = {data = 0x55a8130f3b08 "UPDATE posts\n", ' '
<repeats 24 times>, "SET avg_time = (x.gmean / 1000)\n", ' ' <repeats 24
times>, "FROM (SELECT post_timings.topic_id,\n", ' ' <repeats 37 times>,
"post_timings.post_number,\n        "..., len = 942, maxlen = 1024, cursor =
942}
        local_sigjmp_buf = {{__jmpbuf = {140726017534400,
-2410647209056373047, 94180362270520, 1, 94180362270200, 94180362234992,
-2410647208949418295, -8493467028741261623}, __mask_was_saved = 1,
__saved_mask = {__val = {0, 140726017534704, 94180352583639, 94180359415580,
140726017534620, 94180359731456, 94180356159802, 140726017534928, 686, 0,
94180359927584, 94180362234992, 139974255290296, 94180359464352,
206158430256, 140726017534912}}}}
        send_ready_for_query = 0 '\000'
        disable_idle_in_transaction_timeout = <optimized out>
        __func__ = "PostgresMain"
#15 0x000055a81259cfe9 in BackendRun (port=0x55a813092c70) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:4357
        ac = 1
        secs = 570264982
        usecs = 174473
        i = 1
        av = 0x55a81309b738
        maxac = <optimized out>
#16 BackendStartup (port=0x55a813092c70) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:4029
        bn = <optimized out>
        pid = <optimized out>
#17 ServerLoop () at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:1753
        rmask = {fds_bits = {64, 0 <repeats 15 times>}}
        selres = <optimized out>
        now = <optimized out>
        readmask = {fds_bits = {112, 0 <repeats 15 times>}}
        last_lockfile_recheck_time = 1516949750
        last_touch_time = 1516949630
        __func__ = "ServerLoop"
#18 0x000055a8127f236b in PostmasterMain (argc=3, argv=<optimized out>) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/postmaster/postmaster.c:1361
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = 1 '\001'
        i = <optimized out>
        output_config_variable = <optimized out>
        __func__ = "PostmasterMain"
#19 0x000055a81259e3e5 in main (argc=3, argv=0x55a813051a90) at
/build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/main/main.c:228
No locals.
```

## Configuration changes

```
shared_buffers = "1GB"
work_mem = "16MB"
maintenance_work_mem = "16MB"
``` 


Do feel free to let me know if I'm required to provide more information and
I'll be happy to do so.



Re: BUG #15032: Segmentation fault when running a particular query

От
Tom Lane
Дата:
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> ## Query that results in segmentation fault

Unsurprisingly, the given info is not enough to reproduce the crash.
However, looking at the stack trace:

> (gdb) bt
> #0  index_markpos (scan=0x0) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
> #1  0x000055a812746c68 in ExecMergeJoin (pstate=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeMergejoin.c:1188
> #2  0x000055a81272cf3f in ExecProcNode (node=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
> #3  EvalPlanQualNext (epqstate=epqstate@entry=0x55a81318c518) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:3005
> #4  0x000055a81272d342 in EvalPlanQual (estate=estate@entry=0x55a81318c018,
> epqstate=epqstate@entry=0x55a81318c518,
> relation=relation@entry=0x7f4e4e25ab68, rti=1, lockmode=<optimized out>,
> tid=tid@entry=0x7ffd54492330, priorXmax=8959603) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:2521
> #5  0x000055a812747af7 in ExecUpdate (mtstate=mtstate@entry=0x55a81318c468,
> tupleid=tupleid@entry=0x7ffd54492450, oldtuple=oldtuple@entry=0x0,
> slot=<optimized out>, slot@entry=0x55a8131a2f08,
> planSlot=planSlot@entry=0x55a81319db60,
> epqstate=epqstate@entry=0x55a81318c518, estate=0x55a81318c018, canSetTag=1
> '\001') at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1113

it seems fairly clear that somebody passed a NULL scandesc pointer
to index_markpos.  Looking at the only two callers of that function,
this must mean that either an IndexScan's node->iss_ScanDesc or an
IndexOnlyScan's node->ioss_ScanDesc was null.  (We don't see
ExecIndexMarkPos in the trace because the compiler optimized the tail
call.)  And that leads me to commit 09529a70b, which changed the logic
in those node types to allow initialization of the index scandesc to be
delayed to the first tuple fetch, rather than necessarily performed during
ExecInitNode.

Because this is happening inside an EvalPlanQual, it's unsurprising
that we'd be taking an unusual code path.  I believe what happened
was that the IndexScan node returned a jammed-in EPQ tuple on its
first call, and so hadn't opened the scandesc at all, while
ExecMergeJoin would do an ExecMarkPos if the tuple matched (which
it typically would if we'd gotten to EPQ), whereupon kaboom.

It's tempting to think that this is an oversight in commit 09529a70b
and we need to rectify it by something along the lines of teaching
ExecIndexMarkPos and ExecIndexOnlyMarkPos to initialize the scandesc
if needed before calling index_markpos.

However, on further reflection, it seems like this is a bug of far
older standing, to wit that ExecIndexMarkPos/ExecIndexRestrPos
are doing entirely the wrong thing when EPQ is active.  It's not
meaningful, or at least not correct, to be messing with the index
scan state at all in that case.  Rather, what the scan is supposed
to do is return the single jammed-in EPQ tuple, and what "restore"
ought to mean is "clear my es_epqScanDone flag so that that tuple
can be returned again".

It's not clear to me whether the failure to do that has any real
consequences though.  It would only matter if there's more than
one tuple available on the outer side of the mergejoin, which
I think there never would be in an EPQ situation.  Still, if there
ever were more outer tuples, the mergejoin would misbehave and maybe
even crash itself (because it probably assumes that restoring to
a point where there had been a tuple would allow it to re-fetch
that tuple successfully).

So what I'm inclined to do is teach the mark/restore infrastructure
to do the right thing with EPQ state when EPQ is active.  But I'm not
clear on whether that needs to be back-patched earlier than v10.

            regards, tom lane


Re: BUG #15032: Segmentation fault when running a particular query

От
Guo Xiang Tan
Дата:
Unsurprisingly, the given info is not enough to reproduce the crash.

We could only reproduce this on our production PostgreSQL cluster. I took a full pg_dump and restored into a PostgreSQL cluster (10.1) locally but could not reproduce the seg fault. If it helps, the data in the cluster was restored via pg_dump (10.1) from a PostgreSQL 9.5 cluster.

On Sat, Jan 27, 2018 at 2:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> ## Query that results in segmentation fault

Unsurprisingly, the given info is not enough to reproduce the crash.
However, looking at the stack trace:

> (gdb) bt
> #0  index_markpos (scan=0x0) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/access/index/indexam.c:373
> #1  0x000055a812746c68 in ExecMergeJoin (pstate=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeMergejoin.c:1188
> #2  0x000055a81272cf3f in ExecProcNode (node=0x55a8131bc778) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/include/executor/executor.h:250
> #3  EvalPlanQualNext (epqstate=epqstate@entry=0x55a81318c518) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:3005
> #4  0x000055a81272d342 in EvalPlanQual (estate=estate@entry=0x55a81318c018,
> epqstate=epqstate@entry=0x55a81318c518,
> relation=relation@entry=0x7f4e4e25ab68, rti=1, lockmode=<optimized out>,
> tid=tid@entry=0x7ffd54492330, priorXmax=8959603) at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/execMain.c:2521
> #5  0x000055a812747af7 in ExecUpdate (mtstate=mtstate@entry=0x55a81318c468,
> tupleid=tupleid@entry=0x7ffd54492450, oldtuple=oldtuple@entry=0x0,
> slot=<optimized out>, slot@entry=0x55a8131a2f08,
> planSlot=planSlot@entry=0x55a81319db60,
> epqstate=epqstate@entry=0x55a81318c518, estate=0x55a81318c018, canSetTag=1
> '\001') at
> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeModifyTable.c:1113

it seems fairly clear that somebody passed a NULL scandesc pointer
to index_markpos.  Looking at the only two callers of that function,
this must mean that either an IndexScan's node->iss_ScanDesc or an
IndexOnlyScan's node->ioss_ScanDesc was null.  (We don't see
ExecIndexMarkPos in the trace because the compiler optimized the tail
call.)  And that leads me to commit 09529a70b, which changed the logic
in those node types to allow initialization of the index scandesc to be
delayed to the first tuple fetch, rather than necessarily performed during
ExecInitNode.

Because this is happening inside an EvalPlanQual, it's unsurprising
that we'd be taking an unusual code path.  I believe what happened
was that the IndexScan node returned a jammed-in EPQ tuple on its
first call, and so hadn't opened the scandesc at all, while
ExecMergeJoin would do an ExecMarkPos if the tuple matched (which
it typically would if we'd gotten to EPQ), whereupon kaboom.

It's tempting to think that this is an oversight in commit 09529a70b
and we need to rectify it by something along the lines of teaching
ExecIndexMarkPos and ExecIndexOnlyMarkPos to initialize the scandesc
if needed before calling index_markpos.

However, on further reflection, it seems like this is a bug of far
older standing, to wit that ExecIndexMarkPos/ExecIndexRestrPos
are doing entirely the wrong thing when EPQ is active.  It's not
meaningful, or at least not correct, to be messing with the index
scan state at all in that case.  Rather, what the scan is supposed
to do is return the single jammed-in EPQ tuple, and what "restore"
ought to mean is "clear my es_epqScanDone flag so that that tuple
can be returned again".

It's not clear to me whether the failure to do that has any real
consequences though.  It would only matter if there's more than
one tuple available on the outer side of the mergejoin, which
I think there never would be in an EPQ situation.  Still, if there
ever were more outer tuples, the mergejoin would misbehave and maybe
even crash itself (because it probably assumes that restoring to
a point where there had been a tuple would allow it to re-fetch
that tuple successfully).

So what I'm inclined to do is teach the mark/restore infrastructure
to do the right thing with EPQ state when EPQ is active.  But I'm not
clear on whether that needs to be back-patched earlier than v10.

                        regards, tom lane

Re: BUG #15032: Segmentation fault when running a particular query

От
Tom Lane
Дата:
Guo Xiang Tan <gxtan1990@gmail.com> writes:
>> Unsurprisingly, the given info is not enough to reproduce the crash.

> We could only reproduce this on our production PostgreSQL cluster.

Since the problem occurs during EvalPlanQual, it'd only manifest when
the update touched a concurrently-updated row, which is why it's hard
to reproduce if you aren't deliberately trying to trigger it.  But
I've been able to reproduce a crash at the same spot, and have made
a test case using the isolationtester infrastructure (attached).

I'll look into an actual fix tomorrow.

            regards, tom lane

diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 7ff6f6b..53a32e6 100644
*** a/src/test/isolation/specs/eval-plan-qual.spec
--- b/src/test/isolation/specs/eval-plan-qual.spec
*************** setup
*** 21,33 ****
   CREATE TABLE table_b (id integer, value text);
   INSERT INTO table_a VALUES (1, 'tableAValue');
   INSERT INTO table_b VALUES (1, 'tableBValue');
  }

  teardown
  {
   DROP TABLE accounts;
   DROP TABLE p CASCADE;
!  DROP TABLE table_a, table_b;
  }

  session "s1"
--- 21,36 ----
   CREATE TABLE table_b (id integer, value text);
   INSERT INTO table_a VALUES (1, 'tableAValue');
   INSERT INTO table_b VALUES (1, 'tableBValue');
+
+  CREATE TABLE jointest AS SELECT generate_series(1,10) AS id, 0 AS data;
+  CREATE INDEX ON jointest(id);
  }

  teardown
  {
   DROP TABLE accounts;
   DROP TABLE p CASCADE;
!  DROP TABLE table_a, table_b, jointest;
  }

  session "s1"
*************** step "updateforss"    {
*** 78,83 ****
--- 81,96 ----
      UPDATE table_b SET value = 'newTableBValue' WHERE id = 1;
  }

+ # these tests exercise mark/restore during EPQ recheck, cf bug #15032
+
+ step "selectjoinforupdate"    {
+     set enable_hashjoin to 0;
+     set enable_seqscan to 0;
+     explain (costs off)
+     select * from jointest a join jointest b on a.id=b.id for update;
+     select * from jointest a join jointest b on a.id=b.id for update;
+ }
+

  session "s2"
  setup        { BEGIN ISOLATION LEVEL READ COMMITTED; }
*************** step "readforss"    {
*** 104,109 ****
--- 117,123 ----
      WHERE ta.id = 1 FOR UPDATE OF ta;
  }
  step "wrtwcte"    { UPDATE table_a SET value = 'tableAValue2' WHERE id = 1; }
+ step "wrjt"    { UPDATE jointest SET data = 42 WHERE id = 7; }
  step "c2"    { COMMIT; }

  session "s3"
*************** permutation "wx2" "partiallock" "c2" "c1
*** 135,137 ****
--- 149,152 ----
  permutation "wx2" "lockwithvalues" "c2" "c1" "read"
  permutation "updateforss" "readforss" "c1" "c2"
  permutation "wrtwcte" "readwcte" "c1" "c2"
+ permutation "wrjt" "selectjoinforupdate" "c2" "c1"

Re: BUG #15032: Segmentation fault when running a particular query

От
Tom Lane
Дата:
I wrote:
> I'll look into an actual fix tomorrow.

I've pushed a fix for this.  I concluded that there's no real reason
to touch branches before 10: the index_markpos call is useless in the
case in question, but it doesn't do any more than waste some cycles.

            regards, tom lane


Re: BUG #15032: Segmentation fault when running a particular query

От
Guo Xiang Tan
Дата:
Hi Tom, 

We've upgraded to 10.2 and are not getting segmentation faults while running the query anymore. Thank you for the fix.

On Sun, Jan 28, 2018 at 2:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> I'll look into an actual fix tomorrow.

I've pushed a fix for this.  I concluded that there's no real reason
to touch branches before 10: the index_markpos call is useless in the
case in question, but it doesn't do any more than waste some cycles.

                        regards, tom lane