Обсуждение: BUG #17926: Segfault in SELECT

Поиск
Список
Период
Сортировка

BUG #17926: Segfault in SELECT

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17926
Logged by:          Erik Oomen
Email address:      erik@nib4.nl
PostgreSQL version: 15.2
Operating system:   Debian11 bullseye arm64
Description:

We are facing an issue where a SELECT can cause a segfault, unfortunately
the backtrace is not useful:
Reading symbols from /lib/aarch64-linux-gnu/libffi.so.7...
Reading symbols from
/root/.cache/debuginfod_client/b07109d62d007af8c6bffe3d76561e03711a1a6c/debuginfo...
0x0000ffff88d1311c in __GI_epoll_pwait (epfd=4, events=0xaaab109dd330,
maxevents=1, timeout=timeout@entry=-1, set=set@entry=0x0) at
../sysdeps/unix/sysv/linux/epoll_pwait.c:42
Download failed: Invalid argument.  Continuing without source file
./misc/../sysdeps/unix/sysv/linux/epoll_pwait.c.
42    ../sysdeps/unix/sysv/linux/epoll_pwait.c: No such file or directory.
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00010000017fedec in ?? ()
(gdb) bt
#0  0x00010000017fedec in ?? ()
#1  0x0000ffff017febd0 in ?? ()
#2  0x0000aaab10af7fa8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The query uses a 3 partitions of a table where all fields have an brin
index.
This query will segfault:
SELECT count(*)
FROM dw
WHERE (dw.ts >= '2022-09-01' AND dw.ts <= '2022-10-30')
AND dw.source='type3' AND dw.customer='123.456';

The query should return 0 because the customer does not exist.
Removing the dw.customer or dw.source constraint will make the segfault not
occur.
Also using 'set enable_bitmapscan to off' will not trigger the segfault.

Details:
 PostgreSQL 15.2 (Debian 15.2-1.pgdg110+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit


Re: BUG #17926: Segfault in SELECT

От
Daniel Gustafsson
Дата:
> On 8 May 2023, at 22:12, PG Bug reporting form <noreply@postgresql.org> wrote:

> The query uses a 3 partitions of a table where all fields have an brin
> index.

One index per column or a multi-colum index?

> This query will segfault:
> SELECT count(*)
> FROM dw
> WHERE (dw.ts >= '2022-09-01' AND dw.ts <= '2022-10-30')
> AND dw.source='type3' AND dw.customer='123.456';
>
> The query should return 0 because the customer does not exist.
> Removing the dw.customer or dw.source constraint will make the segfault not
> occur.
> Also using 'set enable_bitmapscan to off' will not trigger the segfault.

I was unable to reproduce this, can you share the table schema with the index
creations (or a version of the schema which reproduce the problem).

Have you tried recreating the indexes, in case there is corruption, or tried it
on a table which has the same schema but no index?

--
Daniel Gustafsson




Re: BUG #17926: Segfault in SELECT

От
Erik Oomen
Дата:

> Op 9 mei 2023, om 10:38 heeft Daniel Gustafsson <daniel@yesql.se> het volgende geschreven:
>
>> On 8 May 2023, at 22:12, PG Bug reporting form <noreply@postgresql.org> wrote:
>
>> The query uses a 3 partitions of a table where all fields have an brin
>> index.
>
> One index per column or a multi-colum index?

One index per column.  However, if  I add a multicolumn index (on ts and source) it will not crash.

>
>> This query will segfault:
>> SELECT count(*)
>> FROM dw
>> WHERE (dw.ts >= '2022-09-01' AND dw.ts <= '2022-10-30')
>> AND dw.source='type3' AND dw.customer='123.456';
>>
>> The query should return 0 because the customer does not exist.
>> Removing the dw.customer or dw.source constraint will make the segfault not
>> occur.
>> Also using 'set enable_bitmapscan to off' will not trigger the segfault.
>
> I was unable to reproduce this, can you share the table schema with the index
> creations (or a version of the schema which reproduce the problem).
>

create table if not exists dw (
    activity_id bigserial,
    activity text not null,
    ts timestamptz not null,
    customer text,
    impact int default 0,
    feature_1 text,
    feature_2 text,
    feature_3 jsonb,
    link text,
    source text,
    source_id text,
    environment text
) partition by range(ts);

CREATE INDEX IF NOT EXISTS index_activity ON dw USING brin(activity);
CREATE INDEX IF NOT EXISTS index_customer ON dw USING brin(customer);
CREATE INDEX IF NOT EXISTS index_source ON dw USING brin(source);
CREATE INDEX IF NOT EXISTS index_source_id ON dw USING brin(source_id);
CREATE INDEX IF NOT EXISTS index_ts ON dw USING brin(ts);
CREATE INDEX IF NOT EXISTS index_environment ON dw USING brin(environment);

> Have you tried recreating the indexes, in case there is corruption, or tried it
> on a table which has the same schema but no index?

It still segfaults after recreating the indexes, Without index no segfault.

>
> --
> Daniel Gustafsson
>
>

Erik.




Re: BUG #17926: Segfault in SELECT

От
Alvaro Herrera
Дата:
On 2023-May-08, PG Bug reporting form wrote:

> Program received signal SIGSEGV, Segmentation fault.
> 0x00010000017fedec in ?? ()
> (gdb) bt
> #0  0x00010000017fedec in ?? ()
> #1  0x0000ffff017febd0 in ?? ()
> #2  0x0000aaab10af7fa8 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> The query uses a 3 partitions of a table where all fields have an brin
> index.
> This query will segfault:
> SELECT count(*)
> FROM dw
> WHERE (dw.ts >= '2022-09-01' AND dw.ts <= '2022-10-30')
> AND dw.source='type3' AND dw.customer='123.456';
> 
> The query should return 0 because the customer does not exist.
> Removing the dw.customer or dw.source constraint will make the segfault not
> occur.
> Also using 'set enable_bitmapscan to off' will not trigger the segfault.

I'm not sure how to go about debugging this problem -- without a stack
trace, we don't know where to look.  I tried to reproduce it without
success, but I didn't try to add any data.  Which partitions do you
have, and how much data?  Does EXPLAIN (without ANALYZE) work, and if so
what does it report?

Would it be possible for you to run the query under 'rr record'?
There are some instructions here:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework

One possibility is that the index structure is corrupted in some way.
pageinspect's functions be useful, but I suppose you'd have to scan the
whole index in order to find what went wrong.  We don't have any tooling
for that ... I guess amcheck support would be a nice addition.

I guess the other option is that things are failing at optimizer or
executor setup time.  Having one index per column is unusual.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes"   (Germán Poo)