Re: pgsql: Extended statistics on expressions

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: pgsql: Extended statistics on expressions
Дата
Msg-id d20c9df2-9d2b-780c-3e04-b90dc7b9a5b8@enterprisedb.com
обсуждение исходный текст
Ответ на Re: pgsql: Extended statistics on expressions  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: pgsql: Extended statistics on expressions  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-committers

On 3/31/21 10:05 AM, David Rowley wrote:
> Hi Tomas,
> 
> I'm debugging a crash after running sqllancer on current master. The
> first bad commit seems to be this one.
> 
> The crash stack trace is:
> 
> Program received signal SIGSEGV, Segmentation fault.
> pg_detoast_datum_packed (datum=0x5) at fmgr.c:1759
> 1759 if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum))
> (gdb) bt
> #0  pg_detoast_datum_packed (datum=0x5) at fmgr.c:1759
> #1  0x000055e460c99da2 in textlike (fcinfo=0x7fff1bbc39e0) at like.c:287
> #2  0x000055e460d44dde in FunctionCall2Coll
> (flinfo=flinfo@entry=0x7fff1bbc3ae0, collation=<optimized out>,
> arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:1163
> #3  0x000055e460bd6dad in mcv_get_match_bitmap (clauses=<optimized
> out>, keys=0x55e4614627e8, exprs=0x0, mcvlist=<optimized out>,
> is_or=<optimized out>, root=<optimized out>) at mcv.c:1706
> #4  0x000055e460bd965d in mcv_clauselist_selectivity
> (root=root@entry=0x55e46145fca0, stat=stat@entry=0x55e461462790,
> clauses=clauses@entry=0x55e46144cf08, varRelid=varRelid@entry=0,
> jointype=jointype@entry=JOIN_INNER,
>     sjinfo=sjinfo@entry=0x0, rel=0x55e461461f18,
> basesel=0x7fff1bbc3c88, totalsel=0x7fff1bbc3c90) at mcv.c:2043
> #5  0x000055e460bd6364 in statext_mcv_clauselist_selectivity
> (is_or=<optimized out>, estimatedclauses=<optimized out>,
> rel=<optimized out>, sjinfo=<optimized out>, jointype=<optimized out>,
> varRelid=<optimized out>,
>     clauses=<optimized out>, root=<optimized out>) at extended_stats.c:1916
> #6  statext_clauselist_selectivity (root=root@entry=0x55e46145fca0,
> clauses=clauses@entry=0x55e461462ad0, varRelid=varRelid@entry=0,
> jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0,
> rel=0x55e461461f18,
>     estimatedclauses=0x7fff1bbc3d28, is_or=false) at extended_stats.c:1948
> #7  0x000055e460b0e973 in clauselist_selectivity_ext
> (root=root@entry=0x55e46145fca0, clauses=0x55e461462ad0,
> varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
> sjinfo=0x0, use_extended_stats=true) at clausesel.c:155
> #8  0x000055e460b0e346 in clause_selectivity_ext
> (root=root@entry=0x55e46145fca0, clause=0x55e46144c8f8,
> varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
> sjinfo=sjinfo@entry=0x0,
>     use_extended_stats=use_extended_stats@entry=true) at clausesel.c:838
> #9  0x000055e460b0e1a4 in clauselist_selectivity_or
> (use_extended_stats=<optimized out>, sjinfo=<optimized out>,
> jointype=JOIN_INNER, varRelid=0, clauses=0x55e461462cc0,
> root=0x55e46145fca0) at clausesel.c:414
> #10 clause_selectivity_ext (root=0x55e46145fca0, clause=<optimized
> out>, varRelid=0, jointype=JOIN_INNER, sjinfo=<optimized out>,
> use_extended_stats=use_extended_stats@entry=true) at clausesel.c:851
> #11 0x000055e460b0e863 in clauselist_selectivity_ext
> (root=root@entry=0x55e46145fca0, clauses=0x55e46144cd28,
> varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
> sjinfo=sjinfo@entry=0x0,
>     use_extended_stats=use_extended_stats@entry=true) at
> ../../../../src/include/nodes/pg_list.h:260
> #12 0x000055e460b0eacf in clauselist_selectivity
> (root=root@entry=0x55e46145fca0, clauses=<optimized out>,
> varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
> sjinfo=sjinfo@entry=0x0) at clausesel.c:108
> #13 0x000055e460b16992 in set_baserel_size_estimates
> (root=root@entry=0x55e46145fca0, rel=rel@entry=0x55e461461f18) at
> costsize.c:4753
> #14 0x000055e460b0b971 in set_plain_rel_size (rte=<optimized out>,
> rel=0x55e461461f18, root=0x55e46145fca0) at allpaths.c:583
> #15 set_rel_size (root=0x55e46145fca0, rel=0x55e461461f18, rti=1,
> rte=0x55e4614220a8) at allpaths.c:412
> #16 0x000055e460b0d8c0 in set_base_rel_sizes (root=<optimized out>) at
> allpaths.c:323
> 
> I'm struggling a bit to reproduce this consistently.
> 
> If you do:
> 
> psql -c "CREATE ROLE sqlancer;" postgres
> created test;
> psql -f database0.sql test
> 
> then run the following set of commands:
> 
> insert into t1 values(1234,false),(5678,true);
> analyze;
> delete from t1;
> SELECT t1.c0 FROM ONLY t1 WHERE
>
((((((((upper('%|1j]<#^j\???u,???D'))LIKE((((('h???x')||(178227136)))||((('(-2073106655,58629343]'::int4range)*('(-1903439243,-1774128827)'::int4range)))))))AND((((t1.c0)
> IN (-91026522, 0.6000845835151706))OR(t1.c1)))))OR(t1.c1)))OR(CAST(t1.c1
> AS BOOLEAN)));
> 
> It seems to be a corrupt Datum in the 2nd argument to textlike().
> 
> (gdb) frame 3
> #3  0x000055e460bd6dad in mcv_get_match_bitmap (clauses=<optimized
> out>, keys=0x55e461480148, exprs=0x0, mcvlist=<optimized out>,
> is_or=<optimized out>, root=<optimized out>) at mcv.c:1706
> 1706 match = DatumGetBool(FunctionCall2Coll(&opproc,
> (gdb) p i
> $7 = 0
> (gdb) p idx
> $8 = 2
> 
> I've not studied the code in great detail, but per above, idx == 2 and
> you're doing item->values[idx].  Isn't the index of 2 out of bounds of
> the most_common_vals column below?
> 
> test1=# select most_common_vals from pg_stats_ext;
>   most_common_vals
> ---------------------
>  {{1234,f},{5678,t}}
> (1 row)
> 
> The code that sets idx to 2 (mcv_match_expression()) looks a bit
> weird.  These stats don't have any exprs and the "expr" past to
> mcv_match_expression is an OpExpr.  The function just goes and sets
> idx = bms_num_members(keys); then does 0 loops due to the empty
> "exprs" and returns 2. The Assert() does not help catch not finding
> anything since it checks idx >= bms_num_members(keys), which is 2.
> 
> I'm not quite sure how this is all meant to work.  Are you able to look further?
> 

Thanks for the report, I'll take a look. You're right this seems like an
out-of-bounds access, but mcv_match_expression is only expected to be
run on expressions we know are in the statistics (because we pick the
statistics like that). Clearly, that does not happen here, not sure why.

It's quite weird that we end up running textlike(), when the statistics
is on (double precision, boolean) columns ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-committers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: pgsql: Rework planning and execution of UPDATE and DELETE.
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: pgsql: Extended statistics on expressions