Re: Reduce planning time for large NOT IN lists containing NULL
| От | Ilia Evdokimov |
|---|---|
| Тема | Re: Reduce planning time for large NOT IN lists containing NULL |
| Дата | |
| Msg-id | 4c761b02-5a60-4076-aa0c-9c6fef06e2c1@tantorlabs.com обсуждение |
| Ответ на | Re: Reduce planning time for large NOT IN lists containing NULL (David Geier <geidav.pg@gmail.com>) |
| Ответы |
Re: Reduce planning time for large NOT IN lists containing NULL
|
| Список | pgsql-hackers |
On 2/24/26 11:29, David Geier wrote:
Using array_contains_nulls() seems fine. In case the IN list doesn't contain NULL, the function can immediately bail thanks to the !ARR_HASNULL() check in the beginning. It only needs to iterate over the NULL-bitmap, if it exists. This is the case if there's actually a NULL element in the array, or if the array initially contained NULL and all NULLs got removed subsequently. If we ever find the latter case to matter we could remove the NULL-bitmap in array_set_element() / array_set_element_expanded(), when the last NULL element got removed.Could you clarify what exactly this additional test meant to verify?Zsolt's test case creates an array that initially contains NULL. The NULL element is subsequently replaced by a non-NULL value but array_set_element_expanded() keeps the NULL-bitmap around. With that, your ARR_ISNULL() check bails and causes the selectivity estimation to incorrectly return 0.
Ah, right - thanks for the clarification. I agree.
Regarding the regression test: the suggestion test case is good, but there is not a straightforward way to expose the estimated row count without also showing the costs, and costs are unstable. To avoid that, I reused the parsing approach already present in stats_ext.sql to extract only the estimated row count from EXPLAIN.
Since the test table contains exactly 1000 rows and we run ANALYZE, all rows are included in the statistics sample. Therefore the estimate for x <> ALL(array[1, 99, 2]) is deterministically 997 rows, and the test stable and ensures we detect the incorrect early-zero estimate.
Let me know if you'd prefer a different approach. I've attached v4 patch.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
Вложения
В списке pgsql-hackers по дате отправления: