Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Дата
Msg-id 575aa0fe-311d-d3d0-1f1a-abb1b9665a63@enterprisedb.com
обсуждение исходный текст
Ответ на Re: BRIN minmax multi - incorrect distance for infinite timestamp/date  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: BRIN minmax multi - incorrect distance for infinite timestamp/date  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
On 10/18/23 12:47, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> Here's a couple cleaned-up patches fixing the various discussed here.
>> I've tried to always add a regression test demonstrating the issue
>> first, and then fix it in the next patch.
> 
> It will be good to commit the test changes as well.
> 

I do plan to commit them, ofc. I was just explaining why I'm adding the
tests first, and then fixing the issue in a separate commit.

>>
>> In particular, this deals with these issues:
>>
>> 1) overflows in distance calculation for large timestamp values (0002)
> 
> I could reduce the SQL for timestamp overflow test to just
> -- test overflows during CREATE INDEX with extreme timestamp values
> CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);
> 
> SET datestyle TO iso;
> 
> INSERT INTO brin_timestamp_test VALUES
> ('4713-01-01 00:00:30 BC'),
> ('294276-12-01 00:00:01');
> 
> CREATE INDEX ON brin_timestamp_test USING brin (a
> timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
> 
> I didn't understand the purpose of adding 60 odd values to the table.
> It didn't tell which of those values triggers the overflow. Minimal
> set above is much easier to understand IMO. Using a temporary table
> just avoids DROP TABLE statement. But I am ok if you want to use
> non-temporary table with DROP.
> 
> Code changes in 0002 look fine. Do we want to add a comment "cast to a
> wider datatype to avoid overflow"? Or is that too explicit?
> 
> The code changes fix the timestamp issue but there's a diff in case of
> 

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

>>
>> 2) incorrect subtraction in distance for date values (0003)
> 
> The test case for date brin index didn't crash though. Even after
> applying 0003 patch. The reason why date subtraction can't overflow is
> a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
> because of the code below
> #define IS_VALID_DATE(d) \
> ((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
> (d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
> This prevents the lower side to be well within the negative int32
> overflow threshold and we always subtract higher value from the lower
> one. May be good to elaborate this? A later patch does use float 8
> casting eliminating "any" possibility of overflow. So the comment may
> not be necessary after squashing all the changes.
> 

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

>>
>> 3) incorrect distance for infinite date/timestamp values (0005)
> 
> The tests could use a minimal set of rows here too.
> 
> The code changes look fine and fix the problem seen with the tests alone.
> 

OK

>>
>> 4) failing distance for extreme interval values (0007)
> 
> I could reproduce the issue with a minimal set of values
> -- test handling of overflow for interval values
> CREATE TABLE brin_interval_test(a INTERVAL);
> 
> INSERT INTO brin_interval_test VALUES
> ('177999985 years'),
> ('-178000000 years');
> 
> CREATE INDEX ON brin_interval_test USING brin (a
> interval_minmax_multi_ops) WITH (pages_per_range=1);
> DROP TABLE brin_interval_test;
> 
> The code looks fine and fixed the issue seen with the test.
> 
> We may want to combine various test cases though. Like the test adding
> infinity and extreme values could be combined. Also the number of
> values it inserts in the table for the reasons stated above.
> 

I prefer not to do that. I find it more comprehensible to keep the tests
separate / testing different things. If the tests were expensive to
setup or something like that, that'd be a different situation.

>>
>> All the problems except "2" have been discussed earlier, but this seems
>> a bit more serious than the other issues, as it's easier to hit. It
>> subtracts the values in the opposite order (smaller - larger), so the
>> distances are negated. Which means we actually merge the values from the
>> most distant ones, and thus are "guaranteed" to build very a very
>> inefficient summary. People with multi-minmax indexes on "date" columns
>> probably will need to reindex.
>>
> 
> Right. Do we highlight that in the commit message so that the person
> writing release notes picks it up from there?
> 

Yes, I think I'll mention what impact each issue can have on indexes.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Steele
Дата:
Сообщение: Re: The danger of deleting backup_label
Следующее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: Query execution in Perl TAP tests needs work