Re: interval_ops shall stop using btequalimage (deduplication)

Поиск
Список
Период
Сортировка
От Donghang Lin
Тема Re: interval_ops shall stop using btequalimage (deduplication)
Дата
Msg-id CAA=D8a0J2N-waEPmbqT7xkySNZUbFnX5O80mnLxTxALBY2VcfQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: interval_ops shall stop using btequalimage (deduplication)  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: interval_ops shall stop using btequalimage (deduplication)  (Noah Misch <noah@leadboat.com>)
Re: interval_ops shall stop using btequalimage (deduplication)  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
> I've also caught btree posting lists where one TID refers to a '1d' heap
> tuple, while another TID refers to a '24h' heap tuple.  amcheck complains.
Index-only scans can return the '1d' bits where the actual tuple had the '24h'
bits.

Have a build without the patch. I can't reproduce amcheck complaints in release mode
where all these statements succeed. 
  create table t (c interval);
  insert into t select x from generate_series(1,500), (values ('1 year 1 month'::interval), ('1 year 30 days')) t(x);
  select distinct c from t;
  create index ti on t (c);
  select bt_index_check('ti'::regclass);
On following query, the query seems to return the right result sets, 
index-only scan doesn't seem to be mislead by the misuse of btequalimage 

  postgres=# vacuum (INDEX_CLEANUP on) t;
VACUUM
postgres=# explain (analyze, buffers) select c::text, count(c) from t group by c::text;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=49.27..49.29 rows=1 width=40) (actual time=3.329..3.333 rows=2 loops=1)
   Group Key: (c)::text
   Batches: 1  Memory Usage: 24kB
   Buffers: shared hit=6
   ->  Index Only Scan using ti on t  (cost=0.28..44.27 rows=1000 width=48) (actual time=0.107..2.269 rows=1000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=6
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.319 ms
 Execution Time: 3.432 ms
(11 rows)

postgres=# select c::text, count(c) from t group by c::text;
       c        | count
----------------+-------
 1 year 1 mon   |   500
 1 year 30 days |   500
(2 rows)

>  * Generic "equalimage" support function.
> *
> * B-Tree operator classes whose equality function could safely be replaced by
> * datum_image_eq() in all cases can use this as their "equalimage" support
> * function.
It seems to me that as long as a data type has a deterministic sort but not necessarily be equalimage,
it should be able to support deduplication.  e.g for interval type, we add a byte wise tie breaker 
after '24h' and '1day' are compared equally. In the btree, '24h' and '1day' are still adjacent, 
'1day' is always sorted before '24h' in a btree page, can we do dedup for each value 
without problem? 
The assertion will still be triggered as it's testing btequalimage, but I'll defer it for now. 
Wanted to know if the above idea sounds sane first...


Donghang Lin
(ServiceNow)


On Thu, Oct 12, 2023 at 4:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Oct 12, 2023 at 4:10 PM Noah Misch <noah@leadboat.com> wrote:
> > exactly one case like that post-fix (interval_ops is at least the only
> > affected core code opfamily), so why not point that out directly with
> > a HINT? A HINT could go a long way towards putting the problem in
> > context, without really adding a special case, and without any real
> > question of users being misled.
>
> Works for me.  Added.

Looks good. Thanks!

--
Peter Geoghegan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: False "pg_serial": apparent wraparound” in logs
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Add support for AT LOCAL