Re: Zedstore - compressed in-core columnar storage

From: Ashwin Agrawal
Subject: Re: Zedstore - compressed in-core columnar storage
Date:
Msg-id: CALfoeitVUvPHJEjNacxSXmaH4OL+vmjmRGuToMSn=p_UGjHWaA@mail.gmail.com
In reply to: Re: Zedstore - compressed in-core columnar storage (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers

On Tue, Apr 16, 2019 at 9:15 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> I'm not sure it's that clear cut, actually. Sure, it's not the usual
> (block,item) pair so it's not possible to jump to the exact location, so
> it's not the raw physical identifier as regular TID. But the data are
> organized in a btree, with the TID as a key, so it does actually provide
> some information about the location.

From a representation perspective, it's a logical identifier. But yes,
since it is used as the key to lay out datums, there is a pretty good
correlation between TIDs and physical location. One can consider the
data as clustered by TID.

> I've asked about BRIN indexes elsewhere in this thread, which I think is
> related to this question, because that index type relies on TID providing
> sufficient information about location. And I think BRIN indexes are going
> to be rather important for colstores (and formats like ORC have something
> very similar built-in).
>
> But maybe all we'll have to do is define the ranges differently - instead
> of "number of pages" we may define them as "number of rows" and it might
> be working.

BRIN indexes work for zedstore right now. As pointed out above, a block
range simply maps to a range of TIDs in zedstore. When a zstid is
converted to an ItemPointer, the "block number" can be taken from the
ItemPointer just as from a normal heap TID. It doesn't correspond to the
direct physical location of the row in zedstore, but that's fine.
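
A minimal Python sketch of why a block range is "just a range of TIDs", assuming each logical block holds a fixed number of TIDs. TIDS_PER_BLOCK is a hypothetical constant chosen for illustration; zedstore's actual constant and encoding may differ. PAGES_PER_RANGE mirrors BRIN's pages_per_range storage parameter (default 128).

```python
from typing import NamedTuple, Tuple

# Hypothetical: number of TIDs packed into one logical "block".
# The real zedstore constant and its value may differ.
TIDS_PER_BLOCK = 128

# Mirrors BRIN's pages_per_range storage parameter (default 128).
PAGES_PER_RANGE = 128


class ItemPointer(NamedTuple):
    block: int   # logical "block number", not a physical zedstore page
    offset: int  # 1-based, like a heap OffsetNumber


def zstid_to_itemptr(zstid: int) -> ItemPointer:
    """Split a 1-based zstid into (block, offset) under the fixed-width assumption."""
    if zstid < 1:
        raise ValueError("zstid must be >= 1")
    block, rem = divmod(zstid - 1, TIDS_PER_BLOCK)
    return ItemPointer(block, rem + 1)


def itemptr_to_zstid(ip: ItemPointer) -> int:
    """Inverse of zstid_to_itemptr."""
    return ip.block * TIDS_PER_BLOCK + ip.offset


def brin_range_start(zstid: int) -> int:
    """First logical block of the BRIN range that summarizes this zstid."""
    block = zstid_to_itemptr(zstid).block
    return block - block % PAGES_PER_RANGE


def brin_range_tids(range_start: int) -> Tuple[int, int]:
    """Inclusive zstid interval covered by the BRIN range starting at range_start."""
    first = itemptr_to_zstid(ItemPointer(range_start, 1))
    last = itemptr_to_zstid(
        ItemPointer(range_start + PAGES_PER_RANGE - 1, TIDS_PER_BLOCK))
    return first, last
```

With these assumed constants, each BRIN range covers 128 × 128 = 16,384 consecutive zstids, so a range of "pages" is effectively a range defined as a number of rows.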
                                                                                                                                        
It might be suboptimal in some cases. For example, if one zedstore page
contains TIDs 1-1000 and another 1001-2000, and an entry in the BRIN
index covers TIDs 500-1500, both zedstore pages have to be accessed. It
would be better if the cutoff points in the BRIN index matched the
physical pages of the zedstore. But it still works, and is probably fine
in practice.
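
To make the misalignment concrete, here is a small sketch that models zedstore pages as sorted, non-overlapping inclusive TID intervals (a simplification assumed for illustration, relying on the btree being keyed on TID) and finds which pages a BRIN range's TID interval touches.

```python
from bisect import bisect_left
from typing import List, Tuple

Page = Tuple[int, int]  # (first_tid, last_tid), inclusive


def pages_for_tid_range(pages: List[Page], lo: int, hi: int) -> List[int]:
    """Indexes of pages whose TID interval overlaps [lo, hi].

    Assumes pages are sorted by TID and do not overlap.
    """
    last_tids = [last for _, last in pages]
    # First page whose last TID is >= lo, i.e. the first page that can hold lo.
    i = bisect_left(last_tids, lo)
    hits = []
    while i < len(pages) and pages[i][0] <= hi:
        hits.append(i)
        i += 1
    return hits
```

For the case described above, pages_for_tid_range([(1, 1000), (1001, 2000)], 500, 1500) returns [0, 1], so both pages are read, whereas a BRIN range aligned to 1001-2000 would touch only one page.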
                                                                                                                                        
The plan is to add an integrated BRIN index to zedstore, meaning min-max
values for appropriate columns are kept within each page. This will not
eliminate I/O the way an external BRIN index does, but it helps skip
decompression, visibility checks, etc. for blocks that don't match the
conditions.
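
A minimal sketch of the in-page min/max idea, assuming a hypothetical page layout in which each column page keeps its min and max uncompressed next to a compressed payload (zlib is used only for illustration). Pages whose [min, max] cannot satisfy a half-open predicate lo <= v < hi, like the log_timestamp condition in the plans below, are skipped without decompression. Visibility checks are omitted.

```python
import json
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnPage:
    min_val: int   # hypothetical per-page summary, stored uncompressed
    max_val: int
    payload: bytes  # compressed column values

    @classmethod
    def build(cls, values: List[int]) -> "ColumnPage":
        data = zlib.compress(json.dumps(values).encode())
        return cls(min(values), max(values), data)


def scan_range(pages: List[ColumnPage], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values v with lo <= v < hi, plus how many pages were decompressed."""
    out: List[int] = []
    decompressed = 0
    for page in pages:
        # Skip the page when its [min, max] cannot intersect [lo, hi).
        if page.max_val < lo or page.min_val >= hi:
            continue
        decompressed += 1
        values = json.loads(zlib.decompress(page.payload))
        out.extend(v for v in values if lo <= v < hi)
    return out, decompressed
```

With ten pages holding 0-99, 100-199, ..., 900-999, scan_range(pages, 250, 350) decompresses only the two pages covering 200-399. The I/O for the other pages would still happen, which is why an external BRIN index remains useful.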
                                                                                                                                        
Just to show that BRIN works for zedstore, I played with the hands-on
example mentioned in [1].
                                                                                                                                        
With a btree index on zedstore:
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4351.50..4351.51 rows=1 width=32) (actual time=1267.140..1267.140 rows=1 loops=1)
   ->  Index Scan using idx_ztemperature_log_log_timestamp on ztemperature_log  (cost=0.56..4122.28 rows=91686 width=4) (actual time=0.117..1244.112 rows=86400 loops=1)
         Index Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone))
 Planning Time: 0.240 ms
 Execution Time: 1269.016 ms
(5 rows)
                                                                                                                                                                                                                                                                                   
With a BRIN index on zedstore:
Note: the Bitmap Heap Scan on zedstore currently scans all the columns.
Scanning only the columns required by the query is yet to be implemented.
                                                                                                                                                                                                                                                                                   
                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=217538.85..217538.86 rows=1 width=32) (actual time=54.167..54.167 rows=1 loops=1)
   ->  Gather  (cost=217538.63..217538.84 rows=2 width=32) (actual time=53.967..55.184 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=216538.63..216538.64 rows=1 width=32) (actual time=42.956..42.957 rows=1 loops=3)
               ->  Parallel Bitmap Heap Scan on ztemperature_log  (cost=59.19..216446.98 rows=36660 width=4) (actual time=3.571..35.904 rows=28800 loops=3)
                     Recheck Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone))
                     Rows Removed by Index Recheck: 3968
                     Heap Blocks: lossy=381
                     ->  Bitmap Index Scan on idx_ztemperature_log_log_timestamp  (cost=0.00..37.19 rows=98270 width=0) (actual time=1.201..1.201 rows=7680 loops=1)
                           Index Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp without time zone))
 Planning Time: 0.240 ms
 Execution Time: 55.341 ms
(13 rows)

 schema_name |             index_name             | index_ratio | index_size | table_size
-------------+------------------------------------+-------------+------------+------------
 public      | idx_ztemperature_log_log_timestamp |           0 | 80 kB      | 1235 MB
(1 row)

[1] https://www.postgresql.fastware.com/blog/brin-indexes-what-are-they-and-how-do-you-use-them
