Re: Minmax indexes
| From | Amit Kapila |
|---|---|
| Subject | Re: Minmax indexes |
| Date | |
| Msg-id | CAA4eK1+YFF0wskPQhYs2kLW0abJMmFUhOotTcS9VgCkLQ_WgBA@mail.gmail.com |
| In response to | Re: Minmax indexes (Amit Kapila <amit.kapila16@gmail.com>) |
| List | pgsql-hackers |
On Fri, Sep 27, 2013 at 11:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Sep 26, 2013 at 1:46 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Amit Kapila wrote:
>>> On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
>>> <alvherre@2ndquadrant.com> wrote:
>>>
>>> On Windows, the patch gives the below compilation errors:
>>> src\backend\access\minmax\mmtuple.c(96): error C2057: expected
>>> constant expression
>>
>> I have fixed all these compile errors (fix attached). Thanks for
>> reporting them. I'll post a new version shortly.
>
> Thanks for fixing it. In the last few days I have spent some time
> reading about minmax (or equivalent) indexes in other databases
> (Netezza and Oracle) and going through parts of your proposal. It's a
> fairly big patch and needs much more time, but I would like to share
> the findings and thoughts I have developed so far.
>
> First, about the interface and use case: as far as I could tell, other
> databases provide this index automatically rather than via a separate
> CREATE INDEX command, perhaps because such an index is mainly useful
> when the data is ordered, or distributed in a way that benefits
> repeatedly executed queries. You have proposed it as a command, which
> means the user needs to take care of it; I find that okay for a first
> version, and later we can add optimisations so that it gets created
> automatically.
>
> For the page range size: if I read correctly, you currently use a
> #define. Do you want to expose it to the user in some way (like a
> GUC), or maintain it internally and assign the right value based on
> the performance of different queries?
>
> Operations on this index seem to be very fast: Oracle keeps this as an
> in-memory structure, and I read that in Netezza write operations don't
> carry any significant overhead for zone maps compared to other
> indexes, so shouldn't we consider making it not WAL-logged?
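[For readers unfamiliar with minmax/zone-map indexes, here is a minimal sketch of the pruning idea the discussion above relies on. All names (RangeSummary, range_may_contain, summary_needs_update) are invented for illustration and are not the patch's actual structures; a single int32 column stands in for arbitrary indexed columns.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch (names invented): each summary tuple records the
 * minimum and maximum of one indexed column over a fixed run of heap
 * pages (a "page range"). */
typedef struct RangeSummary
{
    uint32_t first_block;   /* first heap block of the summarized range */
    int32_t  min_value;     /* smallest indexed value seen in the range */
    int32_t  max_value;     /* largest indexed value seen in the range */
} RangeSummary;

/* A scan for "col = key" only needs to visit a page range whose
 * [min, max] interval could contain the key; every other range is
 * skipped without reading its heap pages. */
static bool
range_may_contain(const RangeSummary *s, int32_t key)
{
    return key >= s->min_value && key <= s->max_value;
}

/* On insert, the stored summary only needs rewriting when the new
 * value falls outside the current boundaries; inserts inside the
 * boundaries are free as far as the index is concerned. */
static bool
summary_needs_update(const RangeSummary *s, int32_t newval)
{
    return newval < s->min_value || newval > s->max_value;
}
```

[This cheap inside-the-boundaries fast path is what makes writes to such an index inexpensive compared to a btree, which is the basis of the WAL-logging question above.]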
> OTOH, because these structures get created automatically in those
> databases it might be okay there; but if we provide it as a command,
> the user might be bothered if he doesn't find it automatically after a
> server restart.
>
> A few questions and observations:
>
> 1.
> + When a new heap tuple is inserted in a summarized page range, it is possible to
> + compare the existing index tuple with the new heap tuple. If the heap tuple is
> + outside the minimum/maximum boundaries given by the index tuple for any indexed
> + column (or if the new heap tuple contains null values but the index tuple
> + indicates there are no nulls), it is necessary to create a new index tuple with
> + the new values. To do this, a new index tuple is inserted, and the reverse range
> + map is updated to point to it. The old index tuple is left in place, for later
> + garbage collection.
>
> Is there a reason why we can't directly update the value rather than
> inserting a new index tuple? As I understand it, for other indexes
> like btree we do this because we might need to roll back, but here,
> even if a rollback happens after updating the min or max value, it
> will not cause any harm (tuple loss).
>
> 2.
> + If the reverse range map points to an invalid TID, the corresponding
> + page range is not summarized.
>
> 3.
> It might be better if you mention when the range map will point to an
> invalid TID; it is not explained in your proposal, yet you use it
> there to explain some other things.
>
> 4.
> "Reverse range map" is good terminology, but isn't "range translation
> map" better? I don't mind either way; it's just a thought that came to
> my mind while understanding the concept of the reverse range map.
>
> 5.
> /*
>  * As above, except that instead of scanning the complete heap, only the given
>  * range is scanned.  Scan to end-of-rel can be signalled by passing
>  * InvalidBlockNumber as end block number.
>  */
> double
> IndexBuildHeapRangeScan(Relation heapRelation,
>                         Relation indexRelation,
>                         IndexInfo *indexInfo,
>                         bool allow_sync,
>                         BlockNumber start_blockno,
>                         BlockNumber numblocks,
>                         IndexBuildCallback callback,
>                         void *callback_state)
>
> In the comment you use "end block number"; which parameter does it
> refer to? I could see only start_blockno and numblocks.
>
> 6.
> Currently you pass 0 as the start block and InvalidBlockNumber as the
> number of blocks; what's the logic behind that?
>
> return IndexBuildHeapRangeScan(heapRelation, indexRelation,
>                                indexInfo, allow_sync,
>                                0, InvalidBlockNumber,
>                                callback, callback_state);

I got it; I think here it means "scan all the pages".

> 7.
> In mmbuildCallback, a tuple is added to the minmax index only if it
> satisfies the page range; otherwise this can lead to a wasted big scan
> in case the page range is large (1280 pages, as you mentioned in one
> of your mails). Why can't we include it at the end of the scan?
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
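[The (0, InvalidBlockNumber) convention settled in point 6 can be sketched as below. This is an illustrative standalone snippet, not the patch's code: BlockNumber and InvalidBlockNumber mirror PostgreSQL's definitions, while blocks_to_scan is an invented helper showing how a caller would interpret the arguments.]

```c
#include <stdint.h>

typedef uint32_t BlockNumber;
/* Same value as PostgreSQL's InvalidBlockNumber in block.h */
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Invented helper: numblocks == InvalidBlockNumber means "scan to end
 * of relation", so the effective count is clamped to the relation's
 * actual size (rel_nblocks).  Passing (0, InvalidBlockNumber) thus
 * scans every page, matching the unqualified IndexBuildHeapScan. */
static BlockNumber
blocks_to_scan(BlockNumber start_blockno, BlockNumber numblocks,
               BlockNumber rel_nblocks)
{
    if (numblocks == InvalidBlockNumber ||
        start_blockno + numblocks > rel_nblocks)
        return rel_nblocks - start_blockno;
    return numblocks;
}
```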