Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
От | shveta malik |
---|---|
Тема | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Дата | |
Msg-id | CAJpy0uC3Ke0cxkwhqfW3tYkaOU=qr2p_80XTW7jweiYMrHuwKw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (Peter Smith <smithpb2250@gmail.com>) |
Список | pgsql-hackers |
On Mon, Jul 14, 2025 at 2:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shveta, > > Thanks for your README questions. > > On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Thank You for working on this. I started going through the README and > > tried running simple tests, have few concerns: > > > > 1) > > I am not able to understand section 4.2 'WOS-to-ROS conversion'. When > > whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID? > > Whiteout WOS remembers the tuple that needs to be deleted on the next > WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended > meaning of “delete 4” in this diagram was “delete the ROS data which > has CRID 4”. > > > And when does delete-vector X represents? > > The delete vector is a bitset for knowing which records of ROS are > marked for deletion. IIUC, the bits of the “delete vector” are what > were previously in the “Whiteout ROS” -- i.e. the bits were set during > the previous WOS-to-ROS transfers. > > Updating the delete vector bits is cheap, but it is more expensive to > reconcile with ROS to delete the ROS data, so that happens only > periodically when some threshold is exceeded. See README 2.5.3 > “Garbage Collection”. But, the diagram is showing the result of > garbage collection at the same time as the WOS-to-ROS transfer. > > The “X” in the diagram was supposed to represent that the bit is set > to mark the CRID 2 columns for deletion. I’ve changed this now to be > 0’s and 1’s, which makes it consistent with the other description > about the delete vector in “2.3.3”. e.g. 1 means "marked for delete". > > > I did not get why ColA-2, > > ColA-4 and ColB-2, ColB-4 were removed in resultant data? > > The record (of CRID 4) was in the “Whiteout WOS”, so during > WOS-to-ROS transfer, the “delete vector” bit 4 would become set to > mark CRID 4 for deletion in the ROS. > > The “garbage collection” (aka “deleted-rows-collection”) happens > according to some threshold; however, this diagram shows what happens > when the threshold is reached at the same time as the WOS-to-ROS > transfer. > > e.g. So AFTER case shows result of the garbage collection as well: > > The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was > already set > The “ColA-4 │ ColB-4” was removed because the delete vector bit 4 > would become set (from having previously been in the Whiteout WOS) > > ~ > > Notice the CRIDs in the before/after are different because they were > renumbered after garbage collection. > > > Is the diagram complete? > > Yes, the diagram was complete, but hopefully it is easier to > understand now that I have made a few minor changes to it. FYI, see > also the PGConf.dev 2025 presentation slides [1] – it might help > understanding to see similar information presented slightly > differently. Thanks for improving the diagram. Now the flow is clear. > > 2) > > We can make the definition consistent at both places as the first one > > gives a feeling that rows are marked for deletion in WOS while the > > second one says ROS. > > > > Whiteout WOS = Record of WOS rows marked for deletion > > Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS > > > > Fixed. Both are now using 2nd wording. > > > 3) > > It is not part of README. But please help me understand the meaning > > and usage of this GUC in VCI context: > > vci.max_devices: Sets the maximum device number which can be attached. > > > > The WOS–to–ROS transfer can be done by background workers. IIUC, the > patch 0002 currently includes code for inspecting Linux devices > associated with tablespaces where any VCI indexes reside. The purpose > of this inspection is to discover the IO load so that VCI can > determine the best time to launch the background worker – e.g. > launching may be delayed longer if the system is deemed too busy. The > “vci.max_devices” puts a limit on the number of devices that can be > handled. All this logic is inherited from the product where this VCI > patch originated; I feel some of this may be overly complex for the > OSS patch's first version. We may be able to simplify/remove parts of > this logic – maybe even this GUC. Thanks for the explanation. The concept is quite tricky though. I will also look at the code to understand it better. > ... > > ------------- > > > > Few typos in README: > > a) Each VCI indexed column is stored as an internal relations. > > --relations --> relation > > Fixed. > > > > > b) Records are addresses by CRID (Columnar Record ID) instead of by TID. > > --addresses->addressed > > Fixed. > > > > > c) Extents can be found by ID using offsets in a column "meta-data" > > internal relation. > > -- by ID using offsets? Do you mean 'by using offsets' alone? > > I was trying to say the appropriate offset can be found using the > extent ID as the offset-array index. I’ve reworded this in the README > to be clearer. > > > > > d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE > > Fixed. > > ~~~ > > Please see the updated README. > > ====== > [1] slides - https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf > > Kind Regards, > Peter Smith. > Fujitsu Australia
В списке pgsql-hackers по дате отправления: