Re: Zedstore - compressed in-core columnar storage

From Alexandra Wang
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id CACiyaSryy-=9K6k5kL6rJdSK5FAjJb63axyQdjQez5VofDdQXA@mail.gmail.com
In response to Re: Zedstore - compressed in-core columnar storage  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: Zedstore - compressed in-core columnar storage  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers


On Tue, Aug 27, 2019 at 12:03 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
My point is, once we find the leaf page containing the given tid, we go through each item in the page until we find the data corresponding to the given tid, which means we essentially perform a sequential scan at the page level. I'm referring to the below loop in zsbt_attr_scan_fetch_array().

        for (off = FirstOffsetNumber; off <= maxoff; off++)
        {
            ItemId      iid = PageGetItemId(page, off);
            ZSAttributeArrayItem *item = (ZSAttributeArrayItem *) PageGetItem(page, iid);

            if (item->t_endtid <= nexttid)
                continue;

            if (item->t_firsttid > nexttid)
                break;


But that's not true for an IndexScan on a heap table, because there the index tuple contains the exact physical location of the tuple in the heap, so there is no need to scan the entire page.

You are correct that we currently go through each item in the leaf page that
contains the given tid. Specifically, the logic to retrieve all the attribute
items inside a ZSAttStream has now moved to decode_attstream() in the latest
code, and in zsbt_attr_fetch() we again loop through each item we previously
retrieved from decode_attstream() and look for the given tid. One optimization
we can do is to tell decode_attstream() to stop decoding at the tid we are
interested in. We can also apply other tricks to speed up the lookups in the
page: for fixed-length attributes it is easy to do a binary search instead of
a linear search, and for variable-length attributes we can probably try
something we haven't thought of yet.
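
For the fixed-length case, for example, a binary search over the attribute
items in the leaf page could look roughly like the sketch below. This is
hypothetical code, not part of the patch: zsbt_attr_binsrch_item is a made-up
name, and it assumes the items in the page are ordered by tid and carry the
same t_firsttid/t_endtid bounds used in the loop quoted above.

/*
 * Sketch: binary search for the attribute item covering 'nexttid',
 * replacing the linear scan over the page's item array.
 */
static OffsetNumber
zsbt_attr_binsrch_item(Page page, zstid nexttid)
{
    OffsetNumber low = FirstOffsetNumber;
    OffsetNumber high = PageGetMaxOffsetNumber(page);

    while (low <= high)
    {
        OffsetNumber off = low + (high - low) / 2;
        ItemId      iid = PageGetItemId(page, off);
        ZSAttributeArrayItem *item = (ZSAttributeArrayItem *) PageGetItem(page, iid);

        if (item->t_endtid <= nexttid)
            low = off + 1;          /* item ends before nexttid, search right */
        else if (item->t_firsttid > nexttid)
            high = off - 1;         /* item starts after nexttid, search left */
        else
            return off;             /* this item covers nexttid */
    }
    return InvalidOffsetNumber;     /* no item covers nexttid */
}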
 

1) In zsundo_insert_finish(), there is a double call to BufferGetPage(undobuf). Is that required?

Fixed, thanks!
 

2) In zedstoream_fetch_row(), why is zsbt_tid_begin_scan() being called twice? I'm referring to the below code.

    if (fetch_proj->num_proj_atts == 0)
    {
        ....
        ....
        zsbt_tid_begin_scan(rel, tid, tid + 1,
                            snapshot,
                            &fetch_proj->tid_scan);
        fetch_proj->tid_scan.serializable = true;

        for (int i = 1; i < fetch_proj->num_proj_atts; i++)
        {
            int         attno = fetch_proj->proj_atts[i];

            zsbt_attr_begin_scan(rel,  reldesc, attno,
                                 &fetch_proj->attr_scans[i - 1]);
        }
        MemoryContextSwitchTo(oldcontext);

        zsbt_tid_begin_scan(rel, tid, tid + 1, snapshot, &fetch_proj->tid_scan);
     }

I removed the second call, thanks!

 
Also, for all types of update operations (be it a key or non-key update) we create a new tid for the new version of the tuple. Can't we use the tid associated with the old tuple in cases where there are no concurrent transactions to which the old tuple is still visible?

Zedstore currently implements update as delete+insert, hence the old tid is not
reused. We don't store the tuple in our UNDO log; we only store the transaction
information there. Reusing the tid of the old tuple would mean putting the old
tuple in the UNDO log, which we have not implemented yet.
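
To make the delete+insert behaviour concrete, here is a rough, hypothetical
sketch of the current UPDATE flow; the helper names zs_mark_deleted and
zs_insert_new_version are made up for illustration and do not match the actual
entry points in the patch.

/* Hypothetical sketch only; not the real zedstore code. */
static zstid
zs_update_sketch(Relation rel, zstid oldtid, TupleTableSlot *newslot,
                 TransactionId xid, CommandId cid)
{
    zstid       newtid;

    /*
     * Mark the old tid as deleted.  The UNDO record carries only the
     * deleting transaction's information (xid, cid), not the old tuple
     * itself, so the old tid cannot simply be recycled for the new
     * version while older snapshots may still need to read it.
     */
    zs_mark_deleted(rel, oldtid, xid, cid);

    /* Insert the new tuple version under a freshly allocated tid. */
    newtid = zs_insert_new_version(rel, newslot, xid, cid);

    return newtid;
}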


Thanks for reporting, this is very helpful! Patches are welcome as well!
