On 05-Sep-2013, at 8:58, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> (2013/09/05 3:59), Alvaro Herrera wrote:
>> Tomas Vondra wrote:
>>
>>> My idea was to keep the per-database stats, but allow some sort of
>>> "random" access - updating / deleting the records in place, adding
>>> records etc. The simplest way I could think of was adding a simple
>>> "index" - a mapping of OID to position in the stat file.
>>>
>>> I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
>>> something like that. This would make it quite simple to access an
>>> existing record:
>>>
>>> 1: get position from the index
>>> 2: read sizeof(Entry) from the file
>>> 3: if it's an update, just overwrite the bytes; for a delete, set the
>>> isdeleted flag (needs to be added to all entries)
>>>
>>> or reading all the records (just read the whole file as today).
>>
>> Sounds reasonable. However, I think the index should be a real index,
>> i.e. have a tree structure that can be walked down, not just a plain
>> array. If you have a 400 MB stat file, then you must have about 4
>> million tables, and you will not want to scan such a large array every
>> time you want to find an entry.
>
> I thought of an array structure at first.
>
> But, for now, I think we should have a real index for the
> statistics data because we already have several index storages,
> and it will allow us to minimize read/write operations.
>
> BTW, what kind of index would be preferred for this purpose?
> btree or hash?
>
> If we use btree, do we need "range scan" thing on the statistics
> tables? I have no idea so far.
>
The thing I am interested in is range scan. That is the reason I wish to
explore range tree usage here, maybe as a secondary index.
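
To make the discussion a bit more concrete, here is a rough sketch of the
(oid, offset) mapping with in-place update, an isdeleted flag, and a range
scan over OIDs. It is only an illustration, not PostgreSQL code: the record
layout (ENTRY), the StatFile class and all method names are made up, an
in-memory buffer stands in for the stat file, and a sorted array with binary
search stands in for whatever real index (btree, range tree) we end up with.

```python
import bisect
import io
import struct

# Hypothetical fixed-size record: table OID, one counter, isdeleted flag.
ENTRY = struct.Struct("<IQ?")


class StatFile:
    """Toy per-database stat file with a sorted (oid, offset) index."""

    def __init__(self):
        self.data = io.BytesIO()   # stands in for the on-disk stat file
        self.oids = []             # OIDs, kept sorted (assumed unique)
        self.offsets = []          # offsets[i] is the file position of oids[i]

    def add(self, oid, counter):
        # Append the record, then insert its position into the sorted index.
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(ENTRY.pack(oid, counter, False))
        i = bisect.bisect_left(self.oids, oid)
        self.oids.insert(i, oid)
        self.offsets.insert(i, offset)

    def _offset(self, oid):
        # Step 1: get the position from the index (binary search).
        i = bisect.bisect_left(self.oids, oid)
        if i == len(self.oids) or self.oids[i] != oid:
            raise KeyError(oid)
        return self.offsets[i]

    def read(self, oid):
        # Step 2: read sizeof(Entry) bytes from the file.
        self.data.seek(self._offset(oid))
        return ENTRY.unpack(self.data.read(ENTRY.size))

    def update(self, oid, counter):
        # Step 3a: overwrite the bytes in place.
        self.data.seek(self._offset(oid))
        self.data.write(ENTRY.pack(oid, counter, False))

    def delete(self, oid):
        # Step 3b: keep the record but set its isdeleted flag.
        _, counter, _ = self.read(oid)
        self.data.seek(self._offset(oid))
        self.data.write(ENTRY.pack(oid, counter, True))

    def range_scan(self, lo, hi):
        # Yield live (oid, counter) pairs with lo <= oid <= hi.
        start = bisect.bisect_left(self.oids, lo)
        stop = bisect.bisect_right(self.oids, hi)
        for i in range(start, stop):
            self.data.seek(self.offsets[i])
            oid, counter, deleted = ENTRY.unpack(self.data.read(ENTRY.size))
            if not deleted:
                yield oid, counter
```

With an ordered index the range scan touches only the entries in the
requested OID range; a hash index would give the point lookups but not this.
Whether anything actually needs to range scan the statistics by OID is the
open question.
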
Regards,
Atri