AW: [HACKERS] newoid in invapi.c

From: Zeugswetter Andreas
Subject: AW: [HACKERS] newoid in invapi.c
Msg-id: 01BD4B58.F9B5FA20@pc9358.sd.spardat.at
List: pgsql-hackers
>> This might lead the large object implementation to confuse
>> large object relations with other relations.
>>
>> In my view this is a bug. Am I right?
>
>Yes, and no. LargeObjects are supposed to run within a transaction (if you
>don't then some fun things happen), and (someone correct me if I'm wrong)
>if newoid() is called from within the transaction, it is safe?
>
I see no evidence in the code that suggests that it is safe in transactions.
The GetNewObjectIdBlock() function which generates the OID blocks _does_
acquire a spinlock before it generates a new block of oids so usually all
will be well.
But sometimes (with a chance of <usercount>/32 when there are <usercount> active
users for the same db) the newoid might have a quite different value than
fileoid+1.

Again I see no evidence in the code that it is safe in transactions. I only
see evidence that it will _usually_ work.
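
To make the failure mode concrete, here is a minimal sketch in Python (not the
PostgreSQL C code). It assumes OIDs are handed out in blocks of 32, a figure
inferred from the "/32" above, and that each backend keeps a private cache of
the block it last fetched. The names OidSource, Backend and demo are
hypothetical and exist only for this illustration.

```python
import threading

BLOCK_SIZE = 32  # assumption: inferred from the "<usercount>/32" estimate


class OidSource:
    """Shared counter that hands out OIDs in blocks, guarded by a lock."""

    def __init__(self, start=1000):
        self._next = start
        self._lock = threading.Lock()  # stands in for the spinlock

    def new_block(self):
        # Return the first OID of a fresh block and advance the shared counter.
        with self._lock:
            first = self._next
            self._next += BLOCK_SIZE
            return first


class Backend:
    """One session with a private cache of pre-allocated OIDs."""

    def __init__(self, source):
        self._source = source
        self._cur = 0
        self._left = 0

    def newoid(self):
        # Fetch a new block only when the private cache is exhausted.
        if self._left == 0:
            self._cur = self._source.new_block()
            self._left = BLOCK_SIZE
        oid = self._cur
        self._cur += 1
        self._left -= 1
        return oid


def demo():
    source = OidSource(start=1000)
    a, b = Backend(source), Backend(source)
    # Backend a uses up all but the last OID of its first block.
    for _ in range(BLOCK_SIZE - 1):
        a.newoid()
    fileoid = a.newoid()   # last OID of a's block
    b.newoid()             # another backend grabs the following block
    next_oid = a.newoid()  # a has to skip past b's block
    return fileoid, next_oid
```

Under these assumptions, the second value returned by demo() is not
fileoid + 1: the lock keeps OIDs unique, but it does nothing to keep two
consecutive calls in one backend adjacent.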

Yes, but currently it is very hard to produce this behavior, since we still only have table locks.
You would need more than 32 lob tables, accessed concurrently (I am not sure about that).
This area will need special attention when page or row locks get implemented.

Actually I wonder how it could be efficiently made safe within transactions,
given that the oids generated are guaranteed to be unique within an
_entire_ postgres installation. This would seem to imply that, effectively,
only one transaction would be possible at the same time in an entire
postgresql database.

I think this is why a lot of us (hands up) want to reduce the use of oids in user tables:
user tables would only have oids if the table is created with 'with oid'.
By default, normal user tables would not have oids. I strongly support this as a strategy.

My current strategy to solve this problem involves the use of a new
system catalog which I call pg_large_object. This catalog contains
information about each large object in the system.

Hmm... another bottleneck? One table, many users?

Currently the information maintained is:
- identification of heap and index relations used by the large object
- the size of the large object
- information about the type of the large object.
I still need to figure out how to create a new _unique_ index on a system
catalog using information in the indexing.h file.

I would propose a strategy where the large object is referenced by a physical position (ctid)
and is stored in one file per lob column. You always have to remember that filesystems
only behave well if they have fewer than xx members per directory, xx usually being between 1000 and 25000.
More members per directory will give file stat times of 20 ms and more, not to mention
the many open files. While it is hard to have 20000+ tables, it is easy to have millions of rows,
definitely too many for one directory file (this is not OS specific).
I would also suggest hard linking large objects to an owning row, meaning that if the row is deleted
the lob is also deleted. I would not make this a trigger issue at the user or type programmer level,
but handle it generically in the backend. Writing a lob type is hard enough without making it
even more complex.
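
A minimal Python sketch of that proposal, under loud assumptions: one
in-memory buffer stands in for the single file per lob column, a physical
reference is modelled as (offset, length) rather than a real ctid, and space
reclamation is left out. LobRef and LobColumn are hypothetical names, not
backend structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LobRef:
    """Physical position of one large object inside the column file."""
    offset: int
    length: int


class LobColumn:
    """All large objects of one lob column, stored in a single 'file'."""

    def __init__(self):
        self._data = bytearray()           # stands in for the one file
        self._rows: Dict[int, LobRef] = {}  # owning row -> physical position

    def insert_row(self, row_id: int, payload: bytes) -> LobRef:
        if row_id in self._rows:
            raise KeyError(f"row {row_id} already owns a large object")
        ref = LobRef(offset=len(self._data), length=len(payload))
        self._data += payload
        self._rows[row_id] = ref
        return ref

    def read(self, row_id: int) -> bytes:
        ref = self._rows[row_id]
        return bytes(self._data[ref.offset:ref.offset + ref.length])

    def delete_row(self, row_id: int) -> None:
        # Deleting the owning row drops the lob with it; no trigger needed.
        del self._rows[row_id]

    def has_row(self, row_id: int) -> bool:
        return row_id in self._rows

    def live_rows(self) -> int:
        return len(self._rows)
```

However many rows are inserted, the directory gains only one member per lob
column, which is the point of the filesystem argument above.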

Given an oid, this table allows us to determine if it is a valid large object.
I think this is necessary (to be able to maintain referential integrity) if
we're ever going to have a large object type.
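
As a rough illustration of such a catalog, here is a Python sketch. The field
names follow the list of maintained information above, but the exact columns,
and the names LargeObjectEntry and LargeObjectCatalog, are assumptions made
for this example and not the proposed pg_large_object definition. The dict
key plays the role of the unique index on the oid.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LargeObjectEntry:
    heap_relid: int   # relation holding the data
    index_relid: int  # its index, stored explicitly rather than derived
    size: int         # size of the large object in bytes
    lo_type: str      # type of the large object


class LargeObjectCatalog:
    """Hypothetical stand-in for a pg_large_object catalog."""

    def __init__(self):
        self._by_oid: Dict[int, LargeObjectEntry] = {}

    def register(self, oid: int, entry: LargeObjectEntry) -> None:
        # Enforce uniqueness of the oid, as a unique index would.
        if oid in self._by_oid:
            raise ValueError(f"oid {oid} is already a large object")
        self._by_oid[oid] = entry

    def lookup(self, oid: int) -> Optional[LargeObjectEntry]:
        return self._by_oid.get(oid)

    def is_large_object(self, oid: int) -> bool:
        return oid in self._by_oid
```

Because the heap and index relations are both recorded in the entry, nothing
in this sketch depends on the index oid being the heap oid plus one.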

Similarly I have defined a table pg_tuple which allows one to
determine if a given oid is a valid tuple.

Please remember that in the long run not all user tuples can have oids. This would always be
a major performance problem.

This together with some other minor changes allows some cool
object oriented features for postgresql.

Yes, definitely. I don't know how to resolve my inner conflict on the two seemingly contrary issues,
performance versus OO features.

Fancy the idea of persistent Java objects which live in postgresql databases?

Anyway if it all works as expected I'll submit some patches.

Thanks,
Maurice

Andreas



