Re: autovacuum and orphaned large objects

From: Robert Haas
Subject: Re: autovacuum and orphaned large objects
Msg-id: CA+TgmoYZioo0pg7USBrOVgYx5uDcE0d_NcxsLqNyZQ-kX7FGYw@mail.gmail.com
In reply to: autovacuum and orphaned large objects (Euler Taveira de Oliveira <euler@timbira.com>)
List: pgsql-hackers
On Mon, Oct 24, 2011 at 10:25 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> On 24-10-2011 10:57, Robert Haas wrote:
>>
>> I think the main reason why vacuumlo is a contrib module rather than
>> in core is that it is just a heuristic, and it might not be what
>> everyone wants to do.  You could store a bunch of large objects in the
>> database and use the returned OIDs to generate links that you email to
>> users, and then when the user clicks on the link we retrieve the
>> corresponding LO and send it to the user over HTTP.  In that design,
>> there are no tables in the database at all, yet the large objects
>> aren't orphaned.
>>
> Wow, what a strange method to solve a problem, and one that could bloat
> your database. No, I'm not suggesting that we forbid it. The proposed
> method could clean up orphaned LOs in 95% (if not 99%) of the use cases.
>
> I've never heard of anyone using LOs the way you describe. It seems
> strange that someone would distribute an OID number but not store a
> reference to it in the same database. Yes, it is a possibility, but ...

I guess we could make it an optional behavior, but once you go that
far then you have to wonder whether what's really needed here is a
general-purpose task scheduler.  I mean, the autovacuum launcher's
idea about how often to vacuum the database won't necessarily match
the user's idea of how often they want to vacuum away large objects -
and if the user is doing something funky (like storing arrays of large
object OIDs, or inexplicably storing them using numeric or int8) then
putting it in the backend removes a considerable amount of
flexibility.  Another case where vacuumlo will fall over is if you
have a very, very large table with an OID column, but with lots of
duplicate values so that the number of OIDs actually referenced is
much smaller.  You might end up doing a table scan on the large table
every time this logic kicks in, and that might suck.
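To make the heuristic concrete, what vacuumlo does boils down to something
like this (a simplified sketch, not its actual implementation; big_table and
blob_oid are placeholder names):

    -- Any large object whose OID is not referenced from some user column
    -- of type oid is presumed orphaned and unlinked.  The anti-join is
    -- where the full scan of big_table comes from.
    SELECT lo_unlink(m.oid)
    FROM pg_largeobject_metadata m
    WHERE NOT EXISTS (SELECT 1 FROM big_table t WHERE t.blob_oid = m.oid);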

I'm sort of unexcited about the idea of doing a lot of engineering
around this; it seems to me that the only reasons we still have a
separate large object facility rather than just letting everyone go
through regular tables with toastable columns are (1) the size limit
is 2GB rather than 1GB and (2) you can read and write parts of objects
rather than the whole thing.  If we're going to do some more
engineering here, I'd rather set our sights a little higher.
Complaints I often hear about the large object machinery include (1)
2GB is still not enough, (2) 4 billion large objects is not enough,
(3) the performance is inadequate, particularly with large numbers of
large objects from possibly-unrelated subsystems slammed into a single
table, and (4) it would be nice to be able to do partial reads and writes
on any toastable field, not just large objects.  I'm not saying that
the problem you're complaining about isn't worth fixing in the
abstract, and if it seemed like a nice, clean fix I'd be all in favor,
but I just don't think it's going to be very simple, and for the
amount of work involved I'd rather get a little bit more bang for the
buck.
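On point (4), partial reads at least are already possible on a plain bytea
column if you prevent compression, since substring() only fetches the TOAST
chunks it needs; it's partial writes that have no equivalent.  A rough
illustration (the table and OID are made up):

    -- Force uncompressed out-of-line storage so substring() can slice it.
    CREATE TABLE docs (id int PRIMARY KEY, body bytea);
    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;

    -- Fetches roughly 1kB of TOAST data no matter how large body is.
    SELECT substring(body FROM 1000000 FOR 1024) FROM docs WHERE id = 1;

    -- With large objects, both directions work at arbitrary offsets:
    BEGIN;
    SELECT lo_open(16405, 393216);   -- 393216 = INV_READ | INV_WRITE
    -- ...then lo_lseek()/loread()/lowrite() on the returned descriptor.
    COMMIT;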

Of course, you don't have to agree with me on any of this; I'm just
giving you my take on it.  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

