Re: autovacuum and orphaned large objects
From: Robert Haas
Subject: Re: autovacuum and orphaned large objects
Date:
Msg-id: CA+TgmoYZioo0pg7USBrOVgYx5uDcE0d_NcxsLqNyZQ-kX7FGYw@mail.gmail.com
In reply to: autovacuum and orphaned large objects (Euler Taveira de Oliveira <euler@timbira.com>)
List: pgsql-hackers
On Mon, Oct 24, 2011 at 10:25 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> On 24-10-2011 10:57, Robert Haas wrote:
>> I think the main reason why vacuumlo is a contrib module rather than
>> in core is that it is just a heuristic, and it might not be what
>> everyone wants to do. You could store a bunch of large objects in the
>> database and use the returned OIDs to generate links that you email to
>> users, and then when the user clicks on the link we retrieve the
>> corresponding LO and send it to the user over HTTP. In that design,
>> there are no tables in the database at all, yet the large objects
>> aren't orphaned.
>
> Wow, what a strange method to solve a problem, and one that could
> bloat your database. No, I'm not suggesting that we forbid it. The
> proposed method could clean up orphaned LOs in 95% (if not 99%) of the
> use cases.
>
> I've never heard of anyone using LOs the way you describe. It seems
> strange that someone would hand out an OID number but not store a
> reference to it in the same database. Yes, it is a possibility, but ...

I guess we could make it an optional behavior, but once you go that
far, you have to wonder whether what's really needed here is a
general-purpose task scheduler. I mean, the autovacuum launcher's idea
of how often to vacuum the database won't necessarily match the user's
idea of how often they want to vacuum away large objects - and if the
user is doing something funky (like storing arrays of large object
OIDs, or inexplicably storing them as numeric or int8), then putting
this in the backend removes a considerable amount of flexibility.

Another case where vacuumlo will fall over is a very, very large table
with an OID column but lots of duplicate values, so that the number of
OIDs actually referenced is much smaller than the row count. You might
end up doing a scan of that whole table every time this logic kicks in,
and that might suck.

I'm sort of unexcited about the idea of doing a lot of engineering
around this; it seems to me that the only reasons we still have a
separate large object facility, rather than just letting everyone go
through regular tables with toastable columns, are (1) the size limit
is 2GB rather than 1GB, and (2) you can read and write parts of objects
rather than the whole thing. If we're going to do some more engineering
here, I'd rather set our sights a little higher. Complaints I often
hear about the large object machinery include: (1) 2GB is still not
enough; (2) 4 billion large objects is not enough; (3) the performance
is inadequate, particularly with large numbers of large objects from
possibly-unrelated subsystems slammed into a single table; and (4) it
would be nice to be able to do partial reads and writes on any
toastable field, not just large objects.

I'm not saying that the problem you're complaining about isn't worth
fixing in the abstract, and if there were a nice, clean fix I'd be all
in favor, but I just don't think it's going to be very simple, and for
the amount of work involved I'd rather get a little more bang for the
buck. Of course, you don't have to agree with me on any of this; I'm
just giving you my take on it. :-)
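To make the heuristic concrete, here is roughly the anti-join it boils
down to, written as a single SQL statement; the table name "my_table"
and its "lo_ref" column are made-up names for illustration, and the
real vacuumlo builds its candidate list client-side across every oid
column it can find rather than in one query:

    -- One row per large object in the database (pg_largeobject_metadata,
    -- 9.0 and later); keep only the OIDs that no known reference column
    -- uses. "my_table" and "lo_ref" are hypothetical names.
    SELECT lo.oid
    FROM pg_largeobject_metadata lo
    WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.lo_ref = lo.oid);

    -- Each surviving OID would then be removed with lo_unlink(oid).

Note how this sketch fails in exactly the cases above: an oid[] array
or an int8 column never enters the anti-join at all, so the objects it
references look orphaned, and computing the referenced set still means
reading "my_table" end to end even when it holds millions of rows but
only a handful of distinct OIDs.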
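And for anyone who hasn't used it, the partial-read/write capability in
point (2) looks like this from plain SQL, via the server-side lo_*
functions (a throwaway sketch; the object, offsets, and payload are
invented):

    DO $$
    DECLARE
        loid oid;
        fd   integer;
        buf  bytea;
    BEGIN
        loid := lo_creat(-1);                  -- scratch large object
        fd   := lo_open(loid, x'60000'::int);  -- INV_READ | INV_WRITE
        PERFORM lowrite(fd, 'hello, world'::bytea);
        PERFORM lo_lseek(fd, 7, 0);            -- 0 = SEEK_SET
        buf := loread(fd, 5);                  -- reads just 'world'
        RAISE NOTICE '%', convert_from(buf, 'UTF8');
        PERFORM lo_close(fd);
        PERFORM lo_unlink(loid);               -- clean up
    END $$;

Getting the same seek-and-write behavior on an ordinary bytea column
isn't possible today: an UPDATE rewrites the whole toasted value, which
is complaint (4) above.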
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company