Discussion: Storing many big files in database - should I do it?
Hello,

I have a web application where users upload/share files. After a file is uploaded it is copied to S3, and all subsequent downloads are done from there. So in a file's lifetime it is accessed only twice: when created, and when copied to S3.

Files are documents of different sizes, from a few kilobytes to 200 megabytes. Number of files: thousands to hundreds of thousands.

My dilemma is: should I store the files in the PostgreSQL database, or store them in the filesystem and keep only metadata in the database?

I see the possible cons of using PostgreSQL as storage:
- more network bandwidth required compared to accessing an NFS-mounted filesystem?
- if the database becomes corrupt you can't recover individual files
- you can't back up a live database unless you install complicated replication add-ons
- more CPU required to store/retrieve files (compared to filesystem access)
- size overhead, e.g. storing 1000 bytes will take 1000 bytes in the database + 100 bytes for db metadata, index, etc.; with a lot of files this adds up to a lot of overhead.

Are these concerns valid? Has anyone had this kind of design problem, and how did you solve it?

Thanks.
Rod wrote:
> My dilemma is - Should I store files in PGSQL database or store in
> filesystem and keep only metadata in database?

S3 storage is not suitable for running an RDBMS. An RDBMS wants fast, low-latency storage doing 8k-block random reads and writes; S3 is high-latency and oriented towards streaming.
No, I'm not storing the RDBMS in S3; I didn't write that in my post. S3 is used as a CDN, only for downloading files.

On Tue, Apr 27, 2010 at 6:54 PM, John R Pierce <pierce@hogranch.com> wrote:
> S3 storage is not suitable for running a RDBMS.
> An RDBMS wants fast low latency storage using 8k block random reads and
> writes. S3 is high latency and oriented towards streaming
> No, I'm not storing RDBMS in S3. I didn't write that in my post.
> S3 is used as CDN, only for downloading files.
So you are storing your files on S3?
Why should you store those files additionally in a PostgreSQL database?
If you want to keep track of them / remember metadata, hashes will do the job with much less memory.
Harald
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
2010/4/27 Rod <cckramer@gmail.com>:
> Are these concerns valid?

yes

> Anyone had this kind of design problem and how did you solve it?

store your files in a filesystem, and keep the path to the file (plus metadata, acl, etc...) in database.

--
Cédric Villemain
S3 is not primary storage for the files; it's a distribution system. We want to be able to switch from S3 to another CDN if required. So the "master" copies of the files are kept on a private server. The question is whether that should be the database or the filesystem.

On Tue, Apr 27, 2010 at 7:03 PM, Massa, Harald Armin <chef@ghum.de> wrote:
> So you are storing your files on S3 ?
>
> Why should you store those files additionally in a PostgreSQL database?
>
> If you want to keep track of them / remember metadata, hashes will do the
> job with much less memory.
On Tuesday 27 April 2010 11.17:42 Cédric Villemain wrote:
> > Anyone had this kind of design problem and how did you solve it?
>
> store your files in a filesystem, and keep the path to the file (plus
> metadata, acl, etc...) in database.

... and be careful that the db and the file storage do not go out of sync. If files are only ever added and possibly removed (but never changed), this is not too hard:

* Be sure to commit the db transaction only after the file has been written to disk (use fsync or similar to be sure!). For file deletions: first delete the db metadata, then delete the file.
* Be sure to detect failed writes and abort the db transaction, or otherwise properly handle errors while storing the file.
* Occasionally run a clean-up to remove files that were written to the filesystem but whose db metadata was never stored. This should be a rare case, but it will probably happen.

PostgreSQL supports 2PC (PREPARE TRANSACTION and then COMMIT PREPARED as separate steps); you may want to use this: PREPARE the database transaction, then do the filesystem operations. If the filesystem operation fails, you can ROLLBACK the db transaction, otherwise COMMIT. That way you don't lose transactional semantics.

Backup requires some more thought. I guess you could use some kind of volume management to get filesystem snapshots, but you have to be sure the fs snapshot reflects the point in time when the database backup was made. Depending on load/availability requirements, you may get away with stopping data modification at the application level for a few seconds until the db backup has started and the filesystem snapshot has been created.

cheers
-- vbi
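The "fsync before commit" rule above can be sketched in code. This is a minimal illustration, not from the thread: the function name and caller pattern are made up, and the database calls are shown only as comments (using PostgreSQL's two-phase-commit statements as described in the post).

```python
import os
import tempfile

def write_file_durably(path: str, data: bytes) -> None:
    """Write data to path so it is really on disk before the caller
    commits the corresponding database transaction."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # file contents really on disk
        os.rename(tmp, path)          # atomic replace on POSIX
        dirfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dirfd)           # make the rename itself durable
        finally:
            os.close(dirfd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)            # failed write: leave no orphan
        raise

# Caller pattern with 2PC (database statements elided, names illustrative):
#   PREPARE TRANSACTION 'file-42';      -- db step, held open
#   write_file_durably(path, payload)   -- filesystem step
#   COMMIT PREPARED 'file-42';          -- or ROLLBACK PREPARED on failure
```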
On Tue, Apr 27, 2010 at 5:17 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote:
> store your files in a filesystem, and keep the path to the file (plus
> metadata, acl, etc...) in database.
What type of filesystem is good for this? A filesystem with support for storing tens of thousands of files in a single directory, or should one play the 41/56/34/41563489.ext game?
Are there any open source systems which handle keeping a filesystem and database in sync for this purpose, or is it a wheel that keeps getting reinvented?
I know "store your files in a filesystem" is the best long-term solution. But it's just so much easier to just throw everything in the database.
On Tuesday 27 April 2010 5:45:43 pm Anthony wrote:
> Are there any open source systems which handle keeping a filesystem and
> database in sync for this purpose, or is it a wheel that keeps getting
> reinvented?

In the for-what-it-is-worth department, check out this Wiki:
http://sourceforge.net/apps/mediawiki/fuse/index.php?title=DatabaseFileSystems

--
Adrian Klaver
adrian.klaver@gmail.com
2010/4/28 Adrian Klaver <adrian.klaver@gmail.com>:
>> What type of filesystem is good for this? A filesystem with support for
>> storing tens of thousands of files in a single directory, or should one
>> play the 41/56/34/41563489.ext game?

I'd prefer to go with XFS or ext{3,4}, in both cases with a path game. The path game will let you handle the scalability of your uploads (so the first increment is the first directory), something like:

1/2/3/4/foo.file
2/2/3/4/bar.file
etc...

You might explore a hash function or something that splits a SHA1 (or other) sum of the file to get the path.

> In the for what it is worth department check out this Wiki:
> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=DatabaseFileSystems

and postgres fuse also :-D

--
Cédric Villemain
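The SHA1 variant of the path game can be sketched in a few lines. This is an illustration of the idea, not code from the thread; the function name and level count are arbitrary. With two hex characters per level, each directory fans out into at most 256 subdirectories, so three levels give about 16.7 million buckets.

```python
import hashlib

def fanout_path(name: str, levels: int = 3) -> str:
    """Derive a storage path like 'ab/cd/ef/abcdef...' from the SHA1
    of a file's name/id, so no single directory collects too many files."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(parts + [digest])
```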
Things to consider when not storing them in the DB:
1) Backups of DB are incomplete without a corresponding backup of the files.
2) No transactional integrity between filesystem and DB, so you will have to deal with orphans from both INSERT and DELETE (assuming you don't also update the files).
3) No built in ability for replication, such as WAL shipping
The big downside for the DB is that all large objects are stored together in pg_catalog.pg_largeobject, which seems troubling: you know you have lots of big data, yet you store it all in one table and then have to worry about running out of large-object OIDs ('loids').
David
On 4/29/2010 2:10 AM, Cédric Villemain wrote:
> and postgres fuse also :-D
On 4/29/2010 12:07 PM, David Wall wrote:
> Big downside for the DB is that all large objects appear to be stored
> together in pg_catalog.pg_largeobject, which seems axiomatically
> troubling that you know you have lots of big data, so you then store
> them together, and then worry about running out of 'loids'.

Huh??? Isn't that the point of using the bytea or text datatypes? I could have sworn bytea does not use the large object interface; it uses TOAST. Or have I gone insane?

Many people encode the binary data in Base64 and store it as the text data type. Then you never have to deal with escaping the bytea data type, which I have found can be a pain.
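The size cost of the Base64-in-text approach is easy to check: Base64 maps every 3 input bytes to 4 output characters, so binary data stored as Base64 text costs roughly a third more space. A quick sketch (payload size arbitrary):

```python
import base64
import os

raw = os.urandom(30_000)             # any binary payload
encoded = base64.b64encode(raw)      # what would go into a text column
overhead = len(encoded) / len(raw)   # ~1.33, i.e. ~33% larger
```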
On 29/04/2010 18:45, Justin Graf wrote:
> I could have sworn bytea does not use large object interface it uses
> TOAST or have i gone insane

You're not insane :)

Put it another way: bytea values are not stored in the pg_largeobject catalog.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com
> Put it another way: bytea values are not stored in the pg_largeobject
> catalog.

I missed the part about BYTEA being used, since it's generally not a good way to store large binary data: BYTEA requires escaping across the wire (client to backend) in both directions, which for true binary data (like compressed/encrypted data, images or other non-text files) makes for a lot of expansion in size and related memory use.

BYTEA and TEXT can both store up to 1GB of data (the max field length), which means even less "file size" supported if you use TEXT with Base64 encoding. LO supports 2GB of data. In JDBC, BYTEA is typically used with byte[] or a binary stream, while LOs are used with BLOB. I think LOs also allow streaming with the backend, though I'm not sure about that, whereas I'm pretty sure BYTEA/TEXT move all the data together, so it will be in memory all at once or not at all.

Of course, to support file storage larger than 1GB or 2GB, you'll have to create your own "toast"-like capability to split files into multiple rows.

David
On 4/29/2010 1:51 PM, David Wall wrote:
> Of course, to support larger file storage than 1GB or 2GB, you'll have
> to create your own "toast" like capability to split them into multiple
> rows.

Outside of videos/media streams, what other kind of data is going to be 1 gig in size? That's a lot of data, even today. We all talk about the 1 gig and 2 gig limits on this, but really, who has bumped into them on a regular basis? Every time I hear about them not being big enough, the person is trying to shoehorn media files into the database, which is insane.
On Thu, Apr 29, 2010 at 1:51 PM, David Wall <d.wall@computer.org> wrote:
> I missed the part that BYTEA was being used since it's generally not a good
> way for storing large binary data because you are right that BYTEA requires
> escaping across the wire (client to backend) both directions

what?? postgresql supports binary data in both directions without escaping. here is how i do it with libpqtypes:

    PGbytea b;
    b.data = some_pointer;
    b.len = data_length;

    res = PQexecf(conn, "insert into table values (%bytea*);", b);

merlin
Justin Graf wrote:
> I could have sworn bytea does not use large object interface it uses
> TOAST or have i gone insane

Each toasted object also requires an OID, so you cannot have more than 4 billion toasted attributes in a table. I've never seen this to be a problem in real life, but if you're talking about having that many large objects, then it will be a problem with toast too.

--
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Each toasted object also requires an OID, so you cannot have more than 4
> billion toasted attributes in a table.
> I've never seen this to be a problem in real life, but if you're talking
> about having that many large objects, then it will be a problem with
> toast too.

However, that toast limit is per-table, whereas the pg_largeobject limit is per-database. So for example if you have a partitioned table then the toast limit only applies per partition. With large objects you'd fall over at 4G objects (probably quite a bit less in practice) no matter what.

regards, tom lane
On 4/29/2010 3:18 PM, Tom Lane wrote:
> However, that toast limit is per-table, whereas the pg_largeobject limit
> is per-database. So for example if you have a partitioned table then
> the toast limit only applies per partition. With large objects you'd
> fall over at 4G objects (probably quite a bit less in practice) no
> matter what.

Has there been any thought of doing something similar to MS FILESTREAM?
http://msdn.microsoft.com/en-us/library/cc949109.aspx

It seems to overcome all the drawbacks of storing files in the DB.
On Apr 29, 2010, at 10:45 AM, Justin Graf wrote:
> Many people encode the binary data in Base64 and store as text data
> type?? Then never have to deal with escaping bytea data type. Which i
> have found can be a pain

Damn. Wish I'd thought of that ;-)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice