Обсуждение: Storing Video's or vedio file in DB.

Поиск
Список
Период
Сортировка

Storing Video's or vedio file in DB.

От
VENKTESH GUTTEDAR
Дата:
Hello,

    I am using PostGreSQL 9.3.5 and DJango1.7.5 and python 3.3.

    I am working on a application where i will get video files from mobile app, and i have to store in the server,
   so my question is how do i store video's or video files in DB, or do i need to store only the link of that video file, if yes then where i have to store the file and how to access that file to display in an html page.

   if anyone is helping me out then give me all details related to storing videos in db, not just hints. like (what datatype i have to use.? how to encode and decode.? how to accept files through http post)

   Help would be appreciated.

   Thank you. 

--
Regards :
Venktesh Guttedar.

Re: Storing Video's or vedio file in DB.

От
Albe Laurenz
Дата:
VENKTESH GUTTEDAR wrote:
> I am using PostGreSQL 9.3.5 and DJango1.7.5 and python 3.3.
> 
> I am working on a application where i will get video files from mobile app, and i have to store in the server,
> so my question is how do i store video's or video files in DB, or do i need to store only the link
> of that video file, if yes then where i have to store the file and how to access that file to display
> in an html page.
> 
> if anyone is helping me out then give me all details related to storing videos in db, not just
> hints. like (what datatype i have to use.? how to encode and decode.? how to accept files through http
> post)

I cannot help you with the web end of things (wrong forum), but
on the database side you have, as you said, the choice between
binary data in the database or files on the file system and links to them in the database.

I'll tell you the pros and cons.
If you store data in the database, you don't have to worry about consistency,
which will make development simpler.
The downside is that you might end up with a huge database that you will have
to backup and maintain, and the performance will be worse than reading files
from the file system.

I'd consider
- how big are the files
- how many of them are there
- whether performance is critical
- how much time you can invest in development
- how critical is consistency

The data type to use would be "bytea", unless the files are very large or you need
support for streaming in the database, in which case "large objects" would have
advantages.  The big downsides of large objects are that they are not as simple as
bytea and you'll have to maintain consistency between large objects and references
to them in tables.

Yours,
Laurenz Albe

Re: Storing Video's or vedio file in DB.

От
Vincent Veyron
Дата:
On Wed, 17 Dec 2014 15:09:40 +0530
VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:



>    so my question is how do i store video's or video files in DB, or do i
> need to store only the link of that video file, if yes then where i have to
> store the file and how to access that file to display in an html page.

If you store your file in the file system, all you need is to store the path to it. Display a link to the file in your
html,and the server will deliver the file to the user's web server, which in turn will open it with the proper program
onthe user's machine. 

I do it with all sorts of extensions (docx, xlxs, pdf...), it works very well.

--
                    Salutations, Vincent Veyron

https://marica.fr/
Gestion des contentieux, des dossiers de sinistres assurance et des contrats pour le service juridique



Re: Storing Video's or vedio file in DB.

От
Thomas Kellerer
Дата:
Albe Laurenz schrieb am 17.12.2014 um 11:07:
> and the performance will be worse than reading files from the file system.

There is a Microsoft research [1] (from 2006) which tested this "myth" using SQL Server.
It showed that the database might actually be faster than the file system.

As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to
testthis on Postgres and possibly other DBMS.  

Turns out the Postgres as well isn't really slower at this than the file system.

For small files around 50k both perform similar: the average time to read the blob from a bytea column was around 2ms
whereasthe average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs
usingthe PK of the table.  

"Reading from the filesystem" means looking up the path for the file in the database table and then reading the file
fromthe filesystem.  

For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column vs. 260ms reading
thefile from disk. 

The tests were done locally on my Windows laptop.
I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on the figures and
NTFSis not known for being blazingly fast. So maybe those figures will change. 


My tests however do not take into account the actual time it takes to send the binary data from the server to the
client(=browser). It might well be possible that serving the file through an Apache Web Server directly is faster than
servingthe file through a JEE Servlet. My intention was to measure the raw read speed of the binary data from the
mediumwhere it is stored.  


> The downside is that you might end up with a huge database
> that you will have to backup and maintain

I don't really buy the argument with the backup: the amount of data to be backed up is essentially the same.
With both solutions you can have incremental backups.

Another downside you didn't mentioned is the fact that you have to distribute the files in the filesystem properly.
Having thousands or even millions of files in a single directory is not going to be maintenance friendly either.



Regards
Thomas


[1] http://research.microsoft.com/apps/pubs/default.aspx?id=64525

Re: Storing Video's or vedio file in DB.

От
Alban Hertroys
Дата:
On 17 December 2014 at 13:55, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Albe Laurenz schrieb am 17.12.2014 um 11:07:
>> and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research [1] (from 2006) which tested this "myth" using SQL Server.
> It showed that the database might actually be faster than the file system.
>
> As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to
testthis on Postgres and possibly other DBMS. 
>
> Turns out the Postgres as well isn't really slower at this than the file system.
>
> For small files around 50k both perform similar: the average time to read the blob from a bytea column was around 2ms
whereasthe average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs
usingthe PK of the table. 
>
> "Reading from the filesystem" means looking up the path for the file in the database table and then reading the file
fromthe filesystem. 

With how many blobs/files did you test this? I'm asking because PG
stores all blobs in a single table. On a file-system, if all files are
stored in a single directory, the situation is similar.
However, a file-system has the ability to store files in several
directories instead of just one, which is often claimed to improve
file-locating performance.

Seeing as the read performance of a file (once it's been located) from
the file-system versus a blob appears similar, the difference in time
for locating the file might well be relevant here.

Interesting to see this was tested with MS SQL and therefore limited
to NTFS. It's probably useful to test this with other file-systems,
such as ZFS or UFS (with DIRHASH!), etc.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Storing Video's or vedio file in DB.

От
Albe Laurenz
Дата:
Thomas Kellerer wrote:
> Albe Laurenz schrieb am 17.12.2014 um 11:07:
>> and the performance will be worse than reading files from the file system.
> 
> There is a Microsoft research [1] (from 2006) which tested this "myth" using SQL Server.
> It showed that the database might actually be faster than the file system.
> 
> As this topic comes up at my workplace every now and then as well, I created a little web application
> (Java/JDBC) to test this on Postgres and possibly other DBMS.
> 
> Turns out the Postgres as well isn't really slower at this than the file system.
> 
> For small files around 50k both perform similar: the average time to read the blob from a bytea column
> was around 2ms whereas the average time to read the blob from the filesystem was around 1ms. The test
> uses 50 threads to read the blobs using the PK of the table.
> 
> "Reading from the filesystem" means looking up the path for the file in the database table and then
> reading the file from the filesystem.
> 
> For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column
> vs. 260ms reading the file from disk.
> 
> The tests were done locally on my Windows laptop.
> I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on
> the figures and NTFS is not known for being blazingly fast. So maybe those figures will change.

That must be some strangeness of the web application, que no?
PostgreSQL must do everything that a direct file access does, right?
Plus some extra processing (load the data into shared_buffers, ...).
Given that, do you have any explanation for what you observed?

> My tests however do not take into account the actual time it takes to send the binary data from the
> server to the client (=browser). It might well be possible that serving the file through an Apache Web
> Server directly is faster than serving the file through a JEE Servlet. My intention was to measure the
> raw read speed of the binary data from the medium where it is stored.

Why not compare 'SELECT ...' with psql (or libpq) with 'cat ...' to measure the difference?

>> The downside is that you might end up with a huge database
>> that you will have to backup and maintain
> 
> I don't really buy the argument with the backup: the amount of data to be backed up is essentially the
> same.
> With both solutions you can have incremental backups.

The amount to back up will stay roughly the same, granted.
But isn't backup/restore of a large database more cumbersome than backup/restore of a file system?
And a major upgrade of a large database is more painful, right?

> Another downside you didn't mentioned is the fact that you have to distribute the files in the
> filesystem properly.
> Having thousands or even millions of files in a single directory is not going to be maintenance
> friendly either.

That's right, you have to spend some thought on how to store the files as well.

Yours,
Laurenz Albe

Re: Storing Video's or vedio file in DB.

От
Vincent Veyron
Дата:
On Wed, 17 Dec 2014 13:55:46 +0100
Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> Another downside you didn't mentioned is the fact that you have to distribute the files in the filesystem properly.
> Having thousands or even millions of files in a single directory is not going to be maintenance friendly either.
>

That's not a very hard problem to solve. Just write a script that splits the directory into 1000 file chunks, I did it
withperl from scratch in a few hours. 

Now my database dumps in seconds, and rsync takes care of new files just as fast.


--
                    Salutations, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


Re: Storing Video's or vedio file in DB.

От
Vincent Veyron
Дата:
On Wed, 17 Dec 2014 16:51:10 +0530
VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:

Hi Venktesh,

[you should not reply directly to me; hit reply all to post your messages to the list]

>
>     can you help me in achieving this, i mean how to store it in a file
> system how to fetch it from there.

Nope. It's not postgresql related, it really depends on what tools you use to build your web site (in your case Django
whichI know nothing about) 

>     and storing in a file system and giving the path will be secure.?
>
>

That's part of the configuration of your web server, you'll have to study their documentation/lists




--
                    Salutations, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


Re: Storing Video's or vedio file in DB.

От
Alan Hodgson
Дата:
On Wednesday, December 17, 2014 01:55:46 PM Thomas Kellerer wrote:
> Albe Laurenz schrieb am 17.12.2014 um 11:07:
> > and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research [1] (from 2006) which tested this "myth" using
> SQL Server. It showed that the database might actually be faster than the
> file system.
>
> As this topic comes up at my workplace every now and then as well, I created
> a little web application (Java/JDBC) to test this on Postgres and possibly
> other DBMS.
>
> Turns out the Postgres as well isn't really slower at this than the file
> system.
>
> For small files around 50k both perform similar: the average time to read
> the blob from a bytea column was around 2ms whereas the average time to
> read the blob from the filesystem was around 1ms. The test uses 50 threads
> to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the
> database table and then reading the file from the filesystem.
>
> For larger files around 250k Postgres was actually faster in my tests: 130ms
> reading the bytea column vs. 260ms reading the file from disk.
>
> The tests were done locally on my Windows laptop.
> I didn't have time yet to do this on a Linux server. I expect the filesystem
> to have some impact on the figures and NTFS is not known for being
> blazingly fast. So maybe those figures will change.
>
>
> My tests however do not take into account the actual time it takes to send
> the binary data from the server to the client (=browser). It might well be
> possible that serving the file through an Apache Web Server directly is
> faster than serving the file through a JEE Servlet. My intention was to
> measure the raw read speed of the binary data from the medium where it is
> stored.

You can get the data from disk about as fast, but actually serving it results
in a large CPU hit that isn't present when serving files.

And if you're using bytea, your app server has to allocate memory to hold at
least one full copy of the file (I seem to recall that it works out to 2
copies, actually, but it's been a while since I tried it). Most languages
aren't good about releasing that memory, so that hit stays around until the
process gets recycled.

For a low volume app, both might be acceptable - any modern CPU can swamp most
outbound bandwidth even while decoding bytea. But it is a large amount of
overhead compared to a web server just dumping files into a network buffer
straight from disk cache.

Also, maintaining large tables still sucks. You can partition them to make
things friendlier. pg_upgrade makes things nicer, but it can't always be used,
so major version upgrades can still be a problem.

On the plus side, all your data is in one place, which makes it cluster-
friendly and easy to delete files when needed, and makes taking consistent
backups much simpler.


Re: Storing Video's or vedio file in DB.

От
John R Pierce
Дата:
On 12/17/2014 4:55 AM, Thomas Kellerer wrote:
> Turns out the Postgres as well isn't really slower at this than the file system.
>
> For small files around 50k both perform similar: the average time to read the blob from a bytea column was around 2ms
whereasthe average time to read the blob from the filesystem was around 1ms. The test uses 50 threads to read the blobs
usingthe PK of the table. 
>
> "Reading from the filesystem" means looking up the path for the file in the database table and then reading the file
fromthe filesystem. 
>
> For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column vs. 260ms
readingthe file from disk. 

with videos, we're likely looking at file sizes in the 100MB to
multi-gigabyte range, unless these are just short snippets.   I'm not
very familiar with django and python, does it have facilities to stream
a very large record, or does it always transfer the whole thing as a
chunk in memory?   Does it have PostgreSQL Large Object support?

Also, serving video via a webserver, this is generally done with a html5
or flash streaming server, where the web application generates the
embedded link to the video, but the video itself comes from said
streaming thing....  those streaming things are less likely to be able
to read a object out of postgres than they are to stream from the file
system.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Storing Video's or vedio file in DB.

От
Merlin Moncure
Дата:
On Wed, Dec 17, 2014 at 3:39 AM, VENKTESH GUTTEDAR
<venkteshguttedar@gmail.com> wrote:
> Hello,
>
>     I am using PostGreSQL 9.3.5 and DJango1.7.5 and python 3.3.
>
>     I am working on a application where i will get video files from mobile
> app, and i have to store in the server,
>    so my question is how do i store video's or video files in DB, or do i
> need to store only the link of that video file, if yes then where i have to
> store the file and how to access that file to display in an html page.
>
>    if anyone is helping me out then give me all details related to storing
> videos in db, not just hints. like (what datatype i have to use.? how to
> encode and decode.? how to accept files through http post)

Storing files in the db:
*) 1 gb limit (2 gb for large objects)
*) how you read and write the data really matters.  to do this
seriously, writing the client in C and using binary wire format is
suggested
*) you can manage binary transfers in context of transaction, which is
nice (no db/fs sync issues)
*) backups will become a real headache using standard pg_dump methods
*) database will do some compression for you. however be advised pg
compression is a cpu cruncher
*) memory consumption can be multiples of largest file size

Storing files in the fs
*) have to manage database records as pointers.  synchronization issue suck
*) backups will still suck
*) performance will be somewhat faster depending on how exactly the data is read
*) memory consumption is minimal

Personally having done it both ways I personally don't recommend
storing files in the database if they are going to be very large which
I would estimate to around 10mb or so assuming you have lots of data
to store.  There are just too many cases where pg will open up the
whole file in memory where a well written streaming interface against
the o/s will send it right off the disk.  Also in my opinion to have
anything approximating good performance in pg with binary transfers we
are definitely talking a C client.

File systems have been aggressively enhanced over the years to try and
keep fragmentation low in the face of high write activity.  The
database doesn't do this and if you are dealing with a large amount of
files in the face of concurrent updating and deleting I expect you
could start running into severe fragmentation issues over time.

merlin


Re: Storing Video's or vedio file in DB.

От
Jonathan Vanasco
Дата:
I wouldn't even store it on the filesystem if I could avoid that.
Most people I know will assign the video a unique identifier (which is stored in the database) and then store the video
filewith a 3rd party (e.g. Amazon S3). 

1. This is often cheaper.  Videos take up a lot of disk space.  Having to ensure 2-3 copies of a file as a failover is
notfun. 
2. It offloads work from internal servers.  Why deal with connections that are serving a static file if you can avoid
it?

In terms of FS vs DB (aside from the open vs streaming which was already brought up)

I think the big issue with storing large files in the database is the input/output connection.
Postgres has a specified number of max connections available, and each one has some overhead to operate. Meanwhile, a
serverlike nginx can handle 10k connections easily, and with little or no overhead.  While the speed is comparable to
theOS, you end up using a resource from a limited database connection pool.  And you run the risk of a slow/dropped
clienttying up the connection.   
Why allocate a resource to these operations, when there are more lightweight alternatives that won't tie up a database
connection? 



Re: Storing Video's or vedio file in DB.

От
Arthur Silva
Дата:

This! I'm surprised it took so long to somebody suggest an object store.

On Dec 17, 2014 9:22 PM, "Jonathan Vanasco" <postgres@2xlp.com> wrote:

I wouldn't even store it on the filesystem if I could avoid that.
Most people I know will assign the video a unique identifier (which is stored in the database) and then store the video file with a 3rd party (e.g. Amazon S3).

1. This is often cheaper.  Videos take up a lot of disk space.  Having to ensure 2-3 copies of a file as a failover is not fun.
2. It offloads work from internal servers.  Why deal with connections that are serving a static file if you can avoid it?

In terms of FS vs DB (aside from the open vs streaming which was already brought up)

I think the big issue with storing large files in the database is the input/output connection.
Postgres has a specified number of max connections available, and each one has some overhead to operate. Meanwhile, a server like nginx can handle 10k connections easily, and with little or no overhead.  While the speed is comparable to the OS, you end up using a resource from a limited database connection pool.  And you run the risk of a slow/dropped client tying up the connection.
Why allocate a resource to these operations, when there are more lightweight alternatives that won't tie up a database connection ?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Storing Video's or vedio file in DB.

От
Adrian Klaver
Дата:
On 12/17/2014 07:37 PM, Arthur Silva wrote:
> This! I'm surprised it took so long to somebody suggest an object store.

I thought they did, a file system:)

>
> On Dec 17, 2014 9:22 PM, "Jonathan Vanasco" <postgres@2xlp.com
> <mailto:postgres@2xlp.com>> wrote:
>
>
>     I wouldn't even store it on the filesystem if I could avoid that.
>     Most people I know will assign the video a unique identifier (which
>     is stored in the database) and then store the video file with a 3rd
>     party (e.g. Amazon S3).
>
>     1. This is often cheaper.  Videos take up a lot of disk space.
>     Having to ensure 2-3 copies of a file as a failover is not fun.
>     2. It offloads work from internal servers.  Why deal with
>     connections that are serving a static file if you can avoid it?
>
>     In terms of FS vs DB (aside from the open vs streaming which was
>     already brought up)
>
>     I think the big issue with storing large files in the database is
>     the input/output connection.
>     Postgres has a specified number of max connections available, and
>     each one has some overhead to operate. Meanwhile, a server like
>     nginx can handle 10k connections easily, and with little or no
>     overhead.  While the speed is comparable to the OS, you end up using
>     a resource from a limited database connection pool.  And you run the
>     risk of a slow/dropped client tying up the connection.
>     Why allocate a resource to these operations, when there are more
>     lightweight alternatives that won't tie up a database connection ?
>
>
>
>     --



--
Adrian Klaver
adrian.klaver@aklaver.com