Thread: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Achilleas Mantzios
Date:
Hello Dear List,

we have a table holding email attachments as bytea, and we would like to 
filter out images of small dimensions, which are not of any value to our 
logic.

I took a look at the pg_image extension and tested it, but it proved 
problematic: it killed my 200+ days uptime FreeBSD box :( . I dropped 
the extension and uninstalled it as soon as fsck finally finished.

So I would like to ask you, basically we have PNGs and JPEGs, is there 
an easy way of parsing their headers and getting info about their 
dimensions?

I could write a C function for that. For PNG it is quite easy, but for 
JPEG it gets a little more complicated, albeit doable; I am just asking for 
something out of the box. Currently we load the images (in our Java 
enterprise system) and filter them in Java, but this brings WildFly down 
to its knees pretty easily and quickly.
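
A minimal sketch of the header parsing described above, written in Python rather than as the C function the post has in mind, and assuming the bytea value has already been fetched as a `bytes` object. The names `png_size`, `jpeg_size` and `image_size` are hypothetical. It reads only the declared dimensions and never decodes pixel data.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Start-of-frame markers carry the dimensions. 0xC4 (DHT), 0xC8 (JPG)
# and 0xCC (DAC) fall in the same range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def jpeg_size(data: bytes):
    """Return (width, height) from the first SOF segment, or None."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        return None
    pos, n = 2, len(data)
    while pos + 4 <= n:
        if data[pos] != 0xFF:
            return None  # lost marker sync: treat as unparseable
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:
            pos += 2  # standalone markers have no length field
            continue
        if marker in (0xD9, 0xDA):
            return None  # EOI or start of scan reached without a frame header
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if seg_len < 2:
            return None
        if marker in SOF_MARKERS:
            if pos + 9 > n:
                return None
            # Segment body: 1-byte precision, then height, then width.
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + seg_len
    return None


def image_size(data: bytes):
    """Try PNG first, then JPEG; None means unknown or unparseable."""
    return png_size(data) or jpeg_size(data)
```

The PNG check needs only the first 24 bytes of the value. The JPEG frame header can sit behind large APPn segments (an embedded EXIF thumbnail, for example), so a short fixed prefix may not be enough for JPEG.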


Thank you and happy Easter.




Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Adam Brusselback
Date:
Why not extract and store that metadata with the image rather than trying to extract it to filter on at query time? That way you can index your height and width columns to speed up that filtering if necessary.

You may be able to write a wrapper for a command line tool like ImageMagick or something similar, so you can call that from a function to determine the size if you do want to stick with extracting that at query time.

Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Achilleas Mantzios
Date:

On 17/4/20 4:09 PM, Adam Brusselback wrote:

> Why not extract and store that metadata with the image rather than trying to extract it to filter on at query time? That way you can index your height and width columns to speed up that filtering if necessary.

Yes, I thought of that, but those are coming automatically from our mail server (via synonym); we have written an alias: a program that parses and stores emails. This is generic, and I wouldn't like to add specific code (or specific columns) just for image attachments. However, I dig the idea of the indexes.

> You may be able to write a wrapper for a command line tool like imagemagic or something so you can call that from a function to determine the size if you did want to stick with extracting that at query time.

As I describe above, those attachments do not exist anywhere as files. They are email attachments. Also, we have about half a TB of them.

Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Steve Atkins
Date:
On 17/04/2020 13:37, Achilleas Mantzios wrote:
> Hello Dear List,
>
> we have a table holding email attachments as bytea, and we would like 
> to filter out images of small dimensions, which are not of any value 
> to our logic.
>
> I took a look at pg_image extension, tested it, and it proved 
> problematic, it killed my 200+ days uptime FreeBSD box :( . I dropped 
> the extension and uninstalled this as soon as fsck finally finished.

If running an extension crashed your server you should look at how / 
why, especially if it corrupted your filesystem.

That shouldn't happen on a correctly configured system, so the 
underlying issue might cause you other problems. Crashing postgresql, 
sure, but not anything that impacts the rest of the server.

Cheers,
   Steve




Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Achilleas Mantzios
Date:
On 17/4/20 5:47 PM, Steve Atkins wrote:

>
> If running an extension crashed your server you should look at how / 
> why, especially if it corrupted your filesystem.
> That shouldn't happen on a correctly configured system, so the 
> underlying issue might cause you other problems. Crashing postgresql, 
> sure, but not anything that impacts the rest of the server.
>
Hello. This machine runs several extensions with no issues (even pljava, 
for Christ's sake, our heavily modified version of DBMirror, and lots of 
our own C functions, among others), two bhyve VMs running 
Ubuntu, and one jail. It also functions as my workstation (WildFly, 
Eclipse, etc.). And it can run for years without a reboot.

Apparently lousy memory management by pg_image (it consumed all 32GB of RAM + 8GB of swap) 
didn't crash PostgreSQL but brought the system to its knees. 
Plus, this extension was last touched in 2013, go figure.

> Cheers,
>   Steve
>
>
>



Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Imre Samu
Date:
> it killed my 200+ days uptime FreeBSD box :( .
> As I describe above, those attachments are nowhere as files. 
> They are email attachments. Also we got about half TB of them.

Is it possible that some image is a "decompression bomb"?

"Because of the efficient compression method used in Portable Network Graphics (PNG) files, a small PNG file can expand tremendously, acting as a "decompression bomb". Malformed PNG chunks can consume a large amount of CPU and wall-clock time and large amounts of memory, up to all memory available on a system, causing a Denial of Service (DoS). Libpng-1.4.1 has been revised to use less CPU time and memory, and provides functions that applications can use to further defend against such files."

Regards,
 Imre
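
One way to act on the warning quoted above is to look at the dimensions a PNG declares before handing it to any decoder. The following is a sketch under stated assumptions, not a complete defence: `MAX_PIXELS` is an arbitrary illustrative limit, the function names are hypothetical, and the check covers only oversized declared dimensions, not the other malformed-chunk cases the quote mentions.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_PIXELS = 50_000_000  # hypothetical limit; tune to the memory you can spare


def png_declared_pixels(data: bytes):
    """Return width * height as declared in IHDR, or None if not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width * height


def safe_to_decode(data: bytes, max_pixels: int = MAX_PIXELS) -> bool:
    """True only if the PNG header parses and the pixel count is within the limit."""
    pixels = png_declared_pixels(data)
    return pixels is not None and 0 < pixels <= max_pixels
```

Because only the first 24 bytes are inspected, the check costs the same for a small icon as for a file that would expand to gigabytes.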



Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?

From
Achilleas Mantzios
Date:


On 17/4/20 6:14 PM, Imre Samu wrote:
>> it killed my 200+ days uptime FreeBSD box :( .
>> As I describe above, those attachments are nowhere as files. 
>> They are email attachments. Also we got about half TB of them.
>
> it is possible - that  some image is a "decompression bomb" ?
>
> "Because of the efficient compression method used in Portable Network Graphics (PNG) files, a small PNG file can expand tremendously, acting as a "decompression bomb". Malformed PNG chunks can consume a large amount of CPU and wall-clock time and large amounts of memory, up to all memory available on a system, causing a Denial of Service (DoS). Libpng-1.4.1 has been revised to use less CPU time and memory, and provides functions that applications can use to further defend against such files."
>
> Regards,
>  Imre

Thank you very much, Imre. Great info.



