Discussion: OIDs - file objects, are damaged by PostgreSQL.

OIDs - file objects, are damaged by PostgreSQL.

From:
"Purusothaman A"
Date:

Hi all,

I am using PostgreSQL 8.2.

I am using the client-side API to upload/download files to/from PostgreSQL, using the calls lo_export()/lo_import().
When I download a file from PostgreSQL a few weeks later, the file object's contents are damaged.
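
(For context: lo_import() stores a file as a new large object and returns its OID, and lo_export() writes a large object back out to a file. The server-side SQL equivalents behave the same way; this is only a minimal sketch with a hypothetical path and OID, run as a superuser because these forms read and write the server's filesystem:)

    SELECT lo_import('/tmp/example.xml');              -- stores the file, returns the new object's OID
    SELECT lo_export(16405, '/tmp/example_copy.xml');  -- writes that large object back out to a file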

I don't know why.

Have any of you encountered the same problem?

Regards,
Purusothaman A


--
http://PurusothamanA.wordpress.com/

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
Richard Huxton
Date:
Purusothaman A wrote:
> Hi all,
>
> I am using Postgresql 8.2.

8.2.which?

> I am using client side api to upload/download files to/from postgresql
> using
> calls lo_export()/lo_import();
> If I download a file from postgresql, few weeks later, files object's
> contents got damaged.
>
> I don't know why.
>
> Do any of you have encountered same problem?

It's not impossible that PostgreSQL is corrupting your large-objects,
but it does seem unlikely. In what way were the large-object's contents
corrupted? Was the file-length the same? Do you know exactly when it got
corrupted?

--
   Richard Huxton
   Archonet Ltd

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
"Purusothaman A"
Date:
Thanks Richard Huxton for your reply.

I use the client-side API for uploading and downloading files.

It's not happening immediately, but as the database grows with data, the file objects get corrupted.

My table structure is as follows.
             Table "public.conf"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 key    | character varying(50)  | not null
 value  | character varying(100) |
Indexes:
    "conf_pkey" PRIMARY KEY, btree ("key")

Content of this table is,
         key         | value
---------------------+--------
 HX                  | 101800
 MASK                | 101801
 Rockey4ND           | 101802
 Threshold           | 60
 Authentication Mode | 2
(5 rows)

In the above, the values of HX, MASK, and Rockey4ND are 101800, 101801, and 101802 (which were returned by lo_import()).

Actually, for some peculiar reason I kept the "value" field as varchar instead of oid (could this be the reason?).
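
(For comparison, a minimal sketch of how the mapping could be typed with an oid column instead; the table name is hypothetical, and lo_import() in this form is the server-side SQL function rather than the client-side call I use:)

    -- hypothetical table storing the large-object OID with its proper type
    CREATE TABLE conf_lo (
        key   varchar(50) PRIMARY KEY,
        value oid
    );
    INSERT INTO conf_lo VALUES ('HX', lo_import('/tmp/hx.xml'));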

This problem occurs only a few weeks after uploading the files.

Regards,
Purusothaman A
 
--
http://PurusothamanA.wordpress.com/

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
Richard Huxton
Date:
Purusothaman A wrote:
> Thanks Richard Huxton for your reply.
>
> I use client side api for uploading and downloading files.
>
> Its not happening immediately. But when database grows with data, file
> object got corrupted.

Yes, but *HOW* - is it a different file, length is different, what?

> My table structure is as follows.
>             Table "public.conf"
> Column |          Type          | Modifiers
> --------+------------------------+-----------
> key    | character varying(50)  | not null
> value  | character varying(100) |
> Indexes:
>    "conf_pkey" PRIMARY KEY, btree ("key")
>
> Content of this table is,
>         key         | value
> ---------------------+--------
> HX                  | 101800
> MASK                | 101801
> Rockey4ND           | 101802
> Threshold           | 60
> Authentication Mode | 2
> (5 rows)
>
> In the above, value of HX, MASK, Rockey4ND is 101800, 101801, 101802 (which
> was returned by lo_import());

I find it unlikely that "2" and "60" were returned by lo_import() as
OIDs available for large-objects. You've either got:
1. Some other part of your application(s) overwriting "value"
2. Old data still in "value"
3. On-disk corruption due to crashes/hardware malfunction.
4. You're not showing real values

> Actually for some peculiar reason I  kept "Value" field as var char instead
> of oid. (this could be reason?...)

Hmm - well it's clearly not right, but I don't see how it can cause
errors like this.

> This problem occurs only few weeks after uploading files.

You still haven't said precisely what the problem is.

--
   Richard Huxton
   Archonet Ltd

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
"Purusothaman A"
Date:
Dear Richard Huxton,

Thanks for your quick reply.

Only the first 3 values (HX, MASK, Rockey4ND) are file objects' OID values; the other two are not OID values.

I have shown the original output values as displayed by the PostgreSQL client.

I can explain more.

1. HX is an XML file. After downloading it, I opened it in the WordPad application
and noticed that nearly 20 characters of the last line were lost.
2. Rockey4ND is a DLL file. I was unable to use that DLL in my application.

In both cases, I checked the file sizes. The corrupted files are smaller than the originals I uploaded to PostgreSQL.

Usually this problem arises only after the database becomes large.

Any suggestion to rectify this problem would be much appreciated.

Thanks.

Regards,
Purusothaman A

--
http://PurusothamanA.wordpress.com/

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
Richard Huxton
Date:
Purusothaman A wrote:
> Dear Richard Huxton,
>
> Thanks for your quick reply.
>
> only the first 3 values(HX, MASK, Rockey4ND) are file object's oid value.
> the other two are are not oid values.

Umm - OK. Can I suggest perhaps having different tables for different
types of data?

> I have shown original output values displayed by postgresql client.
>
> I can explain more.
>
> 1. HX is a XML file. after downloading that file I opened that file in word
> pad application.
> In that I have noticed that nearly 20 characters of last line lost.
> 2. Rockey4ND is a dll file. I was unable to use that dll in my application.
>
> In both cases, I checked file size. Corrupted files are smaller when
> compare
> to the original one what I uploaded to postgresql.

OK, so the data seems OK up until that point?

> Usually this problem arises only after the database become large.

I can't think of any reason why that would make a difference. But, this
does give us a clue. If you have successfully downloaded these files
before, that rules out certain forms of failure.

> Any suggestion to rectify this problem would be nice of you.

Reading through recent release notes, I can't see anything mentioning
lo_import/export, large objects or similar.

http://www.postgresql.org/docs/8.2/static/release-8-2-4.html

Well, your data is stored in pg_largeobject. If you run a SELECT you can
see how it's broken into chunks.
   SELECT loid,pageno,length(data) FROM pg_largeobject ;

On my system, a full chunk is 2048 bytes long. What does the last chunk
of your HX object (101800) look like? Is it a full chunk? Does it end
where your downloaded file ends?
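
Something along these lines would show the last page and total stored size of each object, which you can compare against the size of the file you originally imported (just a sketch):

    SELECT loid, max(pageno) AS last_page, sum(length(data)) AS total_bytes
    FROM pg_largeobject
    GROUP BY loid
    ORDER BY loid;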

If the data is OK in pg_largeobject then we know we have a problem with
lo_export-ing or saving to a file.

If not, then we know we have a problem with something deleting or
overwriting chunks in pg_largeobject. That would surprise me, because I
don't think there's anything special about pg_largeobject - it's just a
table with chunks of bytea data in it.

Just to recap - you're using lo_import() and lo_export from C (or at
least via libpq) to read/write these files directly to your filesystem.
You've not been seeing crashes and you don't think you've got hardware
problems.

--
   Richard Huxton
   Archonet Ltd

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
"Purusothaman A"
Date:
Richard Huxton,

Thanks for your detailed reply.

I am maintaining several databases of the same kind in PostgreSQL.
Below I have shown the corrupted last line of the output of 'select * from pg_largeobject where loid = xxxxxx;' in 5 databases.

I used '\o e:\\filename.xml' before executing the query and inspected the output in that file.

Kindly look at the end of the line in all 5 outputs.
The first 3 outputs show a few missing characters,
but the last 2 outputs are intact.

1. Output of SFRS2 database:
--------------------------------------------
 101177 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_defau
(631 rows)

2. Output of SFRS1 database:
---------------------------------------------
 41642 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668 </stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_default>\015\012</openc
(631 rows)

3. Output of FASP_AVT database:
--------------------------------------------------
 101800 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_default>\015\012</openc
(631 rows)

4. Output of SFRS database: (not yet corrupted)
----------------------------------------------------------------------
 24038 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>- 2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_default>\015\012</opencv_storage>\015\012
(631 rows)

5. Output of FASP_TEST database: (Not yet corrupted)
--------------------------------------------------------------------------------
 106310 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>- 2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_default>\015\012</opencv_storage>\015\012
(631 rows)

Can you figure out the reason behind this?
I am waiting for your valuable suggestions.

Thanks.

Regards,
Purusothaman A

--
http://PurusothamanA.wordpress.com/

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
Richard Huxton
Date:
Purusothaman A wrote:
> Richard Huxton,
>
> Thanks for your detailed reply.
>
> I am maintaining various database of same kind in postgresql.
> Here I have shown various corrupted last line of output of select * from
> pg_largeobject where oid = xxxxxx; in 5 databases.
>
> I have used '\o e:\\filename.xml' before executing query and inspected the
> output in that file.
>
> Kindly look at the end of line in all 5 outputs.
> First 3 output shows few missing characters.
> But last 2 output is the perfect one.
>
> 1. Output of SFRS2 database:
> 2. Output of SFRS1 database:
> 3. Output of FASP_AVT database:

Let's have a look at the last few bytes of these three rows. For
example, below is a very short file that finishes with "end<NL>".

SELECT loid,pageno,length(data),encode(data, 'hex') FROM pg_largeobject
WHERE loid=340161 AND pageno=0;
  loid  | pageno | length |                             encode
--------+--------+--------+----------------------------------------------------------------
 340161 |      0 |     31 | 54686973206973206120746573742066696c650a6c696e6520320a656e640a
(1 row)


It doesn't seem to be random garbage at the end of your rows, so let's
see what it is.

Oh - things we can rule out:
1. Index corruption. We're not missing the last row.

Adding the length check above will show us if the row is truncated or
full of spaces.
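
Something like this would show the length of the suspect chunk together with roughly its last 32 bytes in hex (only a sketch, using the loid/pageno values quoted earlier in the thread):

    SELECT loid, pageno, length(data),
           encode(substring(data from length(data) - 31), 'hex') AS tail_hex
    FROM pg_largeobject
    WHERE loid = 101177 AND pageno = 630;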

--
   Richard Huxton
   Archonet Ltd

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
"Purusothaman A"
Date:
Richard Huxton,

On my system a full chunk is also 2048 bytes.

The output below clearly shows that the last chunk differs in length.

You might have noticed in my previous mail that the string
"</haarcascade_frontalface_default>\015\012</opencv_storage>\015\012" is missing some characters in the SFRS2, SFRS1 and FASP_AVT database outputs.
Have a look at it; in this mail I have bolded the corrupted part.

sfrs2=> select loid, pageno, length(data) from pg_largeobject where loid = 101177 and pageno = 630;
  loid  | pageno | length
--------+--------+--------
 101177 |    630 |    181
(1 row)

sfrs1=> select loid, pageno, length(data) from pg_largeobject where loid = 41642 and pageno = 630;
 loid  | pageno | length
-------+--------+--------
 41642 |    630 |    193
(1 row)

fasp_avt=> select loid, pageno, length(data) from pg_largeobject where loid = 101800 and pageno = 630;
  loid  | pageno | length
--------+--------+--------
 101800 |    630 |    193
(1 row)

sfrs=> select loid, pageno, length(data) from pg_largeobject where loid = 24038 and pageno = 630;
 loid  | pageno | length
-------+--------+--------
 24038 |    630 |    205
(1 row)

fasp_test=> select loid, pageno, length(data) from pg_largeobject where loid = 106310 and pageno = 630;
  loid  | pageno | length
--------+--------+--------
 106310 |    630 |    205
(1 row)

1. Output of SFRS2 database:
--------------------------------------------
 101177 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface_defau
(631 rows)

2. Output of SFRS1 database:
---------------------------------------------
 41642 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668 </stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface _default>\015\012</openc
(631 rows)

3. Output of FASP_AVT database:
--------------------------------------------------
 101800 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>-2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages></haarcascade_frontalface _default>\015\012</openc
(631 rows)

4. Output of SFRS database: (not yet corrupted)
----------------------------------------------------------------------
 24038 |    630 | ight_val></_></_></trees>\015 \012      <stage_threshold>- 2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages>< /haarcascade_frontalface_default>\015\012</opencv_storage>\015\012
(631 rows)

5. Output of FASP_TEST database: (Not yet corrupted)
--------------------------------------------------------------------------------
 106310 |    630 | ight_val></_></_></trees>\015\012      <stage_threshold>- 2.9928278923034668</stage_threshold>\015\012      <parent>23</parent>\015\012      <next>-1</next></_></stages>< /haarcascade_frontalface_default>\015\012</opencv_storage>\015\012
(631 rows)


Have you noticed the bolded strings? What could have caused that part to be damaged?

Is it a bug in PostgreSQL, or is there any way to solve this problem?

Thanks.

Regards,
Purusothaman A


--
http://PurusothamanA.wordpress.com/

Re: [GENERAL] OIDs - file objects, are damaged by PostgreSQL.

From:
Richard Huxton
Date:
Purusothaman A wrote:
> Richard Huxton,
>
> In my system also its 2048 bytes chunk.
>
> The below output shows clearly that the last chunk differs in its length.
>
> You might have noticed in my previous mail that the string
> "</haarcascade_frontalface_default>\015\012</opencv_storage>\015\012" is
> missing some characters in SFRS2, SFRS1 and FASP_AVT database outputs.
> Have a look at it, In this mail I have bolded the corrucpted part.

Yep, spotted that. Hence asking for the length, and it looks like...

>  loid  | pageno | length
> --------+--------+--------
> 101177 |    630 |    181

> 41642 |    630 |    193

> 101800 |    630 |    193

> 24038 |    630 |    205

The data has just been truncated rather than corrupted.

> Is it bug in Posstgresql or is they any way to solve this problem.

Well, something is setting the length too short on these entries. Can
you tell me whether the following statements are all correct?

1. Each database is on a separate machine (that would rule out a
hardware problem)
2. All systems are running on Windows 2000/XP/2003.
3. All systems are version 8.2.4 (if not, please give details)
4. You upload the data with lo_import (once) and download it with
lo_export (many times) and don't alter it in-between.
5. Where the data has been truncated, you know for a fact you downloaded
it OK before (or do you just suspect it was OK?)

If you're not changing the data, and you know it was OK at some point
then there are only two things I can think of:
   1. A hardware problem (which we might rule out above)
   2. A bug in PostgreSQL's vacuum code
Nothing else should be writing to those blocks.

If it looks like a bug in vacuum, we can try to reproduce it, and also
examine the actual contents of the on-disk files (to see if the data is
there on the disk or not). I'll also copy this message over to the
hackers list and see what the developers have to say.
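
To examine the on-disk file behind pg_largeobject, something along these lines should locate it on 8.2 (a sketch; the table's file then lives under <data directory>/base/<database oid>/<relfilenode>):

    SELECT version();                                                   -- confirm the exact server version
    SELECT oid FROM pg_database WHERE datname = current_database();     -- directory name under base/
    SELECT relfilenode FROM pg_class WHERE relname = 'pg_largeobject';  -- file name within that directory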

--
   Richard Huxton
   Archonet Ltd