Обсуждение:
Hi, I sent this message to pgsql-bugs@postresql.org a couple of weeks ago and I got an automated response from pgsql-bugs-owner@hub.org indicating that I'd been subscribed to this list, but I've not seen any traffic from it all since then. Either there are really very few bugs in PostgreSQL (congratulations!) or something is wrong with my subscription. Any (non-automatic) feedback would be welcomed. Here's the message again: I'm using V 7.0 on a Linux machine and I believe I have found a bug in the large object interface provided by libpq. The code below will reproduce it, I hope. Basically it creates a large object, writes six 'a' characters to it, then closes it. Then, in another transaction, it opens the object, seeks to position 1 from the start, writes a 'b', then seeks to position 3 from the start and writes another 'b'. Then it closes the object and COMMITs the transaction. Finally, in a further separate transaction, it calls lo_export to write out the resulting object to a file testloseek.c.lobj I find this file, instead of containing the string 'ababaa' as expected, contains '^@b^@baa' where ^@ is ASCII NUL. Compile with something like gcc -o testloseek testloseek.c -lpq The program sets the PQtrace to STDOUT and writes messages to STDERR, so run it with STDOUT redirected to a log file. This is a C version of a basic regression test of guile-pg, my Guile language bindings for libpq. You may recall I reported a similar bug a year or so ago, and I believed it was then fixed by Tatsuo, after a couple of iterations. I'm sorry to be the bearer of bad news ... Please reply to me directly since I'm not on the list. Thanks Ian #include <stdio.h> #include "libpq-fe.h" #include "libpq/libpq-fs.h" void exec_cmd(PGconn *conn, char *str); main (int argc, char *argv[]) { PGconn *conn; int lobj_fd; char buf[256]; int ret, i; Oid lobj_id; conn = PQconnectdb("dbname=test"); if (PQstatus(conn) != CONNECTION_OK) { fprintf(stderr, "Can't connect to backend.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } exec_cmd(conn, "BEGIN TRANSACTION"); PQtrace (conn, stdout); if ((lobj_id = lo_creat(conn, INV_READ | INV_WRITE)) < 0) { fprintf(stderr, "Can't create lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } fprintf(stderr, "lo_creat() returned OID %ld.\n", lobj_id); if ((lobj_fd = lo_open(conn, lobj_id, INV_READ | INV_WRITE)) < 0) { fprintf(stderr, "Can't open lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } fprintf(stderr, "lo_open returned fd = %d.\n", lobj_fd); if ((ret = lo_write(conn, lobj_fd, "aaaaaa", 6)) != 6) { fprintf(stderr, "Can't write lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } ret = lo_close(conn, lobj_fd); printf("lo_close returned %d.\n", ret); exec_cmd(conn, "END TRANSACTION"); exec_cmd(conn, "BEGIN TRANSACTION"); if ((lobj_fd = lo_open(conn, lobj_id, INV_READ | INV_WRITE)) < 0) { fprintf(stderr, "Can't open lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } fprintf(stderr, "lo_open returned fd = %d.\n", lobj_fd); if (ret) fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn)); if ((ret = lo_lseek(conn, lobj_fd, 1, 0)) != 1) { fprintf(stderr, "error (%d) lseeking in large object.\n", ret); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } if ((ret = lo_write(conn, lobj_fd, "b", 1)) != 1) { fprintf(stderr, "Can't write lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } if ((ret = lo_lseek(conn, lobj_fd, 3, 0)) != 3) { fprintf(stderr, "error (%d) lseeking in large object.\n", ret); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } if ((ret = lo_write(conn, lobj_fd, "b", 1)) != 1) { fprintf(stderr, "Can't write lobj.\n"); fprintf(stderr, "ERROR: %s\n", PQerrorMessage(conn)); exit(1); } ret = lo_close(conn, lobj_fd); printf("lo_close returned %d.\n", ret); if (ret) fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn)); PQuntrace(conn); exec_cmd(conn, "END TRANSACTION"); exec_cmd(conn, "BEGIN TRANSACTION"); ret = lo_export(conn, lobj_id, "testloseek.c.lobj"); printf("lo_export returned %d.\n", ret); if (ret != 1) fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn)); exec_cmd(conn, "END TRANSACTION"); exit(0); } void exec_cmd(PGconn *conn, char *str) { PGresult *res; if ((res = PQexec(conn, str)) == NULL) { fprintf(stderr, "Error executing %s.\n", str); fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn)); exit(1); } if (PQresultStatus(res) != PGRES_COMMAND_OK) { fprintf(stderr, "Error executing %s.\n", str); fprintf(stderr, "Error message: %s\n", PQerrorMessage(conn)); PQclear(res); exit(1); } PQclear(res); } -- Ian Grant, Computer Lab., New Museums Site, Pembroke Street, Cambridge Phone: +44 1223 334420 Personal e-mail: iang at pobox dot com
Ian Grant <Ian.Grant@cl.cam.ac.uk> writes: > I'm using V 7.0 on a Linux machine and I believe I have found a bug in the > large object interface provided by libpq. Thank you for the carefully developed test case. The bug is actually in the backend, not in libpq: there's an ancient hack in inv_write that looks at rel->rd_nblocks to decide if the large object is empty. Unfortunately rd_nblocks isn't maintained very carefully, so the test might mistakenly consider a recently-created LO to be empty, leading to the Wrong Thing. But that hack is no longer necessary, since the index bug it was trying to work around was swatted ages ago; diking out the check suffices to fix it. Another error I noticed while testing the fix is that inv_read doesn't cope gracefully with "holes" in the large object, ie, ranges never written because of a seek and write past the previous end of file. With the attached patch, reads of "holes" reliably return zeroes, as is considered standard behavior for Unix files. (The garbage data you saw was actually from this error, although the case would not have been triggered if not for the first bug.) I believe the attached patch fixes these problems, but I haven't been able to wring it out very thoroughly because I don't have applications that do random seeks and writes in large objects. If you could bang on it a little more and report back, that'd be great. regards, tom lane *** src/backend/storage/large_object/inv_api.c.orig Wed Apr 12 13:15:37 2000 --- src/backend/storage/large_object/inv_api.c Thu Jun 15 02:10:27 2000 *************** *** 185,190 **** --- 185,191 ---- retval->idesc = RelationGetDescr(indr); retval->offset = retval->lowbyte = retval->highbyte = 0; ItemPointerSetInvalid(&(retval->htid)); + retval->flags = 0; if (flags & INV_WRITE) { *************** *** 233,238 **** --- 234,240 ---- retval->idesc = RelationGetDescr(indrel); retval->offset = retval->lowbyte = retval->highbyte = 0; ItemPointerSetInvalid(&(retval->htid)); + retval->flags = 0; if (flags & INV_WRITE) { *************** *** 371,384 **** if (whence == SEEK_CUR) { offset += obj_desc->offset; /* calculate absolute position */ - return inv_seek(obj_desc, offset, SEEK_SET); } ! ! /* ! * if you seek past the end (offset > 0) I have no clue what happens ! * :-( B.L. 9/1/93 ! */ ! if (whence == SEEK_END) { /* need read lock for getsize */ if (!(obj_desc->flags & IFS_RDLOCK)) --- 373,380 ---- if (whence == SEEK_CUR) { offset += obj_desc->offset; /* calculate absolute position */ } ! else if (whence == SEEK_END) { /* need read lock for getsize */ if (!(obj_desc->flags & IFS_RDLOCK)) *************** *** 389,396 **** offset += _inv_getsize(obj_desc->heap_r, obj_desc->hdesc, obj_desc->index_r); - return inv_seek(obj_desc, offset, SEEK_SET); } /* * Whenever we do a seek, we turn off the EOF flag bit to force --- 385,392 ---- offset += _inv_getsize(obj_desc->heap_r, obj_desc->hdesc, obj_desc->index_r); } + /* now we can assume that the operation is SEEK_SET */ /* * Whenever we do a seek, we turn off the EOF flag bit to force *************** *** 414,422 **** * stores the offset of the last byte that appears on it, and we have * an index on this. */ - - - /* right now, just assume that the operation is SEEK_SET */ if (obj_desc->iscan != (IndexScanDesc) NULL) { d = Int32GetDatum(offset); --- 410,415 ---- *************** *** 424,430 **** } else { - ScanKeyEntryInitialize(&skey, 0x0, 1, F_INT4GE, Int32GetDatum(offset)); --- 417,422 ---- *************** *** 487,495 **** /* copy the data from this block into the buffer */ d = heap_getattr(&tuple, 2, obj_desc->hdesc, &isNull); ReleaseBuffer(buffer); ! fsblock = (struct varlena *) DatumGetPointer(d); off = obj_desc->offset - obj_desc->lowbyte; ncopy = obj_desc->highbyte - obj_desc->offset + 1; --- 479,505 ---- /* copy the data from this block into the buffer */ d = heap_getattr(&tuple, 2, obj_desc->hdesc, &isNull); + fsblock = (struct varlena *) DatumGetPointer(d); ReleaseBuffer(buffer); ! /* ! * If block starts beyond current seek point, then we are looking ! * at a "hole" (unwritten area) in the object. Return zeroes for ! * the "hole". ! */ ! if (obj_desc->offset < obj_desc->lowbyte) ! { ! int nzeroes = obj_desc->lowbyte - obj_desc->offset; ! ! if (nzeroes > (nbytes - nread)) ! nzeroes = (nbytes - nread); ! MemSet(buf, 0, nzeroes); ! buf += nzeroes; ! nread += nzeroes; ! obj_desc->offset += nzeroes; ! if (nread >= nbytes) ! break; ! } off = obj_desc->offset - obj_desc->lowbyte; ncopy = obj_desc->highbyte - obj_desc->offset + 1; *************** *** 535,548 **** Buffer buffer; /* ! * Fetch the current inversion file system block. If the class ! * storing the inversion file is empty, we don't want to do an ! * index lookup, since index lookups choke on empty files (should ! * be fixed someday). */ ! if ((obj_desc->flags & IFS_ATEOF) ! || obj_desc->heap_r->rd_nblocks == 0) tuple.t_data = NULL; else inv_fetchtup(obj_desc, &tuple, &buffer); --- 545,555 ---- Buffer buffer; /* ! * Fetch the current inversion file system block. We can skip ! * the work if we already know we are at EOF. */ ! if (obj_desc->flags & IFS_ATEOF) tuple.t_data = NULL; else inv_fetchtup(obj_desc, &tuple, &buffer);
> Thank you for the carefully developed test case. The bug is actually > in the backend, not in libpq: ... snip ... > > I believe the attached patch fixes these problems, but I haven't been > able to wring it out very thoroughly because I don't have applications > that do random seeks and writes in large objects. If you could bang > on it a little more and report back, that'd be great. Hi Tom. Thanks for the response and the quick fix. I'll write some more test cases now. I just thought that whilst thispart of the backend is still fresh in your mind you might consider implementing lo_truncate (the lo_ analog of the unixtruncate system call.) At present there is no way to reduce the size of a large object except by copying to a new one(and then we still can't delete the old one, or can we do that now?) Cheers Ian -- Ian Grant, Computer Lab., New Museums Site, Pembroke Street, Cambridge Phone: +44 1223 334420 Personal e-mail: iang at pobox dot com
Ian Grant <Ian.Grant@cl.cam.ac.uk> writes: > I just thought that whilst this part of the backend is still fresh in > your mind you might consider implementing lo_truncate (the lo_ analog > of the unix truncate system call.) I'll leave that for someone else (Denis Perchine seems to be interested). Bugs are one thing, but new features are another, and large objects aren't high on my personal priority list... regards, tom lane