Обсуждение: Using the right tool
I spent about 12 hours crawling over Google searches, Usenet archives, and a great deal of GIS matter without finding much in the way of solid answers. Perhaps someone on this list can provide a bit of insight. I'm in the need of cataloging a growing collection of USGS satellite imagery. That in itself has been handled rather handily using flat files, but with a recent decision to acquire non-USGS data (potentially tens of thousands of images) covering geographical regions, the need for a reliable catalog is paramount. You can imaging how easily this would become a management nightmare. This data will be tagged, cataloged, and filed by me, rather than the easy-to-handle USGS data that has a common packaging and naming style. Most people who create an image library are doing so for web-based applications, usually, it seems, using MySQL. While I have no convictions to either database (it's just a tool, so the end is more important than the means), I've decided to use Postgres since it has strong ties to the GIS community. But for this task, I've encountered little in the way of stories from users who have used Postgres to catalog image data, despite fairly intense digging. The thing is, these are not small images by any means. So what works well for a few thousand < 250k JPEGS might not work so well for a few thousand 25-125 meg TIFFs and SID files. Can anyone suggest some references for storage of binary objects of this magnitude (100 gig)? While it isn't particularly important to store the file in the database itself, it sure would simplify things -- that way if the file is moved, the db link isn't broken. Rather than reinvent the wheel and build a database and interface solution, is any information available on implementations such as this? Regards, Nathan Hopper
"Nathan Hopper" <nathanh@broszengineering.com> writes: > But for this task, I've encountered little in the way of stories from users > who have used Postgres to catalog image data, despite fairly intense > digging. The thing is, these are not small images by any means. So what > works well for a few thousand < 250k JPEGS might not work so well for a few > thousand 25-125 meg TIFFs and SID files. > Can anyone suggest some references for storage of binary objects of this > magnitude (100 gig)? Nathan, I'd suggest talking to the guys at Refractions Inc (Dave Blasby and Paul Ramsey). I'm pretty sure they have direct experience with exactly this sort of thing ... but I dunno how closely they follow the PG mailing lists, so they might not see your request. regards, tom lane
I'm running a simple query on a simple table (see create syntax below). before running vacuum on the table explain tells me that the index "mytable_id_name_idx" is being used after running vacuum on the table explain tells me that a sequential scan is being used. If I run reindex, I'm back to the index being used. Any ideas why this is happening? PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00) Output of the commands below CREATE TABLE mytable ( id INT NOT NULL, name TEXT NOT NULL, num INT NOT NULL, answer INT NOT NULL, field1 INT, field2 INT, field3 TEXT, field4 TEXT ); CREATE INDEX mytable_id_name_idx ON mytable (id,name); CREATE INDEX mytable_num_idx ON mytable (num); CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num); aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; NOTICE: QUERY PLAN: Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 width=116) EXPLAIN aw_db_joe_1=# vacuum mytable; VACUUM aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; NOTICE: QUERY PLAN: Seq Scan on mytable (cost=0.00..0.00 rows=1 width=116) EXPLAIN aw_db_joe_1=# reindex table mytable; REINDEX aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; NOTICE: QUERY PLAN: Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 width=116) EXPLAIN
Firstly, how many rows in the table? If it's less than a few dozen, a seq scan is the right answer. Secondly, vacuum analyse is usually recommended reasonably often. HTH, On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote: > I'm running a simple query on a simple table (see create syntax below). > > before running vacuum on the table explain tells me that the index > "mytable_id_name_idx" is being used > after running vacuum on the table explain tells me that a sequential scan is > being used. > If I run reindex, I'm back to the index being used. > > Any ideas why this is happening? > > PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00) > > Output of the commands below > > > CREATE TABLE mytable ( > id INT NOT NULL, > name TEXT NOT NULL, > num INT NOT NULL, > answer INT NOT NULL, > field1 INT, > field2 INT, > field3 TEXT, > field4 TEXT > ); > > CREATE INDEX mytable_id_name_idx ON mytable (id,name); > CREATE INDEX mytable_num_idx ON mytable (num); > CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num); > > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 > width=116) > > EXPLAIN > > aw_db_joe_1=# vacuum mytable; > VACUUM > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Seq Scan on mytable (cost=0.00..0.00 rows=1 width=116) > > EXPLAIN > > aw_db_joe_1=# reindex table mytable; > REINDEX > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 > width=116) > > EXPLAIN > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't.
There were fewer than a dozen rows OK, I've tested with ~1000 and indexes are used. Thanks. Martijn van Oosterhout wrote: Firstly, how many rows in the table? If it's less than a few dozen, a seq scan is the right answer. Secondly, vacuum analyse is usually recommended reasonably often. HTH, On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote: > I'm running a simple query on a simple table (see create syntax below). > > before running vacuum on the table explain tells me that the index > "mytable_id_name_idx" is being used > after running vacuum on the table explain tells me that a sequential scan is > being used. > If I run reindex, I'm back to the index being used. > > Any ideas why this is happening? > > PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00) > > Output of the commands below > > > CREATE TABLE mytable ( > id INT NOT NULL, > name TEXT NOT NULL, > num INT NOT NULL, > answer INT NOT NULL, > field1 INT, > field2 INT, > field3 TEXT, > field4 TEXT > ); > > CREATE INDEX mytable_id_name_idx ON mytable (id,name); > CREATE INDEX mytable_num_idx ON mytable (num); > CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num); > > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 > width=116) > > EXPLAIN > > aw_db_joe_1=# vacuum mytable; > VACUUM > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Seq Scan on mytable (cost=0.00..0.00 rows=1 width=116) > > EXPLAIN > > aw_db_joe_1=# reindex table mytable; > REINDEX > > aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john'; > NOTICE: QUERY PLAN: > > Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1 > width=116) > > EXPLAIN > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > There are 10 kinds of people in the world, those that can do binary > arithmetic and those that can't. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ JoeMurphy - AerSoft Limited 2 Northumberland Avenue, Dun Laoghaire, Co. Dublin. phone: +353-1-2301166 direct: +353-1-2145953 fax: +353-1-2301167 mailto:joe@aersoft.com mobile: +353-86-8526181 http://www.aersoft.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~