Discussion: OSS database needed for testing
Folks,

Please pardon the cross-posting.

A small group of us on the Performance list were discussing the first steps toward constructing a comprehensive PostgreSQL installation benchmarking tool, mostly to compare different operating systems and file systems, but later to be used as a foundation for a tuning wizard.

To do this, we need one or more real (not randomly generated*) medium-large databases which are or can be BSD-licensed (data AND schema). This database must have:

1) At least one "main" table with 12+ columns and 100,000+ rows (each).
2) At least 10-12 additional tables of assorted sizes, at least half of which should have Foreign Key relationships to the main table(s) or each other.
3) At least one large text or varchar field among the various tables.

In addition, the following items would be helpful, but are not required:

4) Views, triggers, and functions built on the database.
5) A query log of database activity to give us sample queries to work with.
6) Some complex data types, such as geometric, network, and/or custom data types.

Thanks for any leads you can give me!

(* To forestall knee-jerk responses: Randomly generated data does not look or perform the same as real data in my professional opinion, and I'm the one writing the test scripts.)

--
-Josh Berkus
Aglio Database Solutions
San Francisco
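For concreteness, the shape of a database meeting those criteria can be sketched as generated DDL. This is only an illustration of the requested structure; every table and column name below is an invented placeholder, not part of Josh's request:

```python
def make_schema():
    """Generate skeleton DDL matching the benchmark criteria.

    Hypothetical names throughout; a real candidate database would
    bring its own schema and (crucially) real data.
    """
    # Criterion 1: one "main" table with 12+ columns.
    main_cols = (
        ["id serial PRIMARY KEY"]
        + [f"attr_{i:02d} integer" for i in range(1, 11)]
        + ["description text"]  # criterion 3: a large text field
    )
    stmts = [
        "CREATE TABLE main_records (\n    "
        + ",\n    ".join(main_cols)
        + "\n);"
    ]
    # Criterion 2: 10-12 satellite tables, at least half with
    # foreign keys back to the main table.
    for i in range(1, 13):
        fk = (
            ",\n    main_id integer REFERENCES main_records (id)"
            if i <= 6
            else ""
        )
        stmts.append(
            f"CREATE TABLE satellite_{i:02d} (\n"
            f"    id serial PRIMARY KEY{fk}\n);"
        )
    return stmts

if __name__ == "__main__":
    for stmt in make_schema():
        print(stmt)
```

As the footnote stresses, the schema skeleton is the easy part; the value of a candidate database is in real data and a real query log, which cannot be generated.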
On Thu, Apr 03, 2003 at 13:26:01 -0500, pgsql@mohawksoft.com wrote:
> I don't know that it meets your criteria, but.....
>
> I have a set of scripts and a program that will load the US Census TigerUA
> database into PostgreSQL. The thing is absolutely freak'n huge. I forget
> which, but it is either 30g or 60g of data excluding indexes.

Are the data model or the loading scripts available publicly? I have the Tiger data and a program that uses it to convert addresses to latitude and longitude, but I don't really like the program and was thinking about trying to load the data into a database and do queries against the database to find locations.
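The address-to-coordinate lookup Bruno describes usually comes down to linear interpolation of a house number along a street segment's address range, once a query has found the matching segment. A minimal sketch of that final step (deliberately simplified: real TIGER-based geocoding also handles odd/even address parity and the left/right sides of the street, which this ignores):

```python
def interpolate_address(addr, from_addr, to_addr, from_pt, to_pt):
    """Estimate (lat, lon) for a house number along one street segment.

    from_pt/to_pt are the (lat, lon) endpoints of the segment;
    from_addr/to_addr are the address range assigned to it.
    Simplified: ignores odd/even parity and street side.
    """
    if from_addr == to_addr:
        frac = 0.5  # degenerate range: fall back to the midpoint
    else:
        frac = (addr - from_addr) / (to_addr - from_addr)
    frac = min(max(frac, 0.0), 1.0)  # clamp onto the segment
    lat = from_pt[0] + frac * (to_pt[0] - from_pt[0])
    lon = from_pt[1] + frac * (to_pt[1] - from_pt[1])
    return lat, lon

# e.g. house 150 in the range 100-200 lands mid-segment
print(interpolate_address(150, 100, 200, (37.0, -122.0), (37.001, -122.001)))
```

With the segments in a database table, a query would select the candidate segment by street name and address range, and this arithmetic could even live in a SQL function rather than client code.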
On Thu, Apr 03, 2003 at 17:19:13 -0500, mlw <pgsql@mohawksoft.com> wrote:
> I have a set of scripts, SQL table defs, and a small C program, along with a
> set of field-width files, that loads it into PGSQL using "copy from
> stdin". It works fairly well, but takes a good long time to load it all.
>
> Should I put it in the download section of my website?

Yes. I would be interested in looking at it even if I don't use exactly the same way to do stuff. Taking a long time to load the data into the database isn't a big deal for me. Reading through the Tiger (and FIPS) data documentation, it seemed like there might be some gotchas in unusual cases, and I am not sure the Google contest program really handled things right, so I would like to see another implementation. I am also interested in the data model, as that will save me some time.
I don't know that it meets your criteria, but.....

I have a set of scripts and a program that will load the US Census TigerUA database into PostgreSQL. The thing is absolutely freak'n huge. I forget which, but it is either 30g or 60g of data excluding indexes.

Also, if that is too much, I have a similar setup to load the FreeDB music database, from www.freedb.org. It has roughly 670,000 entries in "cdtitles" and 8 million entries in "cdsongs." For either one, I would be willing to send you the actual DB on CD(s) if you pay for postage and media.

> Folks,
>
> Please pardon the cross-posting.
>
> A small group of us on the Performance list were discussing the first
> steps toward constructing a comprehensive PostgreSQL installation
> benchmarking tool, mostly to compare different operating systems and
> file systems, but later to be used as a foundation for a tuning wizard.
>
> To do this, we need one or more real (not randomly generated*)
> medium-large databases which are or can be BSD-licensed (data AND
> schema). This database must have:
>
> 1) At least one "main" table with 12+ columns and 100,000+ rows (each).
> 2) At least 10-12 additional tables of assorted sizes, at least half of
> which should have Foreign Key relationships to the main table(s) or
> each other.
> 3) At least one large text or varchar field among the various tables.
>
> In addition, the following items would be helpful, but are not required:
>
> 4) Views, triggers, and functions built on the database.
> 5) A query log of database activity to give us sample queries to work with.
> 6) Some complex data types, such as geometric, network, and/or custom
> data types.
>
> Thanks for any leads you can give me!
>
> (* To forestall knee-jerk responses: Randomly generated data does not
> look or perform the same as real data in my professional opinion, and
> I'm the one writing the test scripts.)
> --
> -Josh Berkus
> Aglio Database Solutions
> San Francisco
Bruno Wolff III wrote:

> On Thu, Apr 03, 2003 at 13:26:01 -0500, pgsql@mohawksoft.com wrote:
>> I don't know that it meets your criteria, but.....
>>
>> I have a set of scripts and a program that will load the US Census TigerUA
>> database into PostgreSQL. The thing is absolutely freak'n huge. I forget
>> which, but it is either 30g or 60g of data excluding indexes.
>
> Are the data model or the loading scripts available publicly?
> I have the Tiger data and a program that uses it to convert addresses
> to latitude and longitude, but I don't really like the program and
> was thinking about trying to load the data into a database and do
> queries against the database to find locations.

I have a set of scripts, SQL table defs, and a small C program, along with a set of field-width files, that loads it into PGSQL using "copy from stdin". It works fairly well, but takes a good long time to load it all.

Should I put it in the download section of my website?
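The loader pattern mlw describes (field-width files driving a converter whose output is fed to COPY ... FROM STDIN) can be sketched in a few lines. The field names and widths below are invented for illustration; the real TIGER record layouts are defined in the Census Bureau's documentation:

```python
import io

# Hypothetical fixed-width layout: (column_name, width). Real TIGER
# record types define their own field widths; see the Census docs.
LAYOUT = [("rec_type", 1), ("state_fips", 2), ("county_fips", 3), ("name", 20)]

def fixed_width_to_copy(lines, layout=LAYOUT):
    """Convert fixed-width records to tab-delimited text suitable for
    PostgreSQL's COPY ... FROM STDIN. Empty fields become \\N (SQL NULL)."""
    out = io.StringIO()
    for line in lines:
        fields, pos = [], 0
        for _name, width in layout:
            raw = line[pos:pos + width].strip()
            fields.append(raw if raw else r"\N")
            pos += width
        out.write("\t".join(fields) + "\n")
    return out.getvalue()

# The converted stream would then be piped to the server, e.g.:
#   psql -c "COPY tiger_rt1 FROM STDIN" < converted.txt
print(fixed_width_to_copy(["106085Main Street         "]))
```

Streaming through COPY this way avoids per-row INSERT overhead, which matters at the 30-60 GB scale mentioned above, though as the thread notes the load is still a long-running job.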