Обсуждение: How Big is Too Big for Tables?

Поиск

Список

Период

Сортировка

How Big is Too Big for Tables?

От

Bill Thoen

Дата:

28 июля 2010 г., 16:56:41

I'm building a national database of agricultural information and one of the layers is a bit more than a gigabyte per state. That's 1-2 million records per state, with a mult polygon geometry, and i've got about 40 states worth of data. I trying to store everything in a single PG table. What I'm concerned about is if I combine every state into one big table then will performance will be terrible, even with indexes? On the other hand, if I store the data in several smaller files, then if a user zooms in on a multi-state region, I've got to build or find a much more complicated way to query multiple files.

So I'm wondering, should I be concerned with building a single national size table (possibly 80-100 Gb) for all these records, or should I keep the files smaller and hope there's something like ogrtindex out there for PG tables? what do you all recommend in this case? I just moved over to Postgres to handle big files, but I don't know its limits. With a background working with MS Access and bitter memories of what happens when you get near Access' two gigabyte database size limit, I'm a little nervous of these much bigger files. So I'd appreciate anyone's advice here.

TIA,
- Bill Thoen

Re: How Big is Too Big for Tables?

От

Vincenzo Romano

Дата:

28 июля 2010 г., 17:00:32

2010/7/28 Bill Thoen <bthoen@gisnet.com>:
> I'm building a national database of agricultural information and one of the
> layers is a bit more than a gigabyte per state. That's 1-2 million records
> per state, with a mult polygon geometry, and i've got about 40 states worth
> of data. I trying to store everything in a single PG table. What I'm
> concerned about is if I combine every state into one big table then will
> performance will be terrible, even with indexes? On the other hand, if I
> store the data in several smaller files, then if a user zooms in on a
> multi-state region,  I've got  to build or find a much more complicated way
> to query multiple files.
>
> So I'm wondering, should I be concerned with building a single national size
> table (possibly 80-100 Gb) for all these records, or should I keep the files
> smaller and hope there's something like ogrtindex out there for PG tables?
> what do you all recommend in this case? I just moved over to Postgres to
> handle big files, but I don't know its limits. With a background working
> with MS Access and bitter memories of what happens when you get near
> Access'  two gigabyte database size limit, I'm a little nervous of these
> much bigger files. So I'd appreciate anyone's advice here.
>

AFAIK it could be just a matter of how much RAM do you have, DDL and
DML (aka queries).
Hitting the real PG limits it's quite hard, even in your case.


--
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS

Re: How Big is Too Big for Tables?

От

"Joshua D. Drake"

Дата:

28 июля 2010 г., 17:03:47

On Wed, 2010-07-28 at 11:09 -0600, Bill Thoen wrote:
> I'm building a national database of agricultural information and one
> of the layers is a bit more than a gigabyte per state. That's 1-2
> million records per state, with a mult polygon geometry, and i've got
> about 40 states worth of data. I trying to store everything in a
> single PG table. What I'm concerned about is if I combine every state
> into one big table then will performance will be terrible, even with
> indexes? On the other hand, if I store the data in several smaller
> files, then if a user zooms in on a multi-state region,  I've got  to
> build or find a much more complicated way to query multiple files.
>
> So I'm wondering, should I be concerned with building a single
> national size table (possibly 80-100 Gb) for all these records, or
> should I keep the files smaller and hope there's something like
> ogrtindex out there for PG tables? what do you all recommend in this
> case?

80-100Gb isn't that much. However it may be worth looking into
partitioning by state.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: How Big is Too Big for Tables?

От

Alex Thurlow

Дата:

28 июля 2010 г., 17:13:47

You should look at table partitioning. That is, you make a master table and then make a table for each state that would inherit the master. That way you can query each state individually or you can query the whole country if need be.

http://www.postgresql.org/docs/current/static/ddl-partitioning.html

On 7/28/2010 12:09 PM, Bill Thoen wrote:

I'm building a national database of agricultural information and one of the layers is a bit more than a gigabyte per state. That's 1-2 million records per state, with a mult polygon geometry, and i've got about 40 states worth of data. I trying to store everything in a single PG table. What I'm concerned about is if I combine every state into one big table then will performance will be terrible, even with indexes? On the other hand, if I store the data in several smaller files, then if a user zooms in on a multi-state region, I've got to build or find a much more complicated way to query multiple files.

So I'm wondering, should I be concerned with building a single national size table (possibly 80-100 Gb) for all these records, or should I keep the files smaller and hope there's something like ogrtindex out there for PG tables? what do you all recommend in this case? I just moved over to Postgres to handle big files, but I don't know its limits. With a background working with MS Access and bitter memories of what happens when you get near Access' two gigabyte database size limit, I'm a little nervous of these much bigger files. So I'd appreciate anyone's advice here.

TIA,
- Bill Thoen

Re: How Big is Too Big for Tables?

От

P Kishor

Дата:

28 июля 2010 г., 17:22:53

On Wed, Jul 28, 2010 at 12:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Wed, 2010-07-28 at 11:09 -0600, Bill Thoen wrote:
>> I'm building a national database of agricultural information and one
>> of the layers is a bit more than a gigabyte per state. That's 1-2
>> million records per state, with a mult polygon geometry, and i've got
>> about 40 states worth of data. I trying to store everything in a
>> single PG table. What I'm concerned about is if I combine every state
>> into one big table then will performance will be terrible, even with
>> indexes? On the other hand, if I store the data in several smaller
>> files, then if a user zooms in on a multi-state region,  I've got  to
>> build or find a much more complicated way to query multiple files.
>>
>> So I'm wondering, should I be concerned with building a single
>> national size table (possibly 80-100 Gb) for all these records, or
>> should I keep the files smaller and hope there's something like
>> ogrtindex out there for PG tables? what do you all recommend in this
>> case?
>
> 80-100Gb isn't that much. However it may be worth looking into
> partitioning by state.
>

See http://archives.postgresql.org/pgsql-general/2010-07/msg00691.php
for details, but here is a summary.

My experience has not been the greatest. I have been trying to figure
out if I can store a few hundred million rows, and have experienced a
great number of problems.

One. Loading the data is a problem. COPY is the quickest way (I was
able to achieve a max of about 20,000 inserts per second). However,
you need to make sure there are no indexes, not even a primary key, in
order to extract maximum speed. That means, you have to load
*everything* in one go. If you load in stages, you have drop all the
indexes, then load, then rebuild the indexes. Next time you want to
load more data, you to repeat this process. Building the indexes takes
a long time, so experimenting is a chore.

Two. Partitioning is not the perfect solution. My database will
ultimately have about 13 million rows per day (it is daily data) for
about 25 years. So, I need either --

- One big table with 25 * 365 * 13 million rows. Completely undoable.
- 25 yearly tables with 365 * 13 million rows each. Still a huge
chore, very slow queries.
- 25 * 365 tables with 13 million rows each. More doable, but
partitioning doesn't work.

Three. At least, in my case, the overhead is too much. My data are
single bytes, but the smallest data type in Pg is smallint (2 bytes).
That, plus the per row overhead adds to a fair amount of overhead.

I haven't yet given up on storing this specific dataset in Pg, but am
reconsidering. It is all readonly data, so flat files might be better
for me.

In other words, Pg is great, but do tests, benchmark, research before
committing to a strategy. Of course, since you are storing geometries,
Pg is a natural choice for you. My data are not geometries, so I can
explore alternatives for it, while keeping my geographic data in Pg.

Hope this helps, or, at least provides an alternative view point.

> Sincerely,
>
> Joshua D. Drake
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================

Re: How Big is Too Big for Tables?

От

Terry Fielder

Дата:

28 июля 2010 г., 17:40:12

If all the table files are the same structure, its really not hard, just a UNION clause. Indeed, one can even create a VIEW that leverages that union clause to simplify the code that needs to grab from the multiple tables. As far as indexes, "single table" COULD be OK if you throw enough hardware at it. But if the data changes a lot and vacumming/index rebuilding is not keeping up, well it could get degraded performance even with high end hardware. Let's look at your indexes, are they to be of 3-4 columns or less? Likely you will be OK. If there are several or more columns, your indexes will be massive and then performance drops off with increased paging on even just index usage. NOTE: If you compile the data into a SINGLE table, you could always break up your table into smaller tables using SELECT INTO statements that grab by state. Then your queries that assume a single table for all states need to be tweaked to use union or (even better) tweaked to use a VIEW that already implements a union. If a lot of querying would use the UNION'd view, you probably want to avoid that. If its not very often, or "OK to wait a little bit longer", the union will allow you to break up the data with probably only minor impact when you need multiple states reported together. You likely probably might almost sort of maybe be best to do a test case on your hardware first, even if dummy meaningless data populated by a script, it will give you a measurement of your expected performance that is much more meaningful then my ramble above. :) Terry

Terry Fielder
terry@greatgulfhomes.com
Associate Director Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
Fax: (416) 441-9085

Bill Thoen wrote:

I'm building a national database of agricultural information and one of the layers is a bit more than a gigabyte per state. That's 1-2 million records per state, with a mult polygon geometry, and i've got about 40 states worth of data. I trying to store everything in a single PG table. What I'm concerned about is if I combine every state into one big table then will performance will be terrible, even with indexes? On the other hand, if I store the data in several smaller files, then if a user zooms in on a multi-state region, I've got to build or find a much more complicated way to query multiple files.

So I'm wondering, should I be concerned with building a single national size table (possibly 80-100 Gb) for all these records, or should I keep the files smaller and hope there's something like ogrtindex out there for PG tables? what do you all recommend in this case? I just moved over to Postgres to handle big files, but I don't know its limits. With a background working with MS Access and bitter memories of what happens when you get near Access' two gigabyte database size limit, I'm a little nervous of these much bigger files. So I'd appreciate anyone's advice here.

TIA,
- Bill Thoen

Re: How Big is Too Big for Tables?

От

Bryan Hinton

Дата:

28 июля 2010 г., 17:40:29

Under the assumption that you properly modeled the data -  achieved a
nice balance of normalization and de-normalization, examined the size of
your relations in such a context, and accounted for
how the data will grow over time and if it will grow over time, then
partitioning, as Joshua mentioned, could be an advantageous route to
explore.  The user-interface component, namely, "zooming" in and out,
should remain an abstraction at this point.  My two cents but it sounds
like a lot of groundwork needs to be done first.


On 7/28/10 12:04 PM, Alex Thurlow wrote:
>  You should look at table partitioning.  That is, you make a master
> table and then make a table for each state that would inherit the
> master.  That way you can query each state individually or you can query
> the whole country if need be.
>
> http://www.postgresql.org/docs/current/static/ddl-partitioning.html
>
> On 7/28/2010 12:09 PM, Bill Thoen wrote:
>> I'm building a national database of agricultural information and one
>> of the layers is a bit more than a gigabyte per state. That's 1-2
>> million records per state, with a mult polygon geometry, and i've got
>> about 40 states worth of data. I trying to store everything in a
>> single PG table. What I'm concerned about is if I combine every state
>> into one big table then will performance will be terrible, even with
>> indexes? On the other hand, if I store the data in several smaller
>> files, then if a user zooms in on a multi-state region,  I've got  to
>> build or find a much more complicated way to query multiple files.
>>
>> So I'm wondering, should I be concerned with building a single
>> national size table (possibly 80-100 Gb) for all these records, or
>> should I keep the files smaller and hope there's something like
>> ogrtindex out there for PG tables? what do you all recommend in this
>> case? I just moved over to Postgres to handle big files, but I don't
>> know its limits. With a background working with MS Access and bitter
>> memories of what happens when you get near Access'  two gigabyte
>> database size limit, I'm a little nervous of these much bigger files.
>> So I'd appreciate anyone's advice here.
>>
>> TIA,
>> - Bill Thoen
>>
>

Re: How Big is Too Big for Tables?

От

Vincenzo Romano

Дата:

28 июля 2010 г., 18:11:26

2010/7/28 P Kishor <punk.kish@gmail.com>:
...
> Two. Partitioning is not the perfect solution. My database will
> ultimately have about 13 million rows per day (it is daily data) for
> about 25 years. So, I need either --
>
> - One big table with 25 * 365 * 13 million rows. Completely undoable.
> - 25 yearly tables with 365 * 13 million rows each. Still a huge
> chore, very slow queries.
> - 25 * 365 tables with 13 million rows each. More doable, but
> partitioning doesn't work.
>
> Three. At least, in my case, the overhead is too much. My data are
> single bytes, but the smallest data type in Pg is smallint (2 bytes).
> That, plus the per row overhead adds to a fair amount of overhead.
>
> I haven't yet given up on storing this specific dataset in Pg, but am
> reconsidering. It is all readonly data, so flat files might be better
> for me.
>
> In other words, Pg is great, but do tests, benchmark, research before
> committing to a strategy. Of course, since you are storing geometries,
> Pg is a natural choice for you. My data are not geometries, so I can
> explore alternatives for it, while keeping my geographic data in Pg.

That recalls me an old inquiry of mine on the list about "enterprise
grade" (or whatever you want to call it) solutions.
That means, "really lots of rows" or, alternatively "really lots of tables in
the hierarchy" or, again, "really lots of partial indexes".

Partitioning is not going to work probably because coping with
thousands of tables in a hierarchy would hit against some "linear"
algorithm inside the query planner, even with constraint exclusion.

Maybe "multilevel" hierarchy (let's say partitioning by months (12)
on the first level *and* by day (28,29,30 or 31) on the second one)
would do the magics, but here the DDL would be quite killing,
even with some PL/PGSQL helper function.

The "linearity" of the index selection killed the performances also in
the "really lots of partial indexes" approach.

--
NotOrAnd Information Technologies
Vincenzo Romano
--
NON QVIETIS MARIBVS NAVTA PERITVS

Re: How Big is Too Big for Tables?

От

Otandeka Simon Peter

Дата:

28 июля 2010 г., 18:30:16

There are Postgres Enterprise solutions available although I think they are commercial. You may want to take a look and see if they can be helpful to you.

On Wed, Jul 28, 2010 at 8:44 PM, Vincenzo Romano <vincenzo.romano@notorand.it> wrote:

2010/7/28 P Kishor <punk.kish@gmail.com>:
...
> Two. Partitioning is not the perfect solution. My database will
> ultimately have about 13 million rows per day (it is daily data) for
> about 25 years. So, I need either --
>
> - One big table with 25 * 365 * 13 million rows. Completely undoable.
> - 25 yearly tables with 365 * 13 million rows each. Still a huge
> chore, very slow queries.
> - 25 * 365 tables with 13 million rows each. More doable, but
> partitioning doesn't work.
>
> Three. At least, in my case, the overhead is too much. My data are
> single bytes, but the smallest data type in Pg is smallint (2 bytes).
> That, plus the per row overhead adds to a fair amount of overhead.
>
> I haven't yet given up on storing this specific dataset in Pg, but am
> reconsidering. It is all readonly data, so flat files might be better
> for me.
>
> In other words, Pg is great, but do tests, benchmark, research before
> committing to a strategy. Of course, since you are storing geometries,
> Pg is a natural choice for you. My data are not geometries, so I can
> explore alternatives for it, while keeping my geographic data in Pg.

That recalls me an old inquiry of mine on the list about "enterprise
grade" (or whatever you want to call it) solutions.
That means, "really lots of rows" or, alternatively "really lots of tables in
the hierarchy" or, again, "really lots of partial indexes".

Partitioning is not going to work probably because coping with
thousands of tables in a hierarchy would hit against some "linear"
algorithm inside the query planner, even with constraint exclusion.

Maybe "multilevel" hierarchy (let's say partitioning by months (12)
on the first level *and* by day (28,29,30 or 31) on the second one)
would do the magics, but here the DDL would be quite killing,
even with some PL/PGSQL helper function.

The "linearity" of the index selection killed the performances also in
the "really lots of partial indexes" approach.

--
NotOrAnd Information Technologies
Vincenzo Romano
--
NON QVIETIS MARIBVS NAVTA PERITVS

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: How Big is Too Big for Tables?

От

Stephen Frost

Дата:

28 июля 2010 г., 18:38:40

* P Kishor (punk.kish@gmail.com) wrote:
> Three. At least, in my case, the overhead is too much. My data are
> single bytes, but the smallest data type in Pg is smallint (2 bytes).
> That, plus the per row overhead adds to a fair amount of overhead.

My first reaction to this would be- have you considered aggregating the
data before putting it into the database in such a way that you put more
than 1 byte of data on each row..?  That could possibly reduce the
number of rows you have by quite a bit and also reduce the impact of the
per-tuple overhead in PG..

    Thanks,

        Stephen

Вложения

signature.asc

Re: How Big is Too Big for Tables?

От

P Kishor

Дата:

28 июля 2010 г., 19:06:01

On Wed, Jul 28, 2010 at 1:38 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * P Kishor (punk.kish@gmail.com) wrote:
>> Three. At least, in my case, the overhead is too much. My data are
>> single bytes, but the smallest data type in Pg is smallint (2 bytes).
>> That, plus the per row overhead adds to a fair amount of overhead.
>
> My first reaction to this would be- have you considered aggregating the
> data before putting it into the database in such a way that you put more
> than 1 byte of data on each row..?  That could possibly reduce the
> number of rows you have by quite a bit and also reduce the impact of the
> per-tuple overhead in PG..
>

each row is half a dozen single byte values, so, it is actually 6
bytes per row (six columns). Even if I combine them somehow, still the
per row overhead (which, I believe, is about 23 bytes) is more than
the data. But, that is not the issue. First, I can't really merge
several days into one row. While it might make for fewer rows, it will
complicate my data extraction and analysis life very complicated.

The real issue is that once I put a 100 million rows in the table,
basically the queries became way too slow. Of course, I could (and
should) upgrade my hardware -- I am using a dual Xeon 3 GHz server
with 12 GB RAM, but there are limits to that route.

Keep in mind, the circa 100 million rows was for only part of the db.
If I were to build the entire db, I would have about 4 billion rows
for a year, if I were to partition the db by years. And, partitioning
by days resulted in too many tables.

I wish there were a way around all this so I could use Pg, with my
available resources, but it looks bleak right now.

>        Thanks,
>
>                Stephen
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAkxQeSIACgkQrzgMPqB3kihjYgCeMx2awmTE4IfAHgtws8iKhteN
> cnMAoIp2g2Zfo00GC7du16nwBht3Kt1O
> =7tdl
> -----END PGP SIGNATURE-----
>
>

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================

Re: How Big is Too Big for Tables?

От

Vick Khera

Дата:

28 июля 2010 г., 19:17:10

On Wed, Jul 28, 2010 at 3:05 PM, P Kishor <punk.kish@gmail.com> wrote:
> Keep in mind, the circa 100 million rows was for only part of the db.
> If I were to build the entire db, I would have about 4 billion rows
> for a year, if I were to partition the db by years. And, partitioning
> by days resulted in too many tables.
>

Don't partition by arbitrary slices.  Find out what your queries are
and partition across the most common of those, possibly in two
dimensions even.  Without knowing what kinds of queries you do it is
hard to suggest things that may actually benefit you.  Are you using
one of the advanced data types in postgres that deals with spatial
data?

Additionally, if you're trying to have 4 billion rows of data and only
have a 12GB RAM on your box, no matter your choice of DB it will be
slow.

Re: How Big is Too Big for Tables?

От

Peter Bex

Дата:

28 июля 2010 г., 19:55:10

On Wed, Jul 28, 2010 at 02:05:47PM -0500, P Kishor wrote:
> each row is half a dozen single byte values, so, it is actually 6
> bytes per row (six columns). Even if I combine them somehow, still the
> per row overhead (which, I believe, is about 23 bytes) is more than
> the data. But, that is not the issue.

I had a design like that for an application too. I thought it was
not an issue, but the row overhead causes memory and disk usage to
skyrocket, and will cause queries to slow down to a grind.  The solution
for me was to group my values logically together and store them in the
same row somehow.  In my case, this worked by storing all the values for
one measuring point (timestamp) in an array field, with the array indices
being stored in a bookkeeping table (each measuring moment produced the
same number of values for me, so I was able to do this).

Extracting one value from a long array (some datasets include thousands
of values per measuring moment) is extremely fast. You can also easily
make indices on those array dereferences you need to search on, if those
are always the same.

> First, I can't really merge
> several days into one row. While it might make for fewer rows, it will
> complicate my data extraction and analysis life very complicated.

Perhaps you could put all days of a month in an array, indexed by day
of the month?  That wouldn't be too hard for your logic to deal with,
I think.

> The real issue is that once I put a 100 million rows in the table,
> basically the queries became way too slow.

I had the same issue.  Partitioning falls flat on its face once you're
dealing with such insane amounts of data.  In my experience if your
partitions aren't constant and will keep growing, you will face problems
sooner or later.  If you do partitioning the traditional way by
inheriting the table, you'll also run into additional trouble since for
some operations Postgres will need to obtain a handle on all partitions
and that will easily cause you to run out of shared memory.  You can
increase max_locks_per_transaction, but that's undoable if the number
of partitions keeps growing. You need to keep increasing that value all
the time...

> Of course, I could (and should) upgrade my hardware -- I am using a
> dual Xeon 3 GHz server with 12 GB RAM, but there are limits to that route.

Always try to solve it by changing your data design first, unless what
you're trying to do is fundamentally limited by hardware.  You're not
likely going to request all those record at once, nor will you need to
search through all of them; try to come up with a sane way of quickly
slicing your data to a smaller set which can be quickly retrieved.

> Keep in mind, the circa 100 million rows was for only part of the db.
> If I were to build the entire db, I would have about 4 billion rows
> for a year, if I were to partition the db by years. And, partitioning
> by days resulted in too many tables.

Yeah, sounds similar to the troubles I ran into in my project.

> I wish there were a way around all this so I could use Pg, with my
> available resources, but it looks bleak right now.

Try using the array approach.

Possibly you could create columns for each week or month in a year
and store the individual days in an array in that column.  Extracting
those shouldn't be too hard.

You could store the different types of data you have in different rows
for each unit of information you want to store for a day.

Alternatively, store your data points all in one row, and store a row
for each day.  You could easily start partitioning historical data per
year or per decade.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: How Big is Too Big for Tables?

От

"Joshua D. Drake"

Дата:

29 июля 2010 г., 00:03:02

On Wed, 2010-07-28 at 11:09 -0600, Bill Thoen wrote:
> I'm building a national database of agricultural information and one
> of the layers is a bit more than a gigabyte per state. That's 1-2
> million records per state, with a mult polygon geometry, and i've got
> about 40 states worth of data. I trying to store everything in a
> single PG table. What I'm concerned about is if I combine every state
> into one big table then will performance will be terrible, even with
> indexes? On the other hand, if I store the data in several smaller
> files, then if a user zooms in on a multi-state region,  I've got  to
> build or find a much more complicated way to query multiple files.
>
> So I'm wondering, should I be concerned with building a single
> national size table (possibly 80-100 Gb) for all these records, or
> should I keep the files smaller and hope there's something like
> ogrtindex out there for PG tables? what do you all recommend in this
> case?

80-100Gb isn't that much. However it may be worth looking into
partitioning by state.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: How Big is Too Big for Tables?

От

Jacqui Caren-home

Дата:

29 июля 2010 г., 15:59:42

P Kishor wrote:
> On Wed, Jul 28, 2010 at 1:38 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> * P Kishor (punk.kish@gmail.com) wrote:
>>> Three. At least, in my case, the overhead is too much. My data are
>>> single bytes, but the smallest data type in Pg is smallint (2 bytes).
>>> That, plus the per row overhead adds to a fair amount of overhead.
>> My first reaction to this would be- have you considered aggregating the
>> data before putting it into the database in such a way that you put more
>> than 1 byte of data on each row..?  That could possibly reduce the
>> number of rows you have by quite a bit and also reduce the impact of the
>> per-tuple overhead in PG..
> each row is half a dozen single byte values, so, it is actually 6
> bytes per row (six columns).

Hmm six chars - this would not perchance be bio (sequence) or geospacial data?
If so then there are specialist lists out there that can help.
Also quite a few people use Pg for this data and there are some very neat Pg add ons.

Jacqui

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: How Big is Too Big for Tables?

Вложения