On 09/19/2014 04:51 AM, Björn Wittich wrote:
>
> I am relatively new to postgres. I have a table with 500 columns and
> about 40 million rows. I call this the cache table; one column is a unique
> key (indexed) and the other 499 columns (type integer) are values
> belonging to this key.
>
> Now I have a second (temporary) table (only 2 columns, one of which is
> the key of my cache table), and I want to do an inner join between my
> temporary table and the large cache table and export all matching rows.
> I found that performance improves when I split the join into lots of
> small parts.
> But it seems that the database needs a lot of disk IO to gather all 499
> data columns.
> Is there a possibility to tell the database that all these columns are
> always treated as tuples and that I always want to get the whole row?
> Perhaps the disk organization could then be optimized?
PostgreSQL is already a row store, which means by default you're getting
all of the columns, and the columns are stored physically adjacent to
each other.
If requesting only one or two columns is faster than requesting all of
them, that's almost certainly due to transmission time, not disk
IO. Otherwise, please post your schema (well, a truncated version) and
your queries.
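
A rough back-of-the-envelope estimate shows why shipping the full row is
so much heavier than shipping one or two columns. The sketch below is
Python and counts raw value bytes only. It assumes 4-byte integers
(PostgreSQL's integer type) and ignores tuple headers, page overhead, the
key column, and whatever encoding the client protocol uses; the function
and variable names are made up for illustration.

```python
# Rough payload estimate for the table described in the question.
INT_BYTES = 4            # PostgreSQL "integer" is a 4-byte type
N_VALUE_COLUMNS = 499    # value columns, from the question
N_ROWS = 40_000_000      # "40 mio rows", from the question


def payload_bytes(n_columns: int, n_rows: int,
                  bytes_per_value: int = INT_BYTES) -> int:
    """Raw value bytes only; headers, padding and the key are ignored."""
    return n_columns * n_rows * bytes_per_value


per_row = payload_bytes(N_VALUE_COLUMNS, 1)
whole_table = payload_bytes(N_VALUE_COLUMNS, N_ROWS)
two_columns = payload_bytes(2, N_ROWS)

print(f"per row:     {per_row} bytes")
print(f"all columns: {whole_table / 1e9:.1f} GB")
print(f"two columns: {two_columns / 1e9:.2f} GB")
```

That works out to roughly 2 KB of values per row and about 80 GB for the
whole table, versus about 0.32 GB when only two columns are requested.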
BTW, in cases like yours I've used an INT array instead of 500 columns to
good effect; it works slightly better with PostgreSQL's compression.
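
As a sketch of the array approach: with 499 columns nobody wants to type
the ARRAY[...] expression by hand, so the Python below just generates the
SQL. The table and column names (cache, cache_arr, tmp_keys, cache_key,
c1..c499, vals) are assumptions, since the actual schema was not posted;
adjust them to match yours. Note that CREATE TABLE AS does not copy
indexes, so the unique index on the key would have to be recreated on the
new table.

```python
def array_migration_sql(n_values: int,
                        src: str = "cache",
                        dst: str = "cache_arr",
                        key: str = "cache_key",
                        col_prefix: str = "c") -> str:
    """Build a CREATE TABLE AS statement folding c1..cN into one integer[].

    All identifiers are hypothetical placeholders for the real schema.
    """
    cols = ", ".join(f"{col_prefix}{i}" for i in range(1, n_values + 1))
    return (
        f"CREATE TABLE {dst} AS\n"
        f"SELECT {key}, ARRAY[{cols}]::integer[] AS vals\n"
        f"FROM {src};"
    )


def export_join_sql(dst: str = "cache_arr",
                    tmp: str = "tmp_keys",
                    key: str = "cache_key") -> str:
    """Build the inner join that exports every matching row as key + array."""
    return (
        f"SELECT a.{key}, a.vals\n"
        f"FROM {dst} AS a\n"
        f"JOIN {tmp} AS t ON t.{key} = a.{key};"
    )


# Print a small 3-column version so the shape of the SQL is easy to read.
print(array_migration_sql(3))
print(export_join_sql())
```

Calling array_migration_sql(499) produces the statement for the full
table; individual values are then addressed as vals[1] through vals[499],
since PostgreSQL arrays are 1-based by default.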
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com