Обсуждение: Transposing rows and columns

Поиск
Список
Период
Сортировка

Transposing rows and columns

От
Aram Fingal
Дата:
I'm working with some people who live and breath Excel.  I need to be able to move data back and forth between formats
whichmake sense for Excel and for PostgreSQL.  In some cases, this is just to accommodate what people are used to.  In
othercases, like statistical clustering, it's something that really has to be done. 

Here is a simplified example:

I'm given data in Excel with one sheet each for a bunch of experiments.  In each sheet, there are rows with different
drugsat different doses and columns for each subject.  The cells contain the response data.  I wrote a Perl script
whichautomates the process of extracting that data into a csv file which can be imported into a table like the
following:

create table results(
expt_no int references experiments(id),
subject int references subjects(id),
drug text references drugs(name),
dose numeric,
response numeric
)

Now, suppose I do some computation on the results in the database and want to export it back out to the same kind of
formatthat I received it (drugs and doses in rows and subjects in columns.)   One method would be to use Perl.  I could
useDBD::Pg and loop through a bunch of queries to build a two dimensional array and then spit that back out but is
therea good way to do this just in SQL?  Is there a better way than creating a temporary table for each subject and
thenjoining all the temp tables?   

-Aram

Re: Transposing rows and columns

От
Uwe Schroeder
Дата:

> I'm working with some people who live and breath Excel.  I need to be able
> to move data back and forth between formats which make sense for Excel and
> for PostgreSQL.  In some cases, this is just to accommodate what people are
> used to.  In other cases, like statistical clustering, it's something that
> really has to be done.
>
> Here is a simplified example:
>
> I'm given data in Excel with one sheet each for a bunch of experiments.  In
> each sheet, there are rows with different drugs at different doses and
> columns for each subject.  The cells contain the response data.  I wrote a
> Perl script which automates the process of extracting that data into a csv
> file which can be imported into a table like the following:
>
> create table results(
> expt_no int references experiments(id),
> subject int references subjects(id),
> drug text references drugs(name),
> dose numeric,
> response numeric
> )
>
> Now, suppose I do some computation on the results in the database and want
> to export it back out to the same kind of format that I received it (drugs
> and doses in rows and subjects in columns.)   One method would be to use
> Perl.  I could use DBD::Pg and loop through a bunch of queries to build a
> two dimensional array and then spit that back out but is there a good way
> to do this just in SQL?  Is there a better way than creating a temporary
> table for each subject and then joining all the temp tables?

You may want to look into the tablefunc contrib module. It contains a crosstab
which will transpose rows and columns in the result.
This may be slow though.

HTH






Re: Transposing rows and columns

От
Sam Mason
Дата:
On Thu, Sep 16, 2010 at 11:42:21AM -0400, Aram Fingal wrote:
> create table results(
> expt_no int references experiments(id),
> subject int references subjects(id),
> drug text references drugs(name),
> dose numeric,
> response numeric
> )

What's the primary key?  I presume it's (expt_no,subject,drug,dose).

> Now, suppose I do some computation on the results in the database and
> want to export it back out to the same kind of format that I received
> it (drugs and doses in rows and subjects in columns.)

Have you tried setting up an ODBC data source to the database and use
the PivotTable functionality in Excel to do the transformation?

If you want to do the transformation in SQL, you'd be writing something
like:

  SELECT drug, dose
    MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
    MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
    MIN(CASE subject WHEN 3 THEN response END) AS resp_3
  FROM results
  WHERE expt_no = 1
    AND subject IN (1,2,3)
  GROUP BY drug, dose
  ORDER BY drug, dose;

Or you can use the tablefunc contrib module as suggested by Uwe.  I
prefer doing it by hand as you get more options, but it can be quite
tedious if you've got lots of columns you're trying to deal with.

If I've got my assumption about primary key wrong then my code, as well
as the tablefunc, will probably both fail to do the "right thing".

--
  Sam  http://samason.me.uk/

Re: Transposing rows and columns

От
Aram Fingal
Дата:
On Sep 16, 2010, at 12:28 PM, Sam Mason wrote:

> On Thu, Sep 16, 2010 at 11:42:21AM -0400, Aram Fingal wrote:
>> create table results(
>> expt_no int references experiments(id),
>> subject int references subjects(id),
>> drug text references drugs(name),
>> dose numeric,
>> response numeric
>> )
>
> What's the primary key?  I presume it's (expt_no,subject,drug,dose).

Yes that's correct.  I copied and simplified from the actual table, which has a lot more in the table definition.


>
>> Now, suppose I do some computation on the results in the database and
>> want to export it back out to the same kind of format that I received
>> it (drugs and doses in rows and subjects in columns.)
>
> Have you tried setting up an ODBC data source to the database and use
> the PivotTable functionality in Excel to do the transformation?

I have thought about that but later on, when we do the full sized experiments, there will be too many rows for Excel to
handle.


>
> If you want to do the transformation in SQL, you'd be writing something
> like:
>
>  SELECT drug, dose
>    MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
>    MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
>    MIN(CASE subject WHEN 3 THEN response END) AS resp_3
>  FROM results
>  WHERE expt_no = 1
>    AND subject IN (1,2,3)
>  GROUP BY drug, dose
>  ORDER BY drug, dose;

That's a good trick to know but I just tried it and found that it begins to get complicated with the actual data.  It
alsoreturns a separate row for each drug/dose/subject combination which isn't exactly what I want.  Each row has one
columnwith a value and the rest of the columns in that row are all <null>. 

>
> Or you can use the tablefunc contrib module as suggested by Uwe.  I
> prefer doing it by hand as you get more options, but it can be quite
> tedious if you've got lots of columns you're trying to deal with.

I guess I need to read up on pivot tables.  At first glance, this looks like the best solution.

-Aram


Re: Transposing rows and columns

От
John R Pierce
Дата:
On 09/16/10 10:44 AM, Aram Fingal wrote:
> I have thought about that but later on, when we do the full sized experiments, there will be too many rows for Excel
tohandle. 

if you insist on this transposing, won't that mean you'll end up with
more columns than SQL can/should handle?



Re: Transposing rows and columns

От
Aram Fingal
Дата:

On Sep 16, 2010, at 4:37 PM, John R Pierce wrote:

On 09/16/10 10:44 AM, Aram Fingal wrote:
I have thought about that but later on, when we do the full sized experiments, there will be too many rows for Excel to handle.

if you insist on this transposing, won't that mean you'll end up with more columns than SQL can/should handle?

No.  The organization in Excel is much more efficient of the total number of cells used but not much good for querying.  When I transpose it for use in the database (or pivot it in Excel), it actually multiplies the number of rows.  So, if the version with separate columns for each subject has X rows and Y columns, you get X * Y rows in the database version.  For example, If there are 100 subjects, and 1000 drug/dose combinations.  Then the Excel version has 102 columns (drug, dose and a column for each subject) and 1000 rows.  The database (or pivoted) version would have 4 columns (subject, drug, dose and response) and 100,000 rows.  Excel maxes out at 65,535 rows and PostgreSQL has no limit.  

The subjects, by the way, are not people, they are cancer cell tissue cultures in 384-well plates, handled by robots.  That's how we can do so many drug/dose combinations.  We'll do even more in the future.

-Aram

Re: Transposing rows and columns

От
Sam Mason
Дата:
On Thu, Sep 16, 2010 at 01:44:30PM -0400, Aram Fingal wrote:
> On Sep 16, 2010, at 12:28 PM, Sam Mason wrote:
> > If you want to do the transformation in SQL, you'd be writing something
> > like:
> >
> >  SELECT drug, dose
> >    MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
> >    MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
> >    MIN(CASE subject WHEN 3 THEN response END) AS resp_3
> >  FROM results
> >  WHERE expt_no = 1
> >    AND subject IN (1,2,3)
> >  GROUP BY drug, dose
> >  ORDER BY drug, dose;
>
> That's a good trick to know but I just tried it and found that it
> begins to get complicated with the actual data.  It also returns
> a separate row for each drug/dose/subject combination which isn't
> exactly what I want.  Each row has one column with a value and the
> rest of the columns in that row are all <null>.

It shouldn't give a row per subject as the subject isn't in the GROUP
BY list.  Either that or you've got rounding problems that means that
the dose or drug combination means that they're effectively unique per
subject.  You need to find the set of columns that gives you the "right"
number of rows, the same issue applies to pivot tables in Excel.

> > Or you can use the tablefunc contrib module as suggested by Uwe.  I
> > prefer doing it by hand as you get more options, but it can be quite
> > tedious if you've got lots of columns you're trying to deal with.
>
> I guess I need to read up on pivot tables.  At first glance, this
> looks like the best solution.

Maybe pull them out by experiment?

--
  Sam  http://samason.me.uk/

Re: Transposing rows and columns

От
Steve Clark
Дата:
On 09/16/2010 05:26 PM, Aram Fingal wrote:
>
> On Sep 16, 2010, at 4:37 PM, John R Pierce wrote:
>
>> On 09/16/10 10:44 AM, Aram Fingal wrote:
>>> I have thought about that but later on, when we do the full sized
>>> experiments, there will be too many rows for Excel to handle.
>>
>> if you insist on this transposing, won't that mean you'll end up with
>> more columns than SQL can/should handle?
>
> No. The organization in Excel is much more efficient of the total number
> of cells used but not much good for querying. When I transpose it for
> use in the database (or pivot it in Excel), it actually multiplies the
> number of rows. So, if the version with separate columns for each
> subject has X rows and Y columns, you get X * Y rows in the database
> version. For example, If there are 100 subjects, and 1000 drug/dose
> combinations. Then the Excel version has 102 columns (drug, dose and a
> column for each subject) and 1000 rows. The database (or pivoted)
> version would have 4 columns (subject, drug, dose and response) and
> 100,000 rows. Excel maxes out at 65,535 rows and PostgreSQL has no limit.
I think excel 2007 can handle more than 65,535 rows.
>
> The subjects, by the way, are not people, they are cancer cell tissue
> cultures in 384-well plates, handled by robots. That's how we can do so
> many drug/dose combinations. We'll do even more in the future.
>
> -Aram


--
Stephen Clark
NetWolves
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
www.netwolves.com

Re: Transposing rows and columns

От
Aram Fingal
Дата:
On Sep 17, 2010, at 9:00 AM, Steve Clark wrote:

> I think excel 2007 can handle more than 65,535 rows.

You may be right.  I'm actually using NeoOffice (Mac enhanced version of OpenOffice) and that can handle something like
1,048,000rows.    I wouldn't be surprised if newer versions of Excel can do the same.  The real issue is querying. 

-Aram