Обсуждение: simultaneously reducing both memory usage and runtime for a query

Поиск
Список
Период
Сортировка

simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:
Hi everyone,

I've been trying to reduce both memory usage and runtime for a query. An
issue I have encountered is that I can drastically reduce both runtime and
memory usage by splitting up the query into two pieces and gluing them
together outside PostgreSQL. However, I'm unable to get similar results
inside PostgreSQL. I have tried different variations of the same query,
but either the memory or the runtime blows up. I wonder if I'm missing
something. In any case, feedback would be helpful. Details of my attempts
at optimization are at

http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf

See particularly Section 1 - Background and Discussion.

If you want a text version, see

http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex

For background see

http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version
http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf

Please CC any replies to me at the above email address. Thanks.

                                                      Regards, Faheem.

Re: simultaneously reducing both memory usage and runtime for a query

От
Tom Lane
Дата:
Faheem Mitha <faheem@email.unc.edu> writes:
> ... In any case, feedback would be helpful. Details of my attempts
> at optimization are at
> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf

By and large, this is not the way to ask for help on the Postgres lists.
If you're supplying extremely large test data or something, it's fair to
provide a link instead of putting the information in-line, but otherwise
you should try to make your email self-contained.  Those of us who are
willing to help are not here just to help you --- we'd like other people
to learn from it too, both at the time and later from the archived
discussion.  So the information needs to be in the email thread, not
only on some transient web page.

            regards, tom lane

Re: simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:

On Sun, 28 Mar 2010, Tom Lane wrote:

> Faheem Mitha <faheem@email.unc.edu> writes:
>> ... In any case, feedback would be helpful. Details of my attempts
>> at optimization are at
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>
> By and large, this is not the way to ask for help on the Postgres lists.
> If you're supplying extremely large test data or something, it's fair to
> provide a link instead of putting the information in-line, but otherwise
> you should try to make your email self-contained.  Those of us who are
> willing to help are not here just to help you --- we'd like other people
> to learn from it too, both at the time and later from the archived
> discussion.  So the information needs to be in the email thread, not
> only on some transient web page.

Dear Tom Lane,

Thank you for your message. I understand your point of view. However, the
three documents I cite in my previous message have significant graphical
content, eg. time-memory graphs. At least two of them are relevant, the
third tangentially so. Also, these two files (opt and diag) are 25 and 41
pages respectively in their pdf version. I don't think it would be very
easy to cram these two into an email.

So, in light of that, would it be permissible to attach these documents to
my email, since you want a self-contained email? If so, I hope I do not
fall afoul of spam filters. The two documents are 3.3M (opt.pdf) and 132K
(diag.pdf).

                                                   Sincerely, Faheem Mitha.

Re: simultaneously reducing both memory usage and runtime for a query

От
Andy Colson
Дата:
On 03/28/2010 03:05 PM, Faheem Mitha wrote:
>
>
> On Sun, 28 Mar 2010, Tom Lane wrote:
>
>> Faheem Mitha <faheem@email.unc.edu> writes:
>>> ... In any case, feedback would be helpful. Details of my attempts
>>> at optimization are at
>>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>>
>> By and large, this is not the way to ask for help on the Postgres lists.
>> If you're supplying extremely large test data or something, it's fair to
>> provide a link instead of putting the information in-line, but otherwise
>> you should try to make your email self-contained. Those of us who are
>> willing to help are not here just to help you --- we'd like other people
>> to learn from it too, both at the time and later from the archived
>> discussion. So the information needs to be in the email thread, not
>> only on some transient web page.
>
> Dear Tom Lane,
>
> Thank you for your message. I understand your point of view. However,
> the three documents I cite in my previous message have significant
> graphical content, eg. time-memory graphs. At least two of them are
> relevant, the third tangentially so. Also, these two files (opt and
> diag) are 25 and 41 pages respectively in their pdf version. I don't
> think it would be very easy to cram these two into an email.
>
> So, in light of that, would it be permissible to attach these documents
> to my email, since you want a self-contained email? If so, I hope I do
> not fall afoul of spam filters. The two documents are 3.3M (opt.pdf) and
> 132K (diag.pdf).
>
> Sincerely, Faheem Mitha.
>

Wait... Your saying your question is so complex it needs 41 pages (including graphs) to ask?  I didn't bother before,
butnow I'm curious, I'll have to go take a look. 

-Andy


Re: simultaneously reducing both memory usage and runtime for a query

От
Andy Colson
Дата:
On 03/28/2010 07:43 PM, Andy Colson wrote:
> On 03/28/2010 03:05 PM, Faheem Mitha wrote:
>>
>>
>
> Wait... Your saying your question is so complex it needs 41 pages
> (including graphs) to ask? I didn't bother before, but now I'm curious,
> I'll have to go take a look.
>
> -Andy

Faheem, you seem to be incredibly detail oriented.  We probably on need 10% of whats in diag.pdf:

16 core, 64 gig ram, 6 drives on 3ware 9690SA-8I card in RAID 10, with slow read (280mb/sec) and write (40mb/sec).

Running 64 bit Debian lenny with postgresql 8.4.

config settings:

shared_buffers = 2GB
work_mem = 1GB
maintenance_work_mem = 8GB
wal_buffers = 16MB
checkpoint_segments = 50
effective_cache_size = 50GB


There is sql script to create the tables n'stuff.

A Few counts:

affy6_faheem=# select count(*) from anno;
count
--------
932979
(1 row)

affy6_faheem=# select count(*) from geno;
count
-----------
825733782
(1 row)



The rest is irc chat about getting the data imported into PG, and other than slowness problems, does not seem relevant
tothe sql in opt.pdf. 


As for opt.pdf, I dont think, again, we need all that detail.  And the important parts got cut off.  The explain
analyzeoutput is needed, but its cut off. 

I'd recommend you paste the output here:

http://explain.depesz.com/

And give us links.  The explain analyze will have counts and info that we (ok not me, but Tom and others) can use to
helpyou. 

You also seem to have gone through several revisions of the sql (I admit, I just skimmed the pdf's), it would be great
ifyou could drop the ones you are sure are not useful, and we concentrate on just one or two. 


-Andy

Re: simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:

On Sun, 28 Mar 2010, Andy Colson wrote:

> Wait... Your saying your question is so complex it needs 41 pages
> (including graphs) to ask?  I didn't bother before, but now I'm curious,
> I'll have to go take a look.

No, only the 25 page document (which has the graphs) is directly related
to the question. It is different variants on a query (well, two queries,
but they are very similar), along with EXPLAIN ANALYZE VERBOSE and
time-memory graphs.

The 41 page document is just background, but relevant background. It
contains information about the schema, tables, hardware, pg config info.
It also has transcripts of a couple of old IRC sessions, which are
increasingly less relevant, so I may remove that.

Since I've been told by a couple of different people that the way I asked
my question was not ideal, I'll try reposting again with a modified
version (but still containing the same information) in a bit.

                                                           Regards, Faheem.

Re: simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:
Hi again Andy,

On Sun, 28 Mar 2010 20:59:24 -0500, Andy Colson <andy@squeakycode.net> wrote:
> On 03/28/2010 07:43 PM, Andy Colson wrote:
>> On 03/28/2010 03:05 PM, Faheem Mitha wrote:
>>>
>>>
>>
>> Wait... Your saying your question is so complex it needs 41 pages
>> (including graphs) to ask? I didn't bother before, but now I'm curious,
>> I'll have to go take a look.
>>
>> -Andy

> Faheem, you seem to be incredibly detail oriented.  We probably on
> need 10% of whats in diag.pdf:

Most of this is stuff I've been asked on #postgresql. Admittedly, in
different contexts than this query.

[snippage]

> The rest is irc chat about getting the data imported into PG, and
>  other than slowness problems, does not seem relevant to the sql in
>  opt.pdf.

True, the IRC sessions should go. I'll take them out.

> As for opt.pdf, I dont think, again, we need all that detail.  And
>  the important parts got cut off.  The explain analyze output is
>  needed, but its cut off.

> I'd recommend you paste the output here:
>
> http://explain.depesz.com/

> And give us links.  The explain analyze will have counts and info
> that we (ok not me, but Tom and others) can use to help you.

That's one way to go. I was going to paste the entirety of opt.tex
into an email. That would include all the EXPLAIN ANALYZE STUFF, but
not the graphs, and thus would be relatively self-contained. For the
graphs you'd have to look at a pdf (unless a ps.gz is preferred).

> You also seem to have gone through several revisions of the sql (I
> admit, I just skimmed the pdf's), it would be great if you could
> drop the ones you are sure are not useful, and we concentrate on
> just one or two.

Ok, I'll trim it down a bit. At least the initial queries in both
sections are not relevant. Thanks for the feedback.

                                                   Regards,Faheem.

Re: simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:

On Sun, 28 Mar 2010, Tom Lane wrote:

> Faheem Mitha <faheem@email.unc.edu> writes:
>> ... In any case, feedback would be helpful. Details of my attempts
>> at optimization are at
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>
> By and large, this is not the way to ask for help on the Postgres lists.
> If you're supplying extremely large test data or something, it's fair to
> provide a link instead of putting the information in-line, but otherwise
> you should try to make your email self-contained.  Those of us who are
> willing to help are not here just to help you --- we'd like other people
> to learn from it too, both at the time and later from the archived
> discussion.  So the information needs to be in the email thread, not
> only on some transient web page.

I submitted a modified self-contained email as requested, but it does not
appear to have made it to the list, and I never got any kind of reject
message. It was around 1000 lines with no attachments. Is there a size
limit for posts, and if so, what is it? If it was rejected due to some
filter, it would be desirable (and polite) if the recepient was told what
happened.

                                                           Regards, Faheem.

Re: simultaneously reducing both memory usage and runtime for a query

От
Bruce Momjian
Дата:
Faheem Mitha wrote:
>
>
> On Sun, 28 Mar 2010, Tom Lane wrote:
>
> > Faheem Mitha <faheem@email.unc.edu> writes:
> >> ... In any case, feedback would be helpful. Details of my attempts
> >> at optimization are at
> >> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
> >
> > By and large, this is not the way to ask for help on the Postgres lists.
> > If you're supplying extremely large test data or something, it's fair to
> > provide a link instead of putting the information in-line, but otherwise
> > you should try to make your email self-contained.  Those of us who are
> > willing to help are not here just to help you --- we'd like other people
> > to learn from it too, both at the time and later from the archived
> > discussion.  So the information needs to be in the email thread, not
> > only on some transient web page.
>
> I submitted a modified self-contained email as requested, but it does not
> appear to have made it to the list, and I never got any kind of reject
> message. It was around 1000 lines with no attachments. Is there a size
> limit for posts, and if so, what is it? If it was rejected due to some
> filter, it would be desirable (and polite) if the recepient was told what
> happened.

The bottom line is that this email list is best at answering short,
detailed questions.  Larger questions that require a lot of analysis by
readers are often ignored.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

Re: simultaneously reducing both memory usage and runtime for a query

От
Faheem Mitha
Дата:

On Wed, 31 Mar 2010, Bruce Momjian wrote:

> Faheem Mitha wrote:

>> I submitted a modified self-contained email as requested, but it does not
>> appear to have made it to the list, and I never got any kind of reject
>> message. It was around 1000 lines with no attachments. Is there a size
>> limit for posts, and if so, what is it? If it was rejected due to some
>> filter, it would be desirable (and polite) if the recepient was told what
>> happened.
>
> The bottom line is that this email list is best at answering short,
> detailed questions.  Larger questions that require a lot of analysis by
> readers are often ignored.

The question is not really that complex and I (perhaps mistakenly) thought
it would be of interest to the PostgreSQL community. Maybe I'll try
resubmitting a part of the question.

Regardless, you don't address why the email I sent disappeared into the
ether. If it was rejected due to excessive length, then it would be polite
to get an automated rejection. Perhaps something like

"Your message is too long and so is being rejected. Messages to the ...
mailing list need to be under x lines"

                                                     Regards, Faheem.

Re: simultaneously reducing both memory usage and runtime for a query

От
Alvaro Herrera
Дата:
Faheem Mitha wrote:

> Regardless, you don't address why the email I sent disappeared into
> the ether. If it was rejected due to excessive length, then it would
> be polite to get an automated rejection. Perhaps something like
>
> "Your message is too long and so is being rejected. Messages to the
> ... mailing list need to be under x lines"

Our mailing lists are set up with a spam filter that's very happy to
drop stuff on the floor without warning at first suspicion.  It's not an
easy thing to fix for reasons that are outside the scope of this mailing
list.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support