Обсуждение: Query results caching?

Поиск
Список
Период
Сортировка

Query results caching?

От
Ben-Nes Yonatan
Дата:
Hi all,

I dont know how its called but I noticed that when I query the db for
the first time it give me the result slower then the next times ill
repeat the same exact query, I figure that its some kind of caching so
henceforth the title of the mail :)

Anyway I would want to be able to delete that "caching" after every
query test that I run, cause I want to see the real time results for my
queries (its for a searching option for users so it will vary alot).

Is it possible to do it manually each time or maybe only from the
configuration?

Thanks in advance,
Ben-Nes Yonatan
Canaan Surfing ltd.
http://www.canaan.net.il

Re: Query results caching?

От
"Dann Corbit"
Дата:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Ben-Nes Yonatan
> Sent: Monday, August 22, 2005 9:03 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query results caching?
>
> Hi all,
>
> I dont know how its called but I noticed that when I query the db for
> the first time it give me the result slower then the next times ill
> repeat the same exact query, I figure that its some kind of caching so
> henceforth the title of the mail :)

The operating system and the database will both percolate frequently
used information from disk into memory.  Particularly if they are SELECT
queries, they will get faster and faster.

> Anyway I would want to be able to delete that "caching" after every
> query test that I run, cause I want to see the real time results for
my
> queries (its for a searching option for users so it will vary alot).

Those are the real times for your queries.

> Is it possible to do it manually each time or maybe only from the
> configuration?

You will have to query a different table each time.

> Thanks in advance,
> Ben-Nes Yonatan
> Canaan Surfing ltd.
> http://www.canaan.net.il
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Re: Query results caching?

От
Sean Davis
Дата:
On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit@connx.com> wrote:

>
>
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
>> owner@postgresql.org] On Behalf Of Ben-Nes Yonatan
>> Sent: Monday, August 22, 2005 9:03 AM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Query results caching?
>>
>> Hi all,
>>
>> I dont know how its called but I noticed that when I query the db for
>> the first time it give me the result slower then the next times ill
>> repeat the same exact query, I figure that its some kind of caching so
>> henceforth the title of the mail :)
>
> The operating system and the database will both percolate frequently
> used information from disk into memory.  Particularly if they are SELECT
> queries, they will get faster and faster.
>
>> Anyway I would want to be able to delete that "caching" after every
>> query test that I run, cause I want to see the real time results for
> my
>> queries (its for a searching option for users so it will vary alot).
>
> Those are the real times for your queries.
>
>> Is it possible to do it manually each time or maybe only from the
>> configuration?
>
> You will have to query a different table each time.

Just to extend this notion a bit, if you want to test your application
speed, you may want to generate "real-world" input to determine the actual
behavior/speed under real conditions.  As Dann pointed out, the results for
timings are "real" in that if the user generated the queries as you did, the
timing results would be (nearly) the same as for you.  It seems that your
concern is that the user will not generate the same type of input that you
did (that it will vary more), so the best solution may be to actually
generate some test queries that actually conform to what you think the user
input will look like.

Sean


Re: Query results caching?

От
Ben-Nes Yonatan
Дата:
Sean Davis wrote:
> On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit@connx.com> wrote:
>
>
>>
>>>-----Original Message-----
>>>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
>>>owner@postgresql.org] On Behalf Of Ben-Nes Yonatan
>>>Sent: Monday, August 22, 2005 9:03 AM
>>>To: pgsql-general@postgresql.org
>>>Subject: [GENERAL] Query results caching?
>>>
>>>Hi all,
>>>
>>>I dont know how its called but I noticed that when I query the db for
>>>the first time it give me the result slower then the next times ill
>>>repeat the same exact query, I figure that its some kind of caching so
>>>henceforth the title of the mail :)
>>
>>The operating system and the database will both percolate frequently
>>used information from disk into memory.  Particularly if they are SELECT
>>queries, they will get faster and faster.
>>
>>
>>>Anyway I would want to be able to delete that "caching" after every
>>>query test that I run, cause I want to see the real time results for
>>
>>my
>>
>>>queries (its for a searching option for users so it will vary alot).
>>
>>Those are the real times for your queries.
>>
>>
>>>Is it possible to do it manually each time or maybe only from the
>>>configuration?
>>
>>You will have to query a different table each time.
>
>
> Just to extend this notion a bit, if you want to test your application
> speed, you may want to generate "real-world" input to determine the actual
> behavior/speed under real conditions.  As Dann pointed out, the results for
> timings are "real" in that if the user generated the queries as you did, the
> timing results would be (nearly) the same as for you.  It seems that your
> concern is that the user will not generate the same type of input that you
> did (that it will vary more), so the best solution may be to actually
> generate some test queries that actually conform to what you think the user
> input will look like.
>
> Sean
>
I think that I was misunderstood, Ill make an example:
Lets say that im making the following query for the first time on the
"motorcycles" table which got an index on the "manufacturer" field:

EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
manufacturer='suzuki';
... Total runtime: 3139.587 ms

Now im doing the same query again and i get a much faster result (cause
of the "caching"): Total runtime: 332.53 ms

After both of those queries I drop the index and query the table again
with the exact same query as before and now I receive: Total runtime:
216834.871 ms

And for my last check I run the exact same query again (without creating
the INDEX back again) and I get quite similar result to my third query:
Total runtime: 209218.01 ms


My problem is that (maybe I just dont understand something basic
here...) the last 2 (also the second query but I dont care about that)
queries were using the "cache" that was created after the first query
(which had an INDEX) so none of them actually showed me what will happen
if a client will do such a search (without an INDEX) for the first time.

I want to delete that "caching" after I do the first 2 queries so my
next queries will show me "real life results".

Thanks alot again,
Yonatan

Re: Query results caching?

От
"Jim C. Nasby"
Дата:
On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
> I think that I was misunderstood, Ill make an example:
> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
>
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> ... Total runtime: 3139.587 ms
>
> Now im doing the same query again and i get a much faster result (cause
> of the "caching"): Total runtime: 332.53 ms
>
> After both of those queries I drop the index and query the table again
> with the exact same query as before and now I receive: Total runtime:
> 216834.871 ms
>
> And for my last check I run the exact same query again (without creating
> the INDEX back again) and I get quite similar result to my third query:
> Total runtime: 209218.01 ms
>
>
> My problem is that (maybe I just dont understand something basic
> here...) the last 2 (also the second query but I dont care about that)
> queries were using the "cache" that was created after the first query
> (which had an INDEX) so none of them actually showed me what will happen
> if a client will do such a search (without an INDEX) for the first time.
>
> I want to delete that "caching" after I do the first 2 queries so my
> next queries will show me "real life results".

Emptying the cache will not show real-life results. You are always going
to have some stuff cached, even if you get a query for something new. In
this case (since you'll obviously want those indexes there), after some
amount of time you will have most (if not all) of the non-leaf index
pages cached, since they take a fairly small amount of memory and are
frequently accessed. This makes index traversal *much* faster than your
initial case shows, even if you query on something different each time.
Testing with a completely empty cache just isn't that realistic.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com        512-569-9461

Re: Query results caching?

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: Ben-Nes Yonatan [mailto:da@canaan.co.il]
> Sent: Monday, August 22, 2005 1:14 PM
> To: Sean Davis; Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query results caching?
>
> Sean Davis wrote:
> > On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit@connx.com> wrote:
> >
> >
> >>
> >>>-----Original Message-----
> >>>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> >>>owner@postgresql.org] On Behalf Of Ben-Nes Yonatan
> >>>Sent: Monday, August 22, 2005 9:03 AM
> >>>To: pgsql-general@postgresql.org
> >>>Subject: [GENERAL] Query results caching?
> >>>
> >>>Hi all,
> >>>
> >>>I dont know how its called but I noticed that when I query the db
for
> >>>the first time it give me the result slower then the next times ill
> >>>repeat the same exact query, I figure that its some kind of caching
so
> >>>henceforth the title of the mail :)
> >>
> >>The operating system and the database will both percolate frequently
> >>used information from disk into memory.  Particularly if they are
SELECT
> >>queries, they will get faster and faster.
> >>
> >>
> >>>Anyway I would want to be able to delete that "caching" after every
> >>>query test that I run, cause I want to see the real time results
for
> >>
> >>my
> >>
> >>>queries (its for a searching option for users so it will vary
alot).
> >>
> >>Those are the real times for your queries.
> >>
> >>
> >>>Is it possible to do it manually each time or maybe only from the
> >>>configuration?
> >>
> >>You will have to query a different table each time.
> >
> >
> > Just to extend this notion a bit, if you want to test your
application
> > speed, you may want to generate "real-world" input to determine the
> actual
> > behavior/speed under real conditions.  As Dann pointed out, the
results
> for
> > timings are "real" in that if the user generated the queries as you
did,
> the
> > timing results would be (nearly) the same as for you.  It seems that
> your
> > concern is that the user will not generate the same type of input
that
> you
> > did (that it will vary more), so the best solution may be to
actually
> > generate some test queries that actually conform to what you think
the
> user
> > input will look like.
> >
> > Sean
> >
> I think that I was misunderstood, Ill make an example:
> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
>
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> ... Total runtime: 3139.587 ms
>
> Now im doing the same query again and i get a much faster result
(cause
> of the "caching"): Total runtime: 332.53 ms
>
> After both of those queries I drop the index and query the table again
> with the exact same query as before and now I receive: Total runtime:
> 216834.871 ms
>
> And for my last check I run the exact same query again (without
creating
> the INDEX back again) and I get quite similar result to my third
query:
> Total runtime: 209218.01 ms

These results are all what I would expect.  When you delete the index,
the query will be forced to do a table scan (to examine every single
record in the table one by one).  If the table is non-trivial it is
unlikely that either the OS or the database will cache the whole thing
in memory.  However, when you query a small record set, then it is
likely to be retained in RAM which is literally thousands of times
faster than disk.

> My problem is that (maybe I just dont understand something basic
> here...) the last 2 (also the second query but I dont care about that)
> queries were using the "cache" that was created after the first query
> (which had an INDEX) so none of them actually showed me what will
happen
> if a client will do such a search (without an INDEX) for the first
time.

If a search is to be made on a frequent basis, you should create an
index.
The query results above show you why.

> I want to delete that "caching" after I do the first 2 queries so my
> next queries will show me "real life results".

Think about this for a minute.  The real life results you want are very
fast results.  For that reason, you should try to model the customer
queries as nearly as possible.  If you have a canned application like
order entry, then the real parameterized query set will probably be
quite small in real life.  If you are creating a server for ad-hoc
queries then it will be far more difficult to model in real life.

What is the real purpose of the application that you are writing?

Will users be using a pre-programmed front end, or will they be typing
in queries free-form for whatever their heart desires?


> Thanks alot again,
> Yonatan

Re: Query results caching?

От
"A. Kretschmer"
Дата:
am  22.08.2005, um 22:13:49 +0200 mailte Ben-Nes Yonatan folgendes:
> I think that I was misunderstood, Ill make an example:

Okay:

> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
>
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> .. Total runtime: 3139.587 ms

neither the DB nor the OS has the the table and index in the cache.

>
> Now im doing the same query again and i get a much faster result (cause of
> the "caching"): Total runtime: 332.53 ms

OS and DN has now the table and index in the cache.


>
> After both of those queries I drop the index and query the table again with
> the exact same query as before and now I receive: Total runtime: 216834.871
> ms

Without index -> DB make a seq-scan. Very slow, of cource.

>
> And for my last check I run the exact same query again (without creating
> the INDEX back again) and I get quite similar result to my third query:
> Total runtime: 209218.01 ms

Never mind. The table is too big for the cache.


> My problem is that (maybe I just dont understand something basic here...)
> the last 2 (also the second query but I dont care about that) queries were
> using the "cache" that was created after the first query (which had an
> INDEX) so none of them actually showed me what will happen if a client will
> do such a search (without an INDEX) for the first time.
>
> I want to delete that "caching" after I do the first 2 queries so my next
> queries will show me "real life results".

No problem: demount all RAM and send this to me ;-)


Regards, Andreas
--
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47212,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===    Schollglas Unternehmensgruppe    ===

Re: Query results caching?

От
Ben-Nes Yonatan
Дата:
> am  22.08.2005, um 22:13:49 +0200 mailte Ben-Nes Yonatan folgendes:
>
>>I think that I was misunderstood, Ill make an example:
>
>
> Okay:
>
>
>>Lets say that im making the following query for the first time on the
>>"motorcycles" table which got an index on the "manufacturer" field:
>>
>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>manufacturer='suzuki';
>>.. Total runtime: 3139.587 ms
>
>
> neither the DB nor the OS has the the table and index in the cache.
>
>
>>Now im doing the same query again and i get a much faster result (cause of
>>the "caching"): Total runtime: 332.53 ms
>
>
> OS and DN has now the table and index in the cache.
>
>
>
>>After both of those queries I drop the index and query the table again with
>>the exact same query as before and now I receive: Total runtime: 216834.871
>>ms
>
>
> Without index -> DB make a seq-scan. Very slow, of cource.
>
>
>>And for my last check I run the exact same query again (without creating
>>the INDEX back again) and I get quite similar result to my third query:
>>Total runtime: 209218.01 ms
>
>
> Never mind. The table is too big for the cache.
>
>
>
>>My problem is that (maybe I just dont understand something basic here...)
>>the last 2 (also the second query but I dont care about that) queries were
>>using the "cache" that was created after the first query (which had an
>>INDEX) so none of them actually showed me what will happen if a client will
>>do such a search (without an INDEX) for the first time.
>>
>>I want to delete that "caching" after I do the first 2 queries so my next
>>queries will show me "real life results".
>
>
> No problem: demount all RAM and send this to me ;-)
>
>
> Regards, Andreas

heheheh sure to which address should I send it? :P

Re: Query results caching?

От
Ben-Nes Yonatan
Дата:
> On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
>
>>I think that I was misunderstood, Ill make an example:
>>Lets say that im making the following query for the first time on the
>>"motorcycles" table which got an index on the "manufacturer" field:
>>
>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>manufacturer='suzuki';
>>... Total runtime: 3139.587 ms
>>
>>Now im doing the same query again and i get a much faster result (cause
>>of the "caching"): Total runtime: 332.53 ms
>>
>>After both of those queries I drop the index and query the table again
>>with the exact same query as before and now I receive: Total runtime:
>>216834.871 ms
>>
>>And for my last check I run the exact same query again (without creating
>>the INDEX back again) and I get quite similar result to my third query:
>>Total runtime: 209218.01 ms
>>
>>
>>My problem is that (maybe I just dont understand something basic
>>here...) the last 2 (also the second query but I dont care about that)
>>queries were using the "cache" that was created after the first query
>>(which had an INDEX) so none of them actually showed me what will happen
>>if a client will do such a search (without an INDEX) for the first time.
>>
>>I want to delete that "caching" after I do the first 2 queries so my
>>next queries will show me "real life results".
>
>

Ok I tried to handle both of your replies cause I got them at 2 seperate
emails.

Dann Corbit wrote:
 > These results are all what I would expect.  When you delete the index,
 > the query will be forced to do a table scan (to examine every single
 > record in the table one by one).  If the table is non-trivial it is
 > unlikely that either the OS or the database will cache the whole thing
 > in memory.  However, when you query a small record set, then it is
 > likely to be retained in RAM which is literally thousands of times
 > faster than disk.

Didnt know that, good to know though doesnt assure me...
What if I drop the INDEX but create a diffrent INDEX which also make the
process alot faster then without an INDEX but slower/faster then the one
before, will it wont use the former "caching"?

 > If a search is to be made on a frequent basis, you should create an
 > index.
 > The query results above show you why.

Obvious :)

 > Think about this for a minute.  The real life results you want are
 > very fast results.  For that reason, you should try to model the
 > customer queries as nearly as possible.  If you have a canned
 > application like order entry, then the real parameterized query set
 > will probably be quite small in real life.  If you are creating a
 > server for ad-hoc queries then it will be far more difficult to model
 > in real life.
 >
 > What is the real purpose of the application that you are writing?
 >
 > Will users be using a pre-programmed front end, or will they be typing
 > in queries free-form for whatever their heart desires?

Ok ill try to describe the system as short & precise as possible (its
also passed midnight here :)).
Each day I receive about 4 million rows of data (products) which I
insert into table1 (after I delete all of the previous data it had),
along it I receive for every row about another 15 keywords which I
insert into table2 (where as in table1 I delete all of the previous data
it had also), this process is a fact that I cant change.
Now the users of the site can search for data from table1 by typing
whichever (and up to 4) words as they want at a text field (search input
string) and the server should display the correct results by querying
table1 & join table2 for its keywords.
I succeded to do it quite fast but when I tried to ORDER BY my results
its times jumped up drastically (2-3 seconds for a query... and thats
after the caching..).
I can't allow a situation where a user will search with a keyword which
wasnt 'cached' before and because of that he will wait 15 seconds for a
result.

Jim C. Nasby wrote:
> Emptying the cache will not show real-life results. You are always going
> to have some stuff cached, even if you get a query for something new. In
> this case (since you'll obviously want those indexes there), after some
> amount of time you will have most (if not all) of the non-leaf index
> pages cached, since they take a fairly small amount of memory and are
> frequently accessed. This makes index traversal *much* faster than your
> initial case shows, even if you query on something different each time.
> Testing with a completely empty cache just isn't that realistic.

As far as I understand it at my situation where all of the data is
deleted and inserted each day from the start (INDEX will get lost with
it..) & the endless variety of possible keywords search's & the immense
size of the tables, the following reason wont last.. or am I wrong here?



Because of all of that I want to be able to see how much time a query
takes when its the first time its being run..... or I'm wrong again and
failing to understand something?

Again everyone THANKS ALOT its really amazing the help that I receive
from you!
Ben-Nes Yonatan

Re: Query results caching?

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: Ben-Nes Yonatan [mailto:da@canaan.co.il]
> Sent: Monday, August 22, 2005 3:28 PM
> To: Jim C. Nasby; Sean Davis; Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query results caching?
>
> > On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
> >
> >>I think that I was misunderstood, Ill make an example:
> >>Lets say that im making the following query for the first time on
the
> >>"motorcycles" table which got an index on the "manufacturer" field:
> >>
> >>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> >>manufacturer='suzuki';
> >>... Total runtime: 3139.587 ms
> >>
> >>Now im doing the same query again and i get a much faster result
(cause
> >>of the "caching"): Total runtime: 332.53 ms
> >>
> >>After both of those queries I drop the index and query the table
again
> >>with the exact same query as before and now I receive: Total
runtime:
> >>216834.871 ms
> >>
> >>And for my last check I run the exact same query again (without
creating
> >>the INDEX back again) and I get quite similar result to my third
query:
> >>Total runtime: 209218.01 ms
> >>
> >>
> >>My problem is that (maybe I just dont understand something basic
> >>here...) the last 2 (also the second query but I dont care about
that)
> >>queries were using the "cache" that was created after the first
query
> >>(which had an INDEX) so none of them actually showed me what will
happen
> >>if a client will do such a search (without an INDEX) for the first
time.
> >>
> >>I want to delete that "caching" after I do the first 2 queries so my
> >>next queries will show me "real life results".
> >
> >
>
> Ok I tried to handle both of your replies cause I got them at 2
seperate
> emails.
>
> Dann Corbit wrote:
>  > These results are all what I would expect.  When you delete the
index,
>  > the query will be forced to do a table scan (to examine every
single
>  > record in the table one by one).  If the table is non-trivial it is
>  > unlikely that either the OS or the database will cache the whole
thing
>  > in memory.  However, when you query a small record set, then it is
>  > likely to be retained in RAM which is literally thousands of times
>  > faster than disk.
>
> Didnt know that, good to know though doesnt assure me...
> What if I drop the INDEX but create a diffrent INDEX which also make
the
> process alot faster then without an INDEX but slower/faster then the
one
> before, will it wont use the former "caching"?

You can add several indexes to a single table.
If you do some statistics on the query patterns, you can find what
indexes are needed to make the queries as fast as possible.

>  > If a search is to be made on a frequent basis, you should create an
>  > index.
>  > The query results above show you why.
>
> Obvious :)
>
>  > Think about this for a minute.  The real life results you want are
>  > very fast results.  For that reason, you should try to model the
>  > customer queries as nearly as possible.  If you have a canned
>  > application like order entry, then the real parameterized query set
>  > will probably be quite small in real life.  If you are creating a
>  > server for ad-hoc queries then it will be far more difficult to
model
>  > in real life.
>  >
>  > What is the real purpose of the application that you are writing?
>  >
>  > Will users be using a pre-programmed front end, or will they be
typing
>  > in queries free-form for whatever their heart desires?
>
> Ok ill try to describe the system as short & precise as possible (its
> also passed midnight here :)).
> Each day I receive about 4 million rows of data (products) which I
> insert into table1 (after I delete all of the previous data it had),
> along it I receive for every row about another 15 keywords which I
> insert into table2 (where as in table1 I delete all of the previous
data
> it had also), this process is a fact that I cant change.

If the data arrives on a daily basis, and is not updated until the next
day, I suggest creating a lot of indexes, and cluster on the index used
most frequently.

What exactly are the 15 keywords in the second table for?
Are they column names?
Are they categories for the first table?
Why is the second table necessary at all?

> Now the users of the site can search for data from table1 by typing
> whichever (and up to 4) words as they want at a text field (search
input
> string) and the server should display the correct results by querying
> table1 & join table2 for its keywords.

Can you give the exact table definitions for the two tables, and also
the most likely queries you are going to receive?

When the users type in keywords -- can these keywords be applied against
any column in the table or only against a single column or against a
small set of columns or something else?

> I succeded to do it quite fast but when I tried to ORDER BY my results
> its times jumped up drastically (2-3 seconds for a query... and thats
> after the caching..).

Order by will complicate quite a bit.  I have not tried it on
PostgreSQL, but if you know the result set is small, a technique is to
select into a temp table and then order by on the temp table.  It works
well on other database systems (caveat: it has been a while since I
worked as a DBA and I have not worked as a DBA on PostgreSQL).

> I can't allow a situation where a user will search with a keyword
which
> wasnt 'cached' before and because of that he will wait 15 seconds for
a
> result.

You might try throwing hardware at it.  A 4 CPU AMD 64 machine with
Ultra 320 striped SCSI disk array and a few gigabytes of ram will
perform admirably.

You might want more than is possible.  If you have 4 million rows, and
each row is 1K, then that is 4 GB.  If your users do a query that has
not been performed yet and you have to do a table scan, then you cannot
expect some kind of sub-second response times because it won't be
physically possible on any system.

If you know what most queries may look like or if you only have a few
character columns so that you can make an index on each of them and if
you can put a unique clustered index on the most important (frequently
used) item, then you can get the majority of your queries to run very
quickly, and only on rare occasions will the query be slow.

If I have a 4 million row table, with long rows and big varchar columns
and I run a query on a column like this:

    SELECT * FROM inventory WHERE product LIKE '%Table%'

It isn't going to be fast on any system with any database.
[snip]

Re: Query results caching?

От
"Jim C. Nasby"
Дата:
On Tue, Aug 23, 2005 at 12:27:39AM +0200, Ben-Nes Yonatan wrote:
> Jim C. Nasby wrote:
> >Emptying the cache will not show real-life results. You are always going
> >to have some stuff cached, even if you get a query for something new. In
> >this case (since you'll obviously want those indexes there), after some
> >amount of time you will have most (if not all) of the non-leaf index
> >pages cached, since they take a fairly small amount of memory and are
> >frequently accessed. This makes index traversal *much* faster than your
> >initial case shows, even if you query on something different each time.
> >Testing with a completely empty cache just isn't that realistic.
>
> As far as I understand it at my situation where all of the data is
> deleted and inserted each day from the start (INDEX will get lost with
> it..) & the endless variety of possible keywords search's & the immense
> size of the tables, the following reason wont last.. or am I wrong here?

You're wrong - to an extent. Remember that while you're loading all that
data it's also being cached. Now, some of it will probably end up
falling out of the cache as all the data is read in, but you certainly
won't be starting from the clean slate that you're looking for.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com        512-569-9461

Re: Query results caching?

От
Ben-Nes Yonatan
Дата:
Jim C. Nasby wrote:

>On Tue, Aug 23, 2005 at 12:27:39AM +0200, Ben-Nes Yonatan wrote:
>
>
>>Jim C. Nasby wrote:
>>
>>
>>>Emptying the cache will not show real-life results. You are always going
>>>to have some stuff cached, even if you get a query for something new. In
>>>this case (since you'll obviously want those indexes there), after some
>>>amount of time you will have most (if not all) of the non-leaf index
>>>pages cached, since they take a fairly small amount of memory and are
>>>frequently accessed. This makes index traversal *much* faster than your
>>>initial case shows, even if you query on something different each time.
>>>Testing with a completely empty cache just isn't that realistic.
>>>
>>>
>>As far as I understand it at my situation where all of the data is
>>deleted and inserted each day from the start (INDEX will get lost with
>>it..) & the endless variety of possible keywords search's & the immense
>>size of the tables, the following reason wont last.. or am I wrong here?
>>
>>
>
>You're wrong - to an extent. Remember that while you're loading all that
>data it's also being cached. Now, some of it will probably end up
>falling out of the cache as all the data is read in, but you certainly
>won't be starting from the clean slate that you're looking for.
>
>
Ok I guess that if all of you are telling me this over and over then it
probably got some point in it :), I guess that I'll just see it work by
time.

Thanks alot again (I really appreciate it),
Ben-Nes Yonatan
Canaan Surfing ltd.
http://www.canaan.net.il


Re: Query results caching?

От
Ben-Nes Yonatan
Дата:
Dann Corbit wrote:

>>-----Original Message-----
>>From: Ben-Nes Yonatan [mailto:da@canaan.co.il]
>>Sent: Monday, August 22, 2005 3:28 PM
>>To: Jim C. Nasby; Sean Davis; Dann Corbit
>>Cc: pgsql-general@postgresql.org
>>Subject: Re: [GENERAL] Query results caching?
>>
>>
>>
>>>On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
>>>
>>>
>>>
>>>>I think that I was misunderstood, Ill make an example:
>>>>Lets say that im making the following query for the first time on
>>>>
>>>>
>the
>
>
>>>>"motorcycles" table which got an index on the "manufacturer" field:
>>>>
>>>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>>>manufacturer='suzuki';
>>>>... Total runtime: 3139.587 ms
>>>>
>>>>Now im doing the same query again and i get a much faster result
>>>>
>>>>
>(cause
>
>
>>>>of the "caching"): Total runtime: 332.53 ms
>>>>
>>>>After both of those queries I drop the index and query the table
>>>>
>>>>
>again
>
>
>>>>with the exact same query as before and now I receive: Total
>>>>
>>>>
>runtime:
>
>
>>>>216834.871 ms
>>>>
>>>>And for my last check I run the exact same query again (without
>>>>
>>>>
>creating
>
>
>>>>the INDEX back again) and I get quite similar result to my third
>>>>
>>>>
>query:
>
>
>>>>Total runtime: 209218.01 ms
>>>>
>>>>
>>>>My problem is that (maybe I just dont understand something basic
>>>>here...) the last 2 (also the second query but I dont care about
>>>>
>>>>
>that)
>
>
>>>>queries were using the "cache" that was created after the first
>>>>
>>>>
>query
>
>
>>>>(which had an INDEX) so none of them actually showed me what will
>>>>
>>>>
>happen
>
>
>>>>if a client will do such a search (without an INDEX) for the first
>>>>
>>>>
>time.
>
>
>>>>I want to delete that "caching" after I do the first 2 queries so my
>>>>next queries will show me "real life results".
>>>>
>>>>
>>>
>>>
>>Ok I tried to handle both of your replies cause I got them at 2
>>
>>
>seperate
>
>
>>emails.
>>
>>Dann Corbit wrote:
>> > These results are all what I would expect.  When you delete the
>>
>>
>index,
>
>
>> > the query will be forced to do a table scan (to examine every
>>
>>
>single
>
>
>> > record in the table one by one).  If the table is non-trivial it is
>> > unlikely that either the OS or the database will cache the whole
>>
>>
>thing
>
>
>> > in memory.  However, when you query a small record set, then it is
>> > likely to be retained in RAM which is literally thousands of times
>> > faster than disk.
>>
>>Didnt know that, good to know though doesnt assure me...
>>What if I drop the INDEX but create a diffrent INDEX which also make
>>
>>
>the
>
>
>>process alot faster then without an INDEX but slower/faster then the
>>
>>
>one
>
>
>>before, will it wont use the former "caching"?
>>
>>
>
>You can add several indexes to a single table.
>If you do some statistics on the query patterns, you can find what
>indexes are needed to make the queries as fast as possible.
>
>
>
>> > If a search is to be made on a frequent basis, you should create an
>> > index.
>> > The query results above show you why.
>>
>>Obvious :)
>>
>> > Think about this for a minute.  The real life results you want are
>> > very fast results.  For that reason, you should try to model the
>> > customer queries as nearly as possible.  If you have a canned
>> > application like order entry, then the real parameterized query set
>> > will probably be quite small in real life.  If you are creating a
>> > server for ad-hoc queries then it will be far more difficult to
>>
>>
>model
>
>
>> > in real life.
>> >
>> > What is the real purpose of the application that you are writing?
>> >
>> > Will users be using a pre-programmed front end, or will they be
>>
>>
>typing
>
>
>> > in queries free-form for whatever their heart desires?
>>
>>Ok ill try to describe the system as short & precise as possible (its
>>also passed midnight here :)).
>>Each day I receive about 4 million rows of data (products) which I
>>insert into table1 (after I delete all of the previous data it had),
>>along it I receive for every row about another 15 keywords which I
>>insert into table2 (where as in table1 I delete all of the previous
>>
>>
>data
>
>
>>it had also), this process is a fact that I cant change.
>>
>>
>
>If the data arrives on a daily basis, and is not updated until the next
>day, I suggest creating a lot of indexes, and cluster on the index used
>most frequently.
>
>What exactly are the 15 keywords in the second table for?
>Are they column names?
>Are they categories for the first table?
>Why is the second table necessary at all?
>
>
Now clustering was unknown to me when I received this email from you...
THANKS!!! I created 4 replicas of my table ordered by the diffrent order
that I want to allow my users to use, yep its quite alot of GB but I
dont care about it as long its working fast, and damn its flying! less
then 100 ms and thats on a weak server which will be replaced soon!.

>
>
>>Now the users of the site can search for data from table1 by typing
>>whichever (and up to 4) words as they want at a text field (search
>>
>>
>input
>
>
>>string) and the server should display the correct results by querying
>>table1 & join table2 for its keywords.
>>
>>
>
>Can you give the exact table definitions for the two tables, and also
>the most likely queries you are going to receive?
>
>When the users type in keywords -- can these keywords be applied against
>any column in the table or only against a single column or against a
>small set of columns or something else?
>
>
Well yea its working on a single column and that column got indexed so
its flying.

>
>
>>I succeded to do it quite fast but when I tried to ORDER BY my results
>>its times jumped up drastically (2-3 seconds for a query... and thats
>>after the caching..).
>>
>>
>
>Order by will complicate quite a bit.  I have not tried it on
>PostgreSQL, but if you know the result set is small, a technique is to
>select into a temp table and then order by on the temp table.  It works
>well on other database systems (caveat: it has been a while since I
>worked as a DBA and I have not worked as a DBA on PostgreSQL).
>
>
That cant work cause it wont order all of the results by the desired
column but only the returned results so when a user will want to see
more results he can see results which were supposed to be displayed
before his previous display.

>
>
>>I can't allow a situation where a user will search with a keyword
>>
>>
>which
>
>
>>wasnt 'cached' before and because of that he will wait 15 seconds for
>>
>>
>a
>
>
>>result.
>>
>>
>
>You might try throwing hardware at it.  A 4 CPU AMD 64 machine with
>Ultra 320 striped SCSI disk array and a few gigabytes of ram will
>perform admirably.
>
>You might want more than is possible.  If you have 4 million rows, and
>each row is 1K, then that is 4 GB.  If your users do a query that has
>not been performed yet and you have to do a table scan, then you cannot
>expect some kind of sub-second response times because it won't be
>physically possible on any system.
>
>If you know what most queries may look like or if you only have a few
>character columns so that you can make an index on each of them and if
>you can put a unique clustered index on the most important (frequently
>used) item, then you can get the majority of your queries to run very
>quickly, and only on rare occasions will the query be slow.
>
>If I have a 4 million row table, with long rows and big varchar columns
>and I run a query on a column like this:
>
>    SELECT * FROM inventory WHERE product LIKE '%Table%'
>
>It isn't going to be fast on any system with any database.
>[snip]
>
>
Well a better hardware will soon be working but anyway your idea about
clustering solved my problem.

Thanks alot again! :)
Ben-Nes Yonatan
Canaan Surfing ltd.
http://www.canaan.net.il


Re: Query results caching?

От
Ron Mayer
Дата:
Dann Corbit wrote:
>
> If I have a 4 million row table, with long rows and big varchar columns
> and I run a query on a column like this:
>
>     SELECT * FROM inventory WHERE product LIKE '%Table%'
>
> It isn't going to be fast on any system with any database.

Hypothetically it seems one could theoretically use some sort
of GIST index not unlike the contrib/trigram stuff to speed
up like clauses like that.   If so, I wonder if down the road
that could be a nice competitive advantage over systems with
less flexible index systems.  Is that a possible TODO?