Re: large tables and simple "= constant" queries using indexes
От | Erik Jones |
---|---|
Тема | Re: large tables and simple "= constant" queries using indexes |
Дата | |
Msg-id | FC30283E-575E-4119-A7CA-0645CC71D41F@myemma.com обсуждение исходный текст |
Ответ на | Re: large tables and simple "= constant" queries using indexes (John Beaver <john.e.beaver@gmail.com>) |
Список | pgsql-performance |
On Apr 10, 2008, at 9:44 AM, John Beaver wrote: > Thanks a lot, all of you - this is excellent advice. With the data > clustered and statistics at a more reasonable value of 100, it now > reproducibly takes even less time - 20-57 ms per query. > > After reading the section on "Statistics Used By the Planner" in the > manual, I was a little concerned that, while the statistics sped up > the queries that I tried immeasurably, that the most_common_vals > array was where the speedup was happening, and that the values which > wouldn't fit in this array wouldn't be sped up. Though I couldn't > offhand find an example where this occurred, the clustering approach > seems intuitively like a much more complete and scalable solution, > at least for a read-only table like this. > > As to whether the entire index/table was getting into ram between my > statistics calls, I don't think this was the case. Here's the > behavior that I found: > - With statistics at 10, the query took 25 (or so) seconds no matter > how many times I tried different values. The query plan was the same > as for the 200 and 800 statistics below. > - Trying the same constant a second time gave an instantaneous > result, I'm guessing because of query/result caching. > - Immediately on increasing the statistics to 200, the query took a > reproducibly less amount of time. I tried about 10 different values > - Immediately on increasing the statistics to 800, the query > reproducibly took less than a second every time. I tried about 30 > different values. > - Decreasing the statistics to 100 and running the cluster command > brought it to 57 ms per query. > - The Activity Monitor (OSX) lists the relevant postgres process as > taking a little less than 500 megs. > - I didn't try decreasing the statistics back to 10 before I ran the > cluster command, so I can't show the search times going up because > of that. But I tried killing the 500 meg process. The new process > uses less than 5 megs of ram, and still reproducibly returns a > result in less than 60 ms. Again, this is with a statistics value of > 100 and the data clustered by gene_prediction_view_gene_ref_key. > > And I'll consider the idea of using triggers with an ancillary table > for other purposes; seems like it could be a useful solution for > something. FWIW, killing the backend process responsible for the query won't necessarily clear the table's data from memory as that will be in the shared_buffers. If you really want to flush the data from memory you need to read in data from other tables of a size total size greater than your shared_buffers setting. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
В списке pgsql-performance по дате отправления: