Re: Help with bulk read performance

Поиск

Список

Период

Сортировка

От	Jim Nasby
Тема	Re: Help with bulk read performance
Дата	14 декабря 2010 г. 03:54:15
Msg-id	4FD088B1-7B0B-47BA-BAA5-5A75205B94E6@nasby.net обсуждение исходный текст
Ответ на	Help with bulk read performance (Dan Schaffer <Daniel.S.Schaffer@noaa.gov>)
Список	pgsql-performance

Дерево обсуждения

On Nov 1, 2010, at 9:15 AM, Dan Schaffer wrote:
> We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table).
We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s
versus100 MB/s).  Part of this is due to the storage overhead we have observed in Postgres.  In the example below, it
takes1 GB to store 350 MB of nominal data.  However that suggests we would expect to get 35 MB/s bulk read rates. 
>
> Observations using iostat and top during these bulk reads suggest that the queries are CPU bound, not I/O bound.  In
fact,repeating the queries yields similar response times.  Presumably if it were an I/O issue the response times would
bemuch shorter the second time through with the benefit of caching. 
>
> We have tried these simple queries using psql, JDBC, pl/java stored procedures, and libpq.  In all cases the client
coderan on the same box as the server. 
> We have experimented with Postgres 8.1, 8.3 and 9.0.
>
> We also tried playing around with some of the server tuning parameters such as shared_buffers to no avail.
>
> Here is uname -a for a machine we have tested on:
>
> Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> A sample dataset that reproduces these results looks like the following (there are no indexes):
>
> Table "bulk_performance.counts"
> Column |  Type   | Modifiers
> --------+---------+-----------
> i1     | integer |
> i2     | integer |
> i3     | integer |
> i4     | integer |
>
> There are 22 million rows in this case.
>
> We HAVE observed that summation queries run considerably faster.  In this case,
>
> select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts
>
> runs at 35 MB/s.
>
>
> Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller
thanthe input.  So we had hoped that by putting our business logic into stored procedures (and thus drastically
reducingthe amount of data flowing to the client) our throughput would go way up.  This did not happen. 
>
> So our questions are as follows:
>
> Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close
tothe expected 35 MB/s doing these bulk reads?  Or is this the price we have to pay for using SQL instead of some NoSQL
solution. (We actually tried Tokyo Cabinet and found it to perform quite well. However it does not measure up to
Postgresin terms of replication, data interrogation, community support, acceptance, etc). 

Have you by chance tried EXPLAIN ANALYZE SELECT * FROM bulk_performance.counts? That will throw away the query results,
whichremoves client-server considerations. 

Also, when you tested raw disk IO, did you do it with an 8k block size? That's the default size of a Postgres block, so
allof it's IO is done that way. 

What does iostat show you? Are you getting a decent number of read requests/second?
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Help with bulk read performance