Обсуждение: Cost Model
Hello All
Anyone know if there is a patch that changes the PostgreSQL cost model in some way?
I'm working with an tuning technique, based in hybrid storage environments (HDDs and SSDs).
I need to know if exist any patches that allow postgresql to differentiate, for example, I/O costs of read and write and also costs of sequential read and random read.
This is because the read and write speeds of SSDs are asynchronous. On SSDs the read speed can be 340 times faster than writing.
In Hard Disks the sequential reading and writing values are almost the same. Also in HDDs the random reading is much slower than the sequential reading.
In the SSDs the sequential and random reading are practically the same, due to being a purely electronic device.
I would like to know if there is any solution related to this that I said, because I would not need to develop this if something similar already exists.
It would be interesting if there is a patch that inserts new parameters in the postgresql.conf file to:
- sequencial read page cost
- random read page cost
- sequencial write page cost
- random write page cost
Any comment, hint about it or something, please inform me.
Regards,
Neto.
Any comment, hint about it or something, please inform me.
The docs contain this - its seem to cover what you describe:
David J.
Dear David
I have read documentation that you send, but it has only sequential page cost and random page cost parameters. What I need, would be a model of custo for Differentiate Read/Write (sequential and random), because in SSDs the reads and writes have different costs. If you or someone knows a patch or other solution, that allows you to configure individual parameters to:
- Sequential reading page cost
- cost of the random reading page
- sequential recording page cost
- Random recording page cost
Best Regards
Neto
2017-12-20 15:35 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
Any comment, hint about it or something, please inform me.The docs contain this - its seem to cover what you describe:David J.
neto brpr wrote: > Dear David > I have read documentation that you send, but it has only sequential page > cost and random page cost parameters. What I need, would be a model of > custo for Differentiate Read/Write (sequential and random), because in SSDs > the reads and writes have different costs. I don't think it matters. I mean, differentiating seq/random read speeds can make the planner choose one plan type over another depending on how much each plan intends to read randomly or sequentially. But why does it matter if one write is 360x as expensive as one read? No plan is going to change usefully because of that, because you can't turn one write into 360 reads or even 100000000 reads. If you said "writes of type X are 100 times as fast as writes of type Y", then some useful cost model could perhaps be developed. But that's not what you're saying. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Dear DavidI have read documentation that you send, but it has only sequential page cost and random page cost parameters. What I need, would be a model of custo for Differentiate Read/Write (sequential and random), because in SSDs the reads and writes have different costs. If you or someone knows a patch or other solution, that allows you to configure individual parameters to:- Sequential reading page cost- cost of the random reading page- sequential recording page cost- Random recording page cost
Please don't top-post.
OK...reading more closely I don't see how "recording/writing" costs are important in decision making. Either you have to write something, or you don't. If you do you updates pages in a buffer and generate WAL and then the system puts the data onto disk where they belong - the database itself doesn't care about that part and knowing how fast or slow it might happen would impact it behavior. So PostgreSQL provides read settings to be tuned so it can decide between index and table scans on the table in question.
The system doesn't understand SSD or HDD but does understand tablespaces and I believe that many of these settings are able to be configured on a per-tablespace (or table?) basis.
David J.
2017-12-20 16:35 GMT-02:00 Alvaro Herrera <alvherre@alvh.no-ip.org>:
neto brpr wrote:
> Dear David
> I have read documentation that you send, but it has only sequential page
> cost and random page cost parameters. What I need, would be a model of
> custo for Differentiate Read/Write (sequential and random), because in SSDs
> the reads and writes have different costs.
I don't think it matters. I mean, differentiating seq/random read
speeds can make the planner choose one plan type over another depending
on how much each plan intends to read randomly or sequentially. But why
does it matter if one write is 360x as expensive as one read? No plan
is going to change usefully because of that, because you can't turn one
write into 360 reads or even 100000000 reads.
Just to explain it better. The idea of differentiating read and write parameters (sequential and random) is exactly so that the access plans can be better chosen by the optimizer. But for this, the Hash join, merge join, sorting and other algorithms should also be changed to consider these new parameters.
Because postgresql uses a cost-based optimizer, I believe that differentiating these costs can have a positive impact on the process of choosing access methods... This is just an opinion, I'm not sure.
If you said "writes of type X are 100 times as fast as writes of type
Y", then some useful cost model could perhaps be developed. But that's
not what you're saying.
Anyway, It seems that there has not yet been any initiative related to this in the postgresql community, am I right?
Best Regards
Neto
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
neto brpr wrote: > Anyway, It seems that there has not yet been any initiative related to this > in the postgresql community, am I right? Yes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-12-20 16:37 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
Dear DavidI have read documentation that you send, but it has only sequential page cost and random page cost parameters. What I need, would be a model of custo for Differentiate Read/Write (sequential and random), because in SSDs the reads and writes have different costs. If you or someone knows a patch or other solution, that allows you to configure individual parameters to:- Sequential reading page cost- cost of the random reading page- sequential recording page cost- Random recording page costPlease don't top-post.
OK, sorry!
OK...reading more closely I don't see how "recording/writing" costs are important in decision making. Either you have to write something, or you don't. If you do you updates pages in a buffer and generate WAL and then the system puts the data onto disk where they belong - the database itself doesn't care about that part and knowing how fast or slow it might happen would impact it behavior. So PostgreSQL provides read settings to be tuned so it can decide between index and table scans on the table in question.
The system doesn't understand SSD or HDD but does understand tablespaces and I believe that many of these settings are able to be configured on a per-tablespace (or table?) basis.
About what you said, that some settings can be configured by Tablespace?
I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and Seq_page_cost parameters are configured for the Integer Database and not for Tablespace, ok?.
You or someone can tell me if cost parameters can be configured by Tablespace, this would be useful for me, thinking of a server that has hybrid storage environment (HDD x SSDs), I could leave some in a tablespace with HDD adapted settings and the same way for when I have an SSD disk.
Regards
Neto
David J.
About what you said, that some settings can be configured by Tablespace?I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and Seq_page_cost parameters are configured for the Integer Database and not for Tablespace, ok?.You or someone can tell me if cost parameters can be configured by Tablespace, this would be useful for me, thinking of a server that has hybrid storage environment (HDD x SSDs), I could leave some in a tablespace with HDD adapted settings and the same way for when I have an SSD disk.
ALTER TABLESPACE
David J.
On 2017-12-20 17:13:31 -0200, neto brpr wrote: > Just to explain it better. The idea of differentiating read and write > parameters (sequential and random) is exactly so that the access plans can > be better chosen by the optimizer. But for this, the Hash join, merge join, > sorting and other algorithms should also be changed to consider these new > parameters. I'm doubtful that there's that much benefit. Mergejoin doesn't write, hashjoins commonly don't write , and usually if so there's not that many alternatives to batched hashjoins. Similar-ish with sorts, although sometimes that can instead be done using ordered index scans. What are the cases you forsee where costing reads/writes differently will lead to better plans? Greetings, Andres Freund
2017-12-20 17:34 GMT-02:00 Andres Freund <andres@anarazel.de>:
On 2017-12-20 17:13:31 -0200, neto brpr wrote:
> Just to explain it better. The idea of differentiating read and write
> parameters (sequential and random) is exactly so that the access plans can
> be better chosen by the optimizer. But for this, the Hash join, merge join,
> sorting and other algorithms should also be changed to consider these new
> parameters.
I'm doubtful that there's that much benefit. Mergejoin doesn't write,
hashjoins commonly don't write , and usually if so there's not that many
alternatives to batched hashjoins. Similar-ish with sorts, although
sometimes that can instead be done using ordered index scans.
Dear Andres
By reading some cientific paper, it has been said that the hash join and sort merge join algorithms perform better than nested loop, considering that it runs on an HDD, since the costs of read and write are practically the same (symmetrical). However, in an SSD, where the cost of write is double the cost of reads, this is not true, since both algorithms (sort merge and hash join) require some writing operations on disk, when the data does not fit in memory RAM. If we consider that the nested loop that works only with read, and in case there is an index for the internal table, the nested loop this would be a good alternative, since the readings on SSDs are many times faster than in HDDs. This is an example of a situation in which the difference between reading and writing could make the Optimizer choose the Nested Loop rather than the Hash Join.
Regards
What are the cases you forsee where costing reads/writes differently
will lead to better plans?
Greetings,
Andres Freund
2017-12-20 17:32 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
About what you said, that some settings can be configured by Tablespace?I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and Seq_page_cost parameters are configured for the Integer Database and not for Tablespace, ok?.You or someone can tell me if cost parameters can be configured by Tablespace, this would be useful for me, thinking of a server that has hybrid storage environment (HDD x SSDs), I could leave some in a tablespace with HDD adapted settings and the same way for when I have an SSD disk.ALTER TABLESPACE
A question about setting random_page_cost and seq_page_cost per tablespace.
If these values are set to tablespace, will the random_page_cost and seq_page_cost parameters of the postgresql.conf file be ignored for the Tablespace that have been configured for these parameters?
Regards, Neto
David J.
On Thu, Dec 21, 2017 at 3:09 PM, Neto BR <netobrpr@gmail.com> wrote:
-- 2017-12-20 17:32 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:About what you said, that some settings can be configured by Tablespace?I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and Seq_page_cost parameters are configured for the Integer Database and not for Tablespace, ok?.You or someone can tell me if cost parameters can be configured by Tablespace, this would be useful for me, thinking of a server that has hybrid storage environment (HDD x SSDs), I could leave some in a tablespace with HDD adapted settings and the same way for when I have an SSD disk.ALTER TABLESPACEA question about setting random_page_cost and seq_page_cost per tablespace.If these values are set to tablespace, will the random_page_cost and seq_page_cost parameters of the postgresql.conf file be ignored for the Tablespace that have been configured for these parameters?
On the page pointed by the link, the explanation of tablespace_option says
"Setting either value for a particular tablespace will override the planner's usual estimate of the cost of reading pages from tables in that tablespace, as established by the configuration parameters of the same name (see seq_page_cost, random_page_cost, effective_io_concurrency).". This means the tablespace specific values will override the ones in configuration i.e. postgresql.conf. Answer to your question is yes.
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company