Discussion: Cost Model


Cost Model

From:
neto brpr
Date:
Hello All

Does anyone know if there is a patch that changes the PostgreSQL cost model in some way?
I'm working on a tuning technique based on hybrid storage environments (HDDs and SSDs).

I need to know whether any patches exist that allow PostgreSQL to differentiate, for example, the I/O costs of reads and writes, and also the costs of sequential reads and random reads.

This is because the read and write speeds of SSDs are asymmetric: on an SSD, reads can be 340 times faster than writes.
On hard disks, sequential read and write speeds are almost the same, while random reads are much slower than sequential reads.
On SSDs, sequential and random reads perform practically the same, because the device is purely electronic.
I would like to know whether any solution along these lines already exists, because I would not need to develop this if something similar is already available.

It would be interesting if there were a patch that adds new parameters to postgresql.conf such as (a purely hypothetical sketch of such settings follows the list below):

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost
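
A purely hypothetical sketch of what such postgresql.conf settings could look like - these GUCs do not exist in any PostgreSQL release; the names and values below only restate the parameters proposed above:

    # postgresql.conf - HYPOTHETICAL settings, not implemented in PostgreSQL
    seq_read_page_cost     = 1.0   # cost of a sequentially read page
    random_read_page_cost  = 1.1   # cost of a randomly read page (SSD-like)
    seq_write_page_cost    = 2.0   # cost of a sequentially written page
    random_write_page_cost = 2.2   # cost of a randomly written page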

If you have any comments or hints about this, please let me know.

Regards, 
Neto. 



Re: Cost Model

From:
"David G. Johnston"
Date:
On Wed, Dec 20, 2017 at 10:29 AM, neto brpr <netobrpr@gmail.com> wrote:
If you have any comments or hints about this, please let me know.

The docs contain this - it seems to cover what you describe:


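For reference, the settings that documentation describes are the existing planner cost GUCs, seq_page_cost and random_page_cost. A minimal sketch of tuning them cluster-wide for SSD-backed storage (the values are illustrative only):

    -- Lower the random-read penalty, since random reads on SSDs are
    -- nearly as fast as sequential reads (illustrative values)
    ALTER SYSTEM SET seq_page_cost = 1.0;
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();   -- apply without restarting the server
    SHOW random_page_cost;     -- confirm the new setting
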
David J.

Re: Cost Model

From:
neto brpr
Date:
Dear David 
I have read the documentation you sent, but it only covers the seq_page_cost and random_page_cost parameters. What I need is a cost model that differentiates reads and writes (sequential and random), because on SSDs reads and writes have different costs. If you or anyone else knows of a patch or other solution that allows configuring individual parameters for the following, please let me know:

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost

Best Regards
Neto 


2017-12-20 15:35 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Wed, Dec 20, 2017 at 10:29 AM, neto brpr <netobrpr@gmail.com> wrote:
If you have any comments or hints about this, please let me know.

The docs contain this - it seems to cover what you describe:


David J.


Re: Cost Model

From:
Alvaro Herrera
Date:
neto brpr wrote:
> Dear David
> I have read the documentation you sent, but it only covers the
> seq_page_cost and random_page_cost parameters. What I need is a cost
> model that differentiates reads and writes (sequential and random),
> because on SSDs reads and writes have different costs.

I don't think it matters.  I mean, differentiating seq/random read
speeds can make the planner choose one plan type over another depending
on how much each plan intends to read randomly or sequentially.  But why
does it matter if one write is 360x as expensive as one read?  No plan
is going to change usefully because of that, because you can't turn one
write into 360 reads or even 100000000 reads.

If you said "writes of type X are 100 times as fast as writes of type
Y", then some useful cost model could perhaps be developed.  But that's
not what you're saying.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Cost Model

From:
"David G. Johnston"
Date:
On Wed, Dec 20, 2017 at 11:26 AM, neto brpr <netobrpr@gmail.com> wrote:
Dear David 
I have read the documentation you sent, but it only covers the seq_page_cost and random_page_cost parameters. What I need is a cost model that differentiates reads and writes (sequential and random), because on SSDs reads and writes have different costs. If you or anyone else knows of a patch or other solution that allows configuring individual parameters for the following, please let me know:

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost


Please don't top-post.

OK... reading more closely, I don't see how writing costs are important in decision making. Either you have to write something, or you don't. If you do, you update pages in a buffer and generate WAL, and then the system puts the data onto disk where it belongs - the database itself doesn't care about that part, and knowing how fast or slow that happens wouldn't impact its behavior. So PostgreSQL provides read settings that can be tuned so it can decide between index and table scans on the table in question.

The system doesn't understand SSD or HDD, but it does understand tablespaces, and I believe many of these settings can be configured on a per-tablespace (or table?) basis.

David J.

Re: Cost Model

From:
neto brpr
Date:


2017-12-20 16:35 GMT-02:00 Alvaro Herrera <alvherre@alvh.no-ip.org>:
neto brpr wrote:
> Dear David
> I have read the documentation you sent, but it only covers the
> seq_page_cost and random_page_cost parameters. What I need is a cost
> model that differentiates reads and writes (sequential and random),
> because on SSDs reads and writes have different costs.

I don't think it matters.  I mean, differentiating seq/random read
speeds can make the planner choose one plan type over another depending
on how much each plan intends to read randomly or sequentially.  But why
does it matter if one write is 360x as expensive as one read?  No plan
is going to change usefully because of that, because you can't turn one
write into 360 reads or even 100000000 reads.


Just to explain it better: the idea of differentiating read and write parameters (sequential and random) is precisely so that access plans can be better chosen by the optimizer. But for this, the hash join, merge join, sort and other algorithms would also have to be changed to take these new parameters into account.
Because PostgreSQL uses a cost-based optimizer, I believe that differentiating these costs could have a positive impact on the process of choosing access methods... This is just an opinion; I am not sure.

If you said "writes of type X are 100 times as fast as writes of type
Y", then some useful cost model could perhaps be developed.  But that's
not what you're saying.


Anyway, it seems that there has not yet been any initiative related to this in the PostgreSQL community - am I right?

Best Regards
Neto
 
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Cost Model

From:
Alvaro Herrera
Date:
neto brpr wrote:

> Anyway, it seems that there has not yet been any initiative related to this
> in the PostgreSQL community - am I right?

Yes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Cost Model

From:
neto brpr
Date:


2017-12-20 16:37 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Wed, Dec 20, 2017 at 11:26 AM, neto brpr <netobrpr@gmail.com> wrote:
Dear David 
I have read the documentation you sent, but it only covers the seq_page_cost and random_page_cost parameters. What I need is a cost model that differentiates reads and writes (sequential and random), because on SSDs reads and writes have different costs. If you or anyone else knows of a patch or other solution that allows configuring individual parameters for the following, please let me know:

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost


Please don't top-post.

OK, sorry!

OK... reading more closely, I don't see how writing costs are important in decision making. Either you have to write something, or you don't. If you do, you update pages in a buffer and generate WAL, and then the system puts the data onto disk where it belongs - the database itself doesn't care about that part, and knowing how fast or slow that happens wouldn't impact its behavior. So PostgreSQL provides read settings that can be tuned so it can decide between index and table scans on the table in question.

The system doesn't understand SSD or HDD, but it does understand tablespaces, and I believe many of these settings can be configured on a per-tablespace (or table?) basis.


About what you said - that some settings can be configured per tablespace:
I have already seen this in IBM DB2, but in PostgreSQL, as far as I know, parameters such as random_page_cost and seq_page_cost are configured for the entire database cluster, not per tablespace, right?
Could you or someone else tell me whether the cost parameters can be configured per tablespace? This would be useful for me: on a server with a hybrid storage environment (HDDs and SSDs), I could keep some tables in a tablespace with HDD-adapted settings, and do the same for the tables that live on an SSD.

Regards 
Neto 

David J.




Re: Cost Model

From:
"David G. Johnston"
Date:
On Wed, Dec 20, 2017 at 12:26 PM, neto brpr <netobrpr@gmail.com> wrote:

About what you said - that some settings can be configured per tablespace:
I have already seen this in IBM DB2, but in PostgreSQL, as far as I know, parameters such as random_page_cost and seq_page_cost are configured for the entire database cluster, not per tablespace, right?
Could you or someone else tell me whether the cost parameters can be configured per tablespace? This would be useful for me: on a server with a hybrid storage environment (HDDs and SSDs), I could keep some tables in a tablespace with HDD-adapted settings, and do the same for the tables that live on an SSD.

ALTER TABLESPACE


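A minimal sketch of that approach; the tablespace name and path below are assumptions, not anything taken from this thread:

    -- Hypothetical SSD-backed tablespace (name and path are examples only)
    CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';

    -- Override the planner's read-cost estimates for objects stored there
    ALTER TABLESPACE fast_ssd
      SET (seq_page_cost = 1.0, random_page_cost = 1.1);

    -- Place a table on the SSD tablespace
    CREATE TABLE hot_data (id bigint, payload text) TABLESPACE fast_ssd;
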
​David J.​

Re: Cost Model

From:
Andres Freund
Date:
On 2017-12-20 17:13:31 -0200, neto brpr wrote:
> Just to explain it better: the idea of differentiating read and write
> parameters (sequential and random) is precisely so that access plans can
> be better chosen by the optimizer. But for this, the hash join, merge join,
> sort and other algorithms would also have to be changed to take these new
> parameters into account.

I'm doubtful that there's that much benefit. Merge join doesn't write,
hash joins commonly don't write, and when they do there usually aren't that
many alternatives to batched hash joins. Similar-ish with sorts, although
sometimes those can instead be done using ordered index scans.

What are the cases you foresee where costing reads/writes differently
will lead to better plans?

Greetings,

Andres Freund


Re: Cost Model

From:
neto brpr
Date:


2017-12-20 17:34 GMT-02:00 Andres Freund <andres@anarazel.de>:
On 2017-12-20 17:13:31 -0200, neto brpr wrote:
> Just to explain it better: the idea of differentiating read and write
> parameters (sequential and random) is precisely so that access plans can
> be better chosen by the optimizer. But for this, the hash join, merge join,
> sort and other algorithms would also have to be changed to take these new
> parameters into account.

I'm doubtful that there's that much benefit. Merge join doesn't write,
hash joins commonly don't write, and when they do there usually aren't that
many alternatives to batched hash joins. Similar-ish with sorts, although
sometimes those can instead be done using ordered index scans.

Dear Andres
Some scientific papers report that the hash join and sort-merge join algorithms perform better than the nested loop join on an HDD, since the costs of reads and writes are practically the same (symmetric). However, on an SSD, where a write costs about twice as much as a read, this is not necessarily true, since both algorithms (sort-merge join and hash join) require some write operations on disk when the data does not fit in RAM. A nested loop join, which only reads - and, if there is an index on the inner table, can read through that index - would then be a good alternative, since reads on SSDs are many times faster than on HDDs. This is an example of a situation in which differentiating read and write costs could make the optimizer choose a nested loop rather than a hash join.
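
Even with only the existing read-cost settings, part of this trade-off can already be expressed. A rough sketch (tables, columns and index are hypothetical) of how a lower random_page_cost can tilt the planner from a hash join toward an index nested loop:

    -- Hypothetical schema: orders(customer_id) with an index, customers(id, region)
    SET random_page_cost = 4.0;   -- HDD-like costing (the default)
    EXPLAIN SELECT *
      FROM orders o JOIN customers c ON c.id = o.customer_id
     WHERE c.region = 'EU';       -- a hash join is likely to be chosen

    SET random_page_cost = 1.1;   -- SSD-like costing
    EXPLAIN SELECT *
      FROM orders o JOIN customers c ON c.id = o.customer_id
     WHERE c.region = 'EU';       -- a nested loop over the orders index may now win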

Regards
 
What are the cases you foresee where costing reads/writes differently
will lead to better plans?

Greetings,

Andres Freund



Re: Cost Model

From:
Neto BR
Date:


2017-12-20 17:32 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Wed, Dec 20, 2017 at 12:26 PM, neto brpr <netobrpr@gmail.com> wrote:

About what you said - that some settings can be configured per tablespace:
I have already seen this in IBM DB2, but in PostgreSQL, as far as I know, parameters such as random_page_cost and seq_page_cost are configured for the entire database cluster, not per tablespace, right?
Could you or someone else tell me whether the cost parameters can be configured per tablespace? This would be useful for me: on a server with a hybrid storage environment (HDDs and SSDs), I could keep some tables in a tablespace with HDD-adapted settings, and do the same for the tables that live on an SSD.

ALTER TABLESPACE



A question about setting random_page_cost and seq_page_cost per tablespace:
If these values are set on a tablespace, will the random_page_cost and seq_page_cost parameters from postgresql.conf be ignored for the tablespaces on which these parameters have been configured?

Regards, Neto
 
​David J.​




Re: Cost Model

From:
Ashutosh Bapat
Date:


On Thu, Dec 21, 2017 at 3:09 PM, Neto BR <netobrpr@gmail.com> wrote:


2017-12-20 17:32 GMT-02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Wed, Dec 20, 2017 at 12:26 PM, neto brpr <netobrpr@gmail.com> wrote:

About what you said - that some settings can be configured per tablespace:
I have already seen this in IBM DB2, but in PostgreSQL, as far as I know, parameters such as random_page_cost and seq_page_cost are configured for the entire database cluster, not per tablespace, right?
Could you or someone else tell me whether the cost parameters can be configured per tablespace? This would be useful for me: on a server with a hybrid storage environment (HDDs and SSDs), I could keep some tables in a tablespace with HDD-adapted settings, and do the same for the tables that live on an SSD.

ALTER TABLESPACE



A question about setting random_page_cost and seq_page_cost per tablespace:
If these values are set on a tablespace, will the random_page_cost and seq_page_cost parameters from postgresql.conf be ignored for the tablespaces on which these parameters have been configured?

On the page the link points to, the explanation of tablespace_option says:
"Setting either value for a particular tablespace will override the planner's usual estimate of the cost of reading pages from tables in that tablespace, as established by the configuration parameters of the same name (see seq_page_cost, random_page_cost, effective_io_concurrency)." This means the tablespace-specific values override the ones in the configuration, i.e. postgresql.conf. So the answer to your question is yes.

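A quick way to see which value applies (assuming a tablespace such as the fast_ssd example earlier was given its own costs):

    -- Global value from postgresql.conf / ALTER SYSTEM
    SHOW random_page_cost;

    -- Any per-tablespace overrides are stored in pg_tablespace.spcoptions
    SELECT spcname, spcoptions FROM pg_tablespace;

    -- Tables stored in a tablespace with overrides are costed with those
    -- values; everything else falls back to the global setting.
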
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company