Обсуждение: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

Поиск
Список
Период
Сортировка

[GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

От
유상지
Дата:

 

I want to get help with Postgresql.

I investigated that Postgresql could be rather fast in an environment using a secondary index. but It came up with different results on benckmark.

The database I compared was mariadb, and the benchmark tool was sysbench 1.0.8 with the postgresql driver.

Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB, Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)

 

The created sysbench progress statement is as follows.

Sysbench /usr/share/sysbench/oltp_read_only.lua --db-driver = mysql --mysql-host = 127.0.0.1 --mysql-port = 3306 --mysql-user = root --mysql-password = ajdajddl75 - Mysql-db = sysbench --tables = 3 --table_size = 100000 --report-interval = 10 --secondary = on --time = 60 

 

Used options

Select only, Num of threads = 1, num of tables = 3, table-size = 100000 and Table-size = 1000000, secondary index select instead of primary key.

 

 

 

 My hypothesis was that  selecting by secondary index in postgresql is faster than in Mariadb. However, the results depend on table size.

 

Postgresql was faster than Mariadb when the table size was 1000000, but slower at 100000.

 

Cluster secondary indexes were faster than those without cluster indexes in pg, but slower than mariadb.

 

I'd like to see the difference in architecture rather than optimization, so every benchmark executed with default options except for clustered index.

 I wonder if there are any settings I missed.

 

I would be very pleased if someone could explain why these results came up.

Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

От
Melvin Davidson
Дата:


On Wed, Aug 30, 2017 at 10:03 PM, 유상지 <y0212@naver.com> wrote:

 

I want to get help with Postgresql.

I investigated that Postgresql could be rather fast in an environment using a secondary index. but It came up with different results on benckmark.

The database I compared was mariadb, and the benchmark tool was sysbench 1.0.8 with the postgresql driver.

Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB, Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)

 

The created sysbench progress statement is as follows.

Sysbench /usr/share/sysbench/oltp_read_only.lua --db-driver = mysql --mysql-host = 127.0.0.1 --mysql-port = 3306 --mysql-user = root --mysql-password = ajdajddl75 - Mysql-db = sysbench --tables = 3 --table_size = 100000 --report-interval = 10 --secondary = on --time = 60 

 

Used options

Select only, Num of threads = 1, num of tables = 3, table-size = 100000 and Table-size = 1000000, secondary index select instead of primary key.

 

 

 

 My hypothesis was that  selecting by secondary index in postgresql is faster than in Mariadb. However, the results depend on table size.

 

Postgresql was faster than Mariadb when the table size was 1000000, but slower at 100000.

 

Cluster secondary indexes were faster than those without cluster indexes in pg, but slower than mariadb.

 

I'd like to see the difference in architecture rather than optimization, so every benchmark executed with default options except for clustered index.

 I wonder if there are any settings I missed.

 

I would be very pleased if someone could explain why these results came up.


>Postgresql was faster than Mariadb when the table size was 1000000, but slower at 100000.

You made a general statement, but you left out a lot of important information.

A. Did you do an ANALYZE table_name BEFORE running your test?
B. Did you verify the index was being used with EXPLAIN your_query?
C. What was the exact query you used?
D. Most important, what is the structure of the table and index?
E. How much system memory is there?
F. In postgresql.conf What are the settings for
    1. shared_memory
    2. work_memory
    3. All Planner Cost Constants values, All Genetic Query Optimizer values
--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

От
Thomas Kellerer
Дата:
유상지 schrieb am 31.08.2017 um 04:03:
> Cluster secondary indexes were faster than those without cluster indexes in pg, but slower than mariadb.

There is no such thing as a "clustered index" in Postgres.

The Postgres "cluster" command physically sorts the rows of a table according to the sort order of an index, but that
issomething completely different then a "clustered index". The data is still stored in the index and the table.  

A clustered index in MariaDB/MySQL stores the entire table data. So the table and the index is the same thing (Oracle
callsthat an "index organized table" which describes this a lot better). As the table and index are the same thing you
can'thave two clustered indexes on the same table.  

An index in Postgres only stores the data of the indexed columns (plus an internal row identifier). There is no
technicaldifference between a primary key index and any other index. The structure and storage is always the same. So
theterm "secondary index" does not really make sense in Postgres. 




Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb

От
Merlin Moncure
Дата:
On Wed, Aug 30, 2017 at 9:03 PM, 유상지 <y0212@naver.com> wrote:

I want to get help with Postgresql.

I investigated that Postgresql could be rather fast in an environment using a secondary index. but It came up with different results on benckmark.

The database I compared was mariadb, and the benchmark tool was sysbench 1.0.8 with the postgresql driver.

Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB, Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)

mysql and other systems (for example sql server) use a technique where the table is automatically clustered around an index-- generally the primary key.  This technique has some tradeoffs; the main upside is that lookups on the pkey are somewhat faster whereas lookups on any other index are somewhat slower and insertions can be slower in certain cases (especially if guids are the pkey).  I would call this technique 'index organized table'.  The technique exploits the persistent organization so that the index is implied and does not have to be kept separate from the heap (the main table data storage).

postgres 'cluster' command currently is a one time pass over the table that organizes the table physically in index order but does not maintain the table in that order nor does it exploit the ordering to eliminate the primary key index in the manner that other systems do.   From a postgres point of view, the main advantage is that scans (not single record lookups) over the key will be sequential physical reads and will tend to have to read less physical pages since adjacent key records logically will also be adjacent physically.

For my part, I generally prefer the postgres style of organization for most workloads, particularly for the surrogate key pattern. I would definitely like to have the option of having the indexed organized style however.  It's definitely possible to tease out the tradeoffs in synthetic benchmarking but in the gross aggregate I suspect (but can't obviously prove) the technique is a loser since as database models mature the kinds of ways tables are indexed looked up and joined tends to proliferate. 

merlin