Why could different data in a table be processed with different performance?

I am experiencing a strange performance problem when accessing JSONB content by primary key.

My DB version() is PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
postgres.conf: https://justpaste.it/6pzz1
uname -a: Linux postgresnlpslave 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
The machine is virtual, running under Hyper-V.
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 1x1 cores
Disk storage: the host has two VHDX drives; the first is shared between the root partition and an LVM PV, the second is a single LVM PV. Both PVs are in a VG containing the swap and Postgres data partitions. The data is mostly on the first PV.

I have the following table:

CREATE TABLE articles
(
    article_id bigint NOT NULL,
    content jsonb NOT NULL,
    published_at timestamp without time zone NOT NULL,
    appended_at timestamp without time zone NOT NULL,
    source_id integer NOT NULL,
    language character varying(2) NOT NULL,
    title text NOT NULL,
    topics topic[] NOT NULL,
    objects object[] NOT NULL,
    cluster_id bigint NOT NULL,
    CONSTRAINT articles_pkey PRIMARY KEY (article_id)
)


We have a Python lib (using the psycopg2 driver) to access this table. It executes simple queries against the table; one of them, used for bulk downloading of content, looks like this:

select content from articles where article_id between $1 and $2

I noticed that with some IDs it works pretty fast, while with others it is 4-5 times slower. It is worth noting that there are two main 'categories' of IDs in this table: the first is the range 270000000-500000000, the second is the range 10000000000-100030000000. For the first range it is 'fast', and for the second it is 'slow'. Besides the larger absolute values pushing them from int into bigint territory, values in the second range are more 'sparse': in the first range the values are almost consecutive (with very few 'holes' of missing values), while the second range has far more 'holes' (average fill is 35%). Total number of rows in the first range: ~62M; in the second range: ~10M.

I conducted several experiments to rule out any influence of the library code and network throughput (I omit some of them here). I ended up iterating over the table with EXPLAIN to simulate read load:

explain (analyze, buffers)
select count(*), sum(length(content::text)) from articles where article_id between %s and %s


Sample output:

Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
  Buffers: shared hit=26847 read=3914
  ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual time=21.649..1128.004 rows=5000 loops=1)
        Index Cond: ((article_id >= 438000000) AND (article_id <= 438005000))
        Buffers: shared hit=4342 read=671
Planning time: 0.393 ms
Execution time: 6626.136 ms

Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
  Buffers: shared hit=6568 read=7104
  ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual time=22.167..12082.624 rows=2416 loops=1)
        Index Cond: ((article_id >= '100021000000'::bigint) AND (article_id <= '100021010000'::bigint))
        Buffers: shared hit=50 read=2378
Planning time: 0.517 ms
Execution time: 33219.218 ms
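
The walk over each range can be sketched as a small driver. This is illustrative only (the names are mine, not the real harness); with psycopg2, each window would execute EXPLAIN_SQL and the returned plan text would be collected for parsing:

```python
# Illustrative sketch of the measurement loop; names are hypothetical,
# not the actual library code.
EXPLAIN_SQL = (
    "explain (analyze, buffers) "
    "select count(*), sum(length(content::text)) "
    "from articles where article_id between %s and %s"
)

def id_chunks(start, stop, step):
    """Yield inclusive (lo, hi) ID windows covering start..stop,
    matching the BETWEEN bounds in EXPLAIN_SQL."""
    lo = start
    while lo <= stop:
        yield lo, min(lo + step - 1, stop)
        lo += step

# e.g. walking the 'fast' range in 5000-ID windows:
# for lo, hi in id_chunks(270000000, 500000000, 5000):
#     cur.execute(EXPLAIN_SQL, (lo, hi)); plan = cur.fetchall()
```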


During iteration, I parse the result of EXPLAIN and collect the following series of metrics:

- buffer hits/reads for the table,
- buffer hits/reads for the index,
- number of rows (from "Index Scan..."),
- duration of execution.

Based on the metrics above I calculate derived metrics:

- disk read rate: (index reads + table reads) * 8192 / duration,
- reads ratio: (index reads + table reads) / (index reads + table reads + index hits + table hits),
- data rate: (index reads + table reads + index hits + table hits) * 8192 / duration,
- rows rate: number of rows / duration.
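
A minimal sketch of this extraction, assuming the plan text has the shape shown in the samples above. One simplification: I take the top-level Buffers totals (which already include both index and table pages) instead of splitting index and table counts:

```python
import re

def parse_explain(plan_text):
    """Pull buffer counts, actual row count and timing out of
    EXPLAIN (ANALYZE, BUFFERS) text output. Assumes the first
    'Buffers' line carries the plan-wide totals and the 'Index Scan'
    line carries the actual row count."""
    hit, read = map(int, re.search(
        r"shared hit=(\d+) read=(\d+)", plan_text).groups())
    scan_line = next(l for l in plan_text.splitlines() if "Index Scan" in l)
    rows = int(re.search(r"actual time=[^)]* rows=(\d+)", scan_line).group(1))
    duration_ms = float(re.search(
        r"Execution time: ([\d.]+) ms", plan_text).group(1))
    return {"hit": hit, "read": read, "rows": rows, "duration_ms": duration_ms}

def derived_metrics(m):
    """Compute the derived series from one parsed plan (8192 = block size)."""
    dur_s = m["duration_ms"] / 1000.0
    total = m["hit"] + m["read"]
    return {
        "disk_read_rate": m["read"] * 8192 / dur_s,  # bytes/s
        "reads_ratio": m["read"] / total,
        "data_rate": total * 8192 / dur_s,           # bytes/s
        "rows_rate": m["rows"] / dur_s,              # rows/s
    }
```

Applied to the first sample plan above, this gives a reads ratio of 3914 / (26847 + 3914) ≈ 12.7%, consistent with the ~13% figure reported for the "small" range.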

Since the "density" of IDs differs between the "small" and "big" ranges, I adjusted the chunk size so that each iteration returns around 5000 rows in both cases, though my experiments show that chunk size does not really matter a lot.
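
The adjustment is a back-of-the-envelope computation, assuming roughly uniform ID density within a range (the function name is mine):

```python
import math

def chunk_width(target_rows, fill_ratio):
    """ID-window width expected to contain ~target_rows rows, given the
    fraction of IDs in the range that actually exist."""
    return math.ceil(target_rows / fill_ratio)
```

For the nearly dense first range, chunk_width(5000, 1.0) is 5000; for the second range at 35% average fill, chunk_width(5000, 0.35) is 14286. The second sample plan above (a 10000-wide window returning 2416 rows) corresponds to a local fill of about 24%, so the density also fluctuates within the range.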

The issue described at the very beginning of my message was confirmed for the *whole* first and second ranges (so it was not just caused by randomly cached data).

To eliminate cache influence, I restarted Postgres server with flushing buffers:

# postgresql stop; sync; echo 3 > /proc/sys/vm/drop_caches; postgresql start

After this I repeated the test and got nearly the same picture.

"Small" range: the disk read rate is around 10-11 MB/s uniformly across the test. The output rate was 1300-1700 rows/s. The reads ratio is around 13% (why? shouldn't it be ~100% right after drop_caches?).
"Big" range: most of the time the disk read speed was about 2 MB/s, but sometimes it jumped to 26-30 MB/s. The output rate was 70-80 rows/s (but it varied a lot and reached up to 8000 rows/s). The reads ratio also varied a lot.

I rendered series from the last test into charts:
"Small" range: https://i.stack.imgur.com/3Zfml.png
"Big" range (insane): https://i.stack.imgur.com/VXdID.png

During the tests I verified the disk read speed with iotop and found its readings very close to the ones I calculated from EXPLAIN BUFFERS. I cannot say I was monitoring it the whole time, but I did confirm it when it was 2 MB/s and 22 MB/s on the second range and 10 MB/s on the first range. I also checked with htop that the CPU was not a bottleneck; it stayed around 3% during the tests.

The issue is reproducible on both the master and the slave server. My tests were conducted on the slave, while there was no other load on the DBMS and no disk activity on the host unrelated to the DBMS.

My only assumption is that different fragments of data are being read at different speeds due to virtualization or something similar, but... why is it so strictly bound to these ranges? Why is it the same on two different machines?

The file system performance measured by dd:

root@postgresnlpslave:/# echo 3 > /proc/sys/vm/drop_caches 
root@postgresnlpslave:/# dd if=/dev/mapper/postgresnlpslave--vg-root of=/dev/null bs=8K count=128K
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.12304 s, 506 MB/s

Am I missing something? What else can I do to narrow down the cause?

In the pgsql-performance list, by date:

Previous
From: David Rowley
Subject: Re: To keep indexes in memory, is large enough effective_cache_size enough?
Next
From: Justin Pryzby
Subject: Re: Why could different data in a table be processed with different performance?