Обсуждение: Does PostgreSQL cache all columns of a table after SELECT?
Hello, I have the following table with the following columns: large_a: text (few dozen characters up to a few hundred) large_b: text (few dozen characters up to a few hundred) The table has several million rows. The DB is running on a large machine. I perform the following query: select large_a from table; The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few seconds. What I want to know: Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything.
Hello Tim, Can you send the EXPLAIN (BUFFERS, ANALYZE, VERBOSE) of your query? This will show you whether the plan is using cachingor not. > On 5 Jun 2023, at 09:15, Tim Schwenke <tim@trallnag.com> wrote: > > Hello, > > I have the following table with the following columns: > > large_a: text (few dozen characters up to a few hundred) > large_b: text (few dozen characters up to a few hundred) > > The table has several million rows. The DB is running on a large machine. > > I perform the following query: > > select large_a from table; > > The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few seconds. > > What I want to know: > > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything. > > > Regards, William Alves
On Monday, June 5, 2023, Tim Schwenke <tim@trallnag.com> wrote:
Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything.
Shared buffers is a page cache.
David J.
Hello David, from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation bycolumns. https://www.postgresql.org/docs/8.1/storage.html This means if I select a single column from a table the first time, the full file / segment is read and put into page cacheif there is enough space. This means a table with only one large column large_a takes up less page cache compared toa table with many large columns large_a and large_b, even though in both cases only large_a is selected. Is that more or less correct? Ignoring toast? Tim S. ---- On Mon, 05 Jun 2023 14:58:21 +0200 David G. Johnston wrote --- > > > On Monday, June 5, 2023, Tim Schwenke tim@trallnag.com> wrote: > > > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything. > > > Shared buffers is a page cache. > > David J. >
(second try, forgot to put mailing list in Cc) Hello William, here is the query plan. Note that it does not exactly match my example before. In this case the large columns are calledfp_smiles_rdkit and smiles_rdkit for example. I am wondering if smiles_rdkit is also cached even if this column isnot mentioned in the query statement: select id, arthor.similar(fp_smiles_rdkit, "masked molecule") as score from table order by score desc limit 1000; Probably not because the other columns except fp_smiles_rdkit are not mentioned anywhere? Limit (cost=9066733.30..9066856.96 rows=1000 width=14) (actual time=2693.416..2705.022 rows=1000 loops=1) Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text)) Buffers: shared hit=22555 read=8871264 -> Gather Merge (cost=9066733.30..10548560.34 rows=11982869 width=14) (actual time=2632.949..2644.494 rows=1000 loops=1) Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text)) Workers Planned: 8 Workers Launched: 8 Buffers: shared hit=22555 read=8871264 -> Sort (cost=9065733.15..9069477.80 rows=1497859 width=14) (actual time=2616.622..2616.663 rows=902 loops=9) Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text)) Sort Key: (arthor."similar"(m.fp_smiles_rdkit, 'MASKED'::text)) DESC Sort Method: top-N heapsort Memory: 129kB Buffers: shared hit=22555 read=8871264 Worker 0: actual time=2611.366..2611.438 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 128kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.427 ms, Inlining 64.124 ms, Optimization 61.353 ms, Emission 37.144 ms, Total 163.048ms Buffers: shared hit=1719 read=710940 Worker 1: actual time=2615.620..2615.658 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 127kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.270 ms, Inlining 37.866 ms, Optimization 36.915 ms, Emission 22.436 ms, Total 97.488ms Buffers: shared hit=2653 read=1045217 Worker 2: actual time=2615.612..2615.651 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 125kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.274 ms, Inlining 37.858 ms, Optimization 36.761 ms, Emission 22.590 ms, Total 97.484ms Buffers: shared hit=2487 read=1059144 Worker 3: actual time=2612.136..2612.204 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 125kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.410 ms, Inlining 64.523 ms, Optimization 61.715 ms, Emission 37.138 ms, Total 163.786ms Buffers: shared hit=1813 read=712819 Worker 4: actual time=2615.616..2615.654 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 129kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.271 ms, Inlining 38.924 ms, Optimization 36.381 ms, Emission 22.442 ms, Total 98.018ms Buffers: shared hit=2479 read=1066168 Worker 5: actual time=2615.614..2615.651 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 125kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.271 ms, Inlining 37.860 ms, Optimization 36.771 ms, Emission 22.582 ms, Total 97.484ms Buffers: shared hit=2538 read=1049422 Worker 6: actual time=2615.610..2615.647 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 125kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.271 ms, Inlining 37.864 ms, Optimization 36.677 ms, Emission 22.675 ms, Total 97.487ms Buffers: shared hit=2465 read=1068788 Worker 7: actual time=2615.607..2615.644 rows=1000 loops=1 Sort Method: top-N heapsort Memory: 128kB JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.271 ms, Inlining 37.864 ms, Optimization 36.749 ms, Emission 22.597 ms, Total 97.481ms Buffers: shared hit=3771 read=1045888 -> Parallel Seq Scan on eval.molecule m (cost=0.00..8912094.23 rows=1497859 width=14) (actual time=101.301..2443.105rows=1331595 loops=9) Output: moleculeid, arthor."similar"(fp_smiles_rdkit, 'MASKED'::text) Buffers: shared hit=22107 read=8871264 Worker 0: actual time=162.863..2409.947 rows=862751 loops=1 Buffers: shared hit=1663 read=710940 Worker 1: actual time=97.449..2454.168 rows=1440465 loops=1 Buffers: shared hit=2597 read=1045217 Worker 2: actual time=97.442..2453.629 rows=1434116 loops=1 Buffers: shared hit=2431 read=1059144 Worker 3: actual time=163.622..2408.849 rows=871340 loops=1 Buffers: shared hit=1757 read=712819 Worker 4: actual time=97.964..2452.247 rows=1428958 loops=1 Buffers: shared hit=2423 read=1066168 Worker 5: actual time=97.444..2453.461 rows=1427206 loops=1 Buffers: shared hit=2482 read=1049422 Worker 6: actual time=97.443..2453.294 rows=1448356 loops=1 Buffers: shared hit=2409 read=1068788 Worker 7: actual time=97.439..2453.456 rows=1436563 loops=1 Buffers: shared hit=3715 read=1045888 Planning Time: 0.061 ms JIT: Functions: 19 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 2.868 ms, Inlining 358.141 ms, Optimization 380.158 ms, Emission 231.978 ms, Total 973.146 ms Execution Time: 2705.518 ms (94 rows) Tim S. ---- On Mon, 05 Jun 2023 14:19:53 +0200 William Alves Da Silva wrote --- > Hello Tim, > > Can you send the EXPLAIN (BUFFERS, ANALYZE, VERBOSE) of your query? This will show you whether the plan is using cachingor not. > > > > On 5 Jun 2023, at 09:15, Tim Schwenke tim@trallnag.com> wrote: > > > > Hello, > > > > I have the following table with the following columns: > > > > large_a: text (few dozen characters up to a few hundred) > > large_b: text (few dozen characters up to a few hundred) > > > > The table has several million rows. The DB is running on a large machine. > > > > I perform the following query: > > > > select large_a from table; > > > > The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a fewseconds. > > > > What I want to know: > > > > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything. > > > > > > > > Regards, > William Alves > >
Hello,
I just tested it and it indeed works this way.
1. Created new table with only relevant columns.
2. Stopped PostgreSQL DB.
3. Dropped page cache.
4. Started again.
5. Run query against new table
6. Way less cache used, query is faster, less data is being read into cache.
---- On Mon, 05 Jun 2023 15:17:17 +0200 Tim Schwenke <tim@trallnag.com> wrote ---
Hello David,
from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.
https://www.postgresql.org/docs/8.1/storage.html
This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space. This means a table with only one large column large_a takes up less page cache compared to a table with many large columns large_a and large_b, even though in both cases only large_a is selected.
Is that more or less correct? Ignoring toast?
Tim S.
---- On Mon, 05 Jun 2023 14:58:21 +0200 David G. Johnston wrote ---
>
>
> On Monday, June 5, 2023, Tim Schwenke tim@trallnag.com> wrote:
>
>
> Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything.
>
>
> Shared buffers is a page cache.
>
> David J.
>
On Mon, 2023-06-05 at 14:15 +0200, Tim Schwenke wrote: > Hello, > > I have the following table with the following columns: > > large_a: text (few dozen characters up to a few hundred) > large_b: text (few dozen characters up to a few hundred) > > The table has several million rows. The DB is running on a large machine. > > I perform the following query: > > select large_a from table; > > The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few seconds. > > What I want to know: > > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything. It depends. If "large_a" and "large_b" are TOASTed, then they will only be cached if the column is explicitly selected. For shorter attributes, the whole row is in one piece, and all columns will get cached. However, for sequential scans that are larger than a quarter of shared buffers, PostgreSQL uses a ring buffer to avoid blowing out the cache. In that case, most data won't be cached for the next query. Yours, Laurenz Albe
On Mon, Jun 5, 2023, 06:17 Tim Schwenke <tim@trallnag.com> wrote:
Hello David,
from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.
https://www.postgresql.org/docs/8.1/storage.html
This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space.
Pages are subsegment.
You may also wish to take care to notice the version of documentation pages you look at and reference. 8.1 is long past its expiration date.
David J.
Thanks, you all helped me
---- On Mon, 05 Jun 2023 16:26:08 +0200 David G. Johnston <david.g.johnston@gmail.com> wrote ---
On Mon, Jun 5, 2023, 06:17 Tim Schwenke <tim@trallnag.com> wrote:Hello David,
from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.
https://www.postgresql.org/docs/8.1/storage.html
This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space.Pages are subsegment.You may also wish to take care to notice the version of documentation pages you look at and reference. 8.1 is long past its expiration date.David J.