Обсуждение: Does PostgreSQL cache all columns of a table after SELECT?

Поиск

Список

Период

Сортировка

Does PostgreSQL cache all columns of a table after SELECT?

От

Tim Schwenke

Дата:

05 июня 2023 г., 12:15:21

Hello,

I have the following table with the following columns:

large_a: text (few dozen characters up to a few hundred)
large_b: text (few dozen characters up to a few hundred)

The table has several million rows. The DB is running on a large machine.

I perform the following query:

select large_a from table;

The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few
seconds.

What I want to know:

Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit
everything.

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

William Alves Da Silva

Дата:

05 июня 2023 г., 12:19:53

Hello Tim,

Can you send the EXPLAIN (BUFFERS, ANALYZE, VERBOSE) of your query? This will show you whether the plan is using
cachingor not. 


> On 5 Jun 2023, at 09:15, Tim Schwenke <tim@trallnag.com> wrote:
>
> Hello,
>
> I have the following table with the following columns:
>
> large_a: text (few dozen characters up to a few hundred)
> large_b: text (few dozen characters up to a few hundred)
>
> The table has several million rows. The DB is running on a large machine.
>
> I perform the following query:
>
> select large_a from table;
>
> The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few
seconds.
>
> What I want to know:
>
> Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit
everything.
>
>
>

Regards,
William Alves

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

"David G. Johnston"

Дата:

05 июня 2023 г., 12:58:21

On Monday, June 5, 2023, Tim Schwenke <tim@trallnag.com> wrote:

Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything.

Shared buffers is a page cache.

David J.

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

Tim Schwenke

Дата:

05 июня 2023 г., 13:17:17

Hello David,

from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation
bycolumns. 

https://www.postgresql.org/docs/8.1/storage.html

This means if I select a single column from a table the first time, the full file / segment is read and put into page
cacheif there is enough space. This means a table with only one large column large_a takes up less page cache compared
toa table with many large columns large_a and large_b, even though in both cases only large_a is selected. 

Is that more or less correct? Ignoring toast?

Tim S.

---- On Mon, 05 Jun 2023 14:58:21 +0200 David G. Johnston  wrote ---

 >
 >
 > On Monday, June 5, 2023, Tim Schwenke tim@trallnag.com> wrote:
 >
 >
 > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit
everything.
 >
 >
 > Shared buffers is a page cache.
 >
 > David J. 
 >

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

Tim Schwenke

Дата:

05 июня 2023 г., 13:19:08

(second try, forgot to put mailing list in Cc)
Hello William,

here is the query plan. Note that it does not exactly match my example before. In this case the large columns are
calledfp_smiles_rdkit and smiles_rdkit for example. I am wondering if smiles_rdkit is also cached even if this column
isnot mentioned in the query statement: 

select id, arthor.similar(fp_smiles_rdkit, "masked molecule") as score from table order by score desc limit 1000;

Probably not because the other columns except fp_smiles_rdkit are not mentioned anywhere?

Limit  (cost=9066733.30..9066856.96 rows=1000 width=14) (actual time=2693.416..2705.022 rows=1000 loops=1)
   Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text))
   Buffers: shared hit=22555 read=8871264
   ->  Gather Merge  (cost=9066733.30..10548560.34 rows=11982869 width=14) (actual time=2632.949..2644.494 rows=1000
loops=1)
         Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text))
         Workers Planned: 8
         Workers Launched: 8
         Buffers: shared hit=22555 read=8871264
         ->  Sort  (cost=9065733.15..9069477.80 rows=1497859 width=14) (actual time=2616.622..2616.663 rows=902
loops=9)
               Output: moleculeid, (arthor."similar"(fp_smiles_rdkit, 'MASKED'::text))
               Sort Key: (arthor."similar"(m.fp_smiles_rdkit, 'MASKED'::text)) DESC
               Sort Method: top-N heapsort  Memory: 129kB
               Buffers: shared hit=22555 read=8871264
               Worker 0:  actual time=2611.366..2611.438 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 128kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.427 ms, Inlining 64.124 ms, Optimization 61.353 ms, Emission 37.144 ms, Total
163.048ms 
                 Buffers: shared hit=1719 read=710940
               Worker 1:  actual time=2615.620..2615.658 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 127kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.270 ms, Inlining 37.866 ms, Optimization 36.915 ms, Emission 22.436 ms, Total
97.488ms 
                 Buffers: shared hit=2653 read=1045217
               Worker 2:  actual time=2615.612..2615.651 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 125kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.274 ms, Inlining 37.858 ms, Optimization 36.761 ms, Emission 22.590 ms, Total
97.484ms 
                 Buffers: shared hit=2487 read=1059144
               Worker 3:  actual time=2612.136..2612.204 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 125kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.410 ms, Inlining 64.523 ms, Optimization 61.715 ms, Emission 37.138 ms, Total
163.786ms 
                 Buffers: shared hit=1813 read=712819
               Worker 4:  actual time=2615.616..2615.654 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 129kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.271 ms, Inlining 38.924 ms, Optimization 36.381 ms, Emission 22.442 ms, Total
98.018ms 
                 Buffers: shared hit=2479 read=1066168
               Worker 5:  actual time=2615.614..2615.651 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 125kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.271 ms, Inlining 37.860 ms, Optimization 36.771 ms, Emission 22.582 ms, Total
97.484ms 
                 Buffers: shared hit=2538 read=1049422
               Worker 6:  actual time=2615.610..2615.647 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 125kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.271 ms, Inlining 37.864 ms, Optimization 36.677 ms, Emission 22.675 ms, Total
97.487ms 
                 Buffers: shared hit=2465 read=1068788
               Worker 7:  actual time=2615.607..2615.644 rows=1000 loops=1
                 Sort Method: top-N heapsort  Memory: 128kB
                 JIT:
                   Functions: 2
                   Options: Inlining true, Optimization true, Expressions true, Deforming true
                   Timing: Generation 0.271 ms, Inlining 37.864 ms, Optimization 36.749 ms, Emission 22.597 ms, Total
97.481ms 
                 Buffers: shared hit=3771 read=1045888
               ->  Parallel Seq Scan on eval.molecule m  (cost=0.00..8912094.23 rows=1497859 width=14) (actual
time=101.301..2443.105rows=1331595 loops=9) 
                     Output: moleculeid, arthor."similar"(fp_smiles_rdkit, 'MASKED'::text)
                     Buffers: shared hit=22107 read=8871264
                     Worker 0:  actual time=162.863..2409.947 rows=862751 loops=1
                       Buffers: shared hit=1663 read=710940
                     Worker 1:  actual time=97.449..2454.168 rows=1440465 loops=1
                       Buffers: shared hit=2597 read=1045217
                     Worker 2:  actual time=97.442..2453.629 rows=1434116 loops=1
                       Buffers: shared hit=2431 read=1059144
                     Worker 3:  actual time=163.622..2408.849 rows=871340 loops=1
                       Buffers: shared hit=1757 read=712819
                     Worker 4:  actual time=97.964..2452.247 rows=1428958 loops=1
                       Buffers: shared hit=2423 read=1066168
                     Worker 5:  actual time=97.444..2453.461 rows=1427206 loops=1
                       Buffers: shared hit=2482 read=1049422
                     Worker 6:  actual time=97.443..2453.294 rows=1448356 loops=1
                       Buffers: shared hit=2409 read=1068788
                     Worker 7:  actual time=97.439..2453.456 rows=1436563 loops=1
                       Buffers: shared hit=3715 read=1045888
Planning Time: 0.061 ms
JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.868 ms, Inlining 358.141 ms, Optimization 380.158 ms, Emission 231.978 ms, Total 973.146 ms
Execution Time: 2705.518 ms
(94 rows)

Tim S.



---- On Mon, 05 Jun 2023 14:19:53 +0200 William Alves Da Silva  wrote ---

 > Hello Tim,
 >
 > Can you send the EXPLAIN (BUFFERS, ANALYZE, VERBOSE) of your query? This will show you whether the plan is using
cachingor not.  
 >
 >
 > > On 5 Jun 2023, at 09:15, Tim Schwenke tim@trallnag.com> wrote:
 > >
 > > Hello,
 > >
 > > I have the following table with the following columns:
 > >
 > > large_a: text (few dozen characters up to a few hundred)
 > > large_b: text (few dozen characters up to a few hundred)
 > >
 > > The table has several million rows. The DB is running on a large machine.
 > >
 > > I perform the following query:
 > >
 > > select large_a from table;
 > >
 > > The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a
fewseconds.  
 > >
 > > What I want to know:
 > >
 > > Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit
everything. 
 > >
 > >
 > >
 >
 > Regards,
 > William Alves
 >
 >

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

Tim Schwenke

Дата:

05 июня 2023 г., 13:47:34

Hello,

I just tested it and it indeed works this way.

1. Created new table with only relevant columns.

2. Stopped PostgreSQL DB.

3. Dropped page cache.

4. Started again.

5. Run query against new table

6. Way less cache used, query is faster, less data is being read into cache.

---- On Mon, 05 Jun 2023 15:17:17 +0200 Tim Schwenke <tim@trallnag.com> wrote ---

Hello David,

from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.

https://www.postgresql.org/docs/8.1/storage.html

This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space. This means a table with only one large column large_a takes up less page cache compared to a table with many large columns large_a and large_b, even though in both cases only large_a is selected.

Is that more or less correct? Ignoring toast?

Tim S.

---- On Mon, 05 Jun 2023 14:58:21 +0200 David G. Johnston wrote ---

>
>
> On Monday, June 5, 2023, Tim Schwenke tim@trallnag.com> wrote:
>
>
> Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit everything.
>
>
> Shared buffers is a page cache.
>
> David J.
>

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

Laurenz Albe

Дата:

05 июня 2023 г., 13:58:38

On Mon, 2023-06-05 at 14:15 +0200, Tim Schwenke wrote:
> Hello,
>
> I have the following table with the following columns:
>
> large_a: text (few dozen characters up to a few hundred)
> large_b: text (few dozen characters up to a few hundred)
>
> The table has several million rows. The DB is running on a large machine.
>
> I perform the following query:
>
> select large_a from table;
>
> The first query takes a few minutes. Afterwards I see that the cache in memory has grown. Next query only takes a few
seconds.
>
> What I want to know:
>
> Does the cache also contain large_b? Or is only large_a cached? Assumption is that memory is large enough to fit
everything.

It depends.

If "large_a" and "large_b" are TOASTed, then they will only be cached if the column
is explicitly selected.

For shorter attributes, the whole row is in one piece, and all columns will get cached.

However, for sequential scans that are larger than a quarter of shared buffers,
PostgreSQL uses a ring buffer to avoid blowing out the cache.  In that case, most
data won't be cached for the next query.

Yours,
Laurenz Albe

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

"David G. Johnston"

Дата:

05 июня 2023 г., 14:26:08

On Mon, Jun 5, 2023, 06:17 Tim Schwenke <tim@trallnag.com> wrote:

Hello David,

from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.

https://www.postgresql.org/docs/8.1/storage.html

This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space.

Pages are subsegment.

You may also wish to take care to notice the version of documentation pages you look at and reference. 8.1 is long past its expiration date.

David J.

Re: Does PostgreSQL cache all columns of a table after SELECT?

От

Tim Schwenke

Дата:

05 июня 2023 г., 15:15:24

Thanks, you all helped me

---- On Mon, 05 Jun 2023 16:26:08 +0200 David G. Johnston <david.g.johnston@gmail.com> wrote ---

On Mon, Jun 5, 2023, 06:17 Tim Schwenke <tim@trallnag.com> wrote:
Hello David,

from what I understand, in PostgreSQL, tables are stored in one or more files called segments. There is no separation by columns.

https://www.postgresql.org/docs/8.1/storage.html

This means if I select a single column from a table the first time, the full file / segment is read and put into page cache if there is enough space.

Pages are subsegment.

You may also wish to take care to notice the version of documentation pages you look at and reference. 8.1 is long past its expiration date.

David J.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Does PostgreSQL cache all columns of a table after SELECT?