Обсуждение: Getting "could not read block" error when creating an index on a function.

Поиск
Список
Период
Сортировка

Getting "could not read block" error when creating an index on a function.

От
Demitri Muna
Дата:
Hello,

I’m getting stuck on a problem I was hoping I could get some help with. I’m trying to create an index for the results
ofa function that touches two tables like this and get the following error: 

CREATE INDEX my_idx ON mytable (first_time(id));
ERROR:  could not read block 0 in file "base/16386/42868": read only 0 of 8192 bytes
CONTEXT:  SQL function "first_time" during startup

Every time I run this, the last number in the block path increments by one, e.g.

ERROR:  could not read block 0 in file "base/16386/42869": read only 0 of 8192 bytes
ERROR:  could not read block 0 in file "base/16386/42870": read only 0 of 8192 bytes
ERROR:  could not read block 0 in file "base/16386/42871": read only 0 of 8192 bytes

The database is sitting on two tablespaces (pg_default and ‘data2’). When I try to use the second, I get:

CREATE INDEX my_idx ON mytable (first_time(id)) TABLESPACE data2;
ERROR:  could not read block 0 in file "pg_tblspc/17007/PG_13_202007201/16386/42870": read only 0 of 8192 bytes
CONTEXT:  SQL function "first_time" during startup

with the last number similarly incrementing upon repeated attempts.

Relevant details:

* PostgreSQL version 13.1 running on Ubuntu 20.04.1 on an AWS instance using 2 x 8TB EBS storage.
* The database is ~15TB in size.
* I am not worried about data loss; the database can be considered read-only and I have all of the files needed to
recreateany table. 
* I am hoping to not recreate the whole database from scratch since doing so and creating the required indices will
takemore than a week. 
* I used these settings while importing the files to speed the process since I was not worried about data loss to
improvethe import speed (all turned back on after import): 

autovacuum = off
synchronous_commit=off
fsync = off
full_page_writes = off

* I will not do the above again. :)
* The postmaster server crashed at least twice during the process due to running out of disk space.
* Creating any number of new indices on bare columns is no problem.
* I DROPped and recreated the functions with no change.
* This statement returns no results (but maybe am looking at the wrong thing):

select n.nspname AS schema, c.relname AS realtion from pg_class c inner join pg_namespace n on (c.relnamespace = n.oid)
wherec.relfilenode = '16386’; 

From reading about this error (missing block files) it suggests I have some database corruption, which is fine as I can
easilydelete anything problematic and recreate. I’ve deleted the indices related to the function and recreated them,
butthe same error remains. Accessing the related tables seems ok, but with that much data I can’t guarantee that. I
don’tget any errors.  

Any help would be appreciated!

Cheers,
Demitri




Re: Getting "could not read block" error when creating an index on a function.

От
Tom Lane
Дата:
Demitri Muna <postgresql@demitri.com> writes:
> I’m getting stuck on a problem I was hoping I could get some help with. I’m trying to create an index for the results
ofa function that touches two tables like this and get the following error: 

> CREATE INDEX my_idx ON mytable (first_time(id));
> ERROR:  could not read block 0 in file "base/16386/42868": read only 0 of 8192 bytes
> CONTEXT:  SQL function "first_time" during startup

What's apparently happening is that some query in the function is trying
to examine the newly-created index before it's been populated.

I would call this a bug if it were a supported case, but really you are
doing something you are not allowed to.  Functions in indexed expressions
are required to be immutable, and a function that looks at the contents of
a table --- particularly the very table that the index is on --- is simply
not going to be that.  Marking such a function immutable to try to end-run
around the restriction is unsafe.

            regards, tom lane



Re: Getting "could not read block" error when creating an index on a function.

От
Demitri Muna
Дата:
Hi Tom,

> On Dec 30, 2020, at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I would call this a bug if it were a supported case, but really you are
> doing something you are not allowed to.  Functions in indexed expressions
> are required to be immutable, and a function that looks at the contents of
> a table --- particularly the very table that the index is on --- is simply
> not going to be that.  Marking such a function immutable to try to end-run
> around the restriction is unsafe.

Thank you, that makes perfect sense. In my mind it was immutable since the database is read-only, but I can see to PG
it’snot. Can you suggest an alternate for what I’m trying to do? Given this schema (a “person” has a number of
“events”):

CREATE TABLE person (
    id SERIAL,
    ...
);

CREATE TABLE event (
    id SERIAL,
    patient_id INTEGER
    event_timestamp TIMESTAMP,
    …
);

I have a function (the one I was trying to index) that returns the earliest event for a person. I’m scanning another
tablewith ~10B rows several times using a few of these “constant” values: 

* first_event_timestamp(person_id) + ‘1 month’
* first_event_timestamp(person_id) + ‘13 months’
* etc.

I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism would
bebest to do this? Create additional columns? Create another table? 

Thanks again,
Demitri





Re: Getting "could not read block" error when creating an index on a function.

От
Karsten Hilbert
Дата:
Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:

> I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism
wouldbe best to do this? Create additional columns? Create another table? 

A materialized view ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Getting "could not read block" error when creating an index on a function.

От
Christophe Pettus
Дата:

> On Dec 30, 2020, at 11:37, Demitri Muna <postgresql@demitri.com> wrote:
> I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism
wouldbe best to do this? Create additional columns? Create another table? 

This might be a good use for a generated column.

    https://www.postgresql.org/docs/current/ddl-generated-columns.html
--
-- Christophe Pettus
   xof@thebuild.com




Re: Getting "could not read block" error when creating an index on a function.

От
Christophe Pettus
Дата:

> On Dec 30, 2020, at 11:48, Christophe Pettus <xof@thebuild.com> wrote:
>
> This might be a good use for a generated column.
>
>     https://www.postgresql.org/docs/current/ddl-generated-columns.html

I take that back; the generation formula has to be immutable as well.  Perhaps a column populated by a trigger?
--
-- Christophe Pettus
   xof@thebuild.com




Re: Getting "could not read block" error when creating an index on a function.

От
Tom Lane
Дата:
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:
>> I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism
wouldbe best to do this? Create additional columns? Create another table? 

> A materialized view ?

Yeah, or you might be able to do something with a before-insert-or-update
trigger that computes whatever desired value you want and fills it into a
derived column.  Indexing that column then gives the same results as
indexing the derived expression; but it sidesteps the semantic problems
because the time of computation of the expression is well-defined, even
if it's not immutable.

You might try to avoid a handwritten trigger by defining a generated
column instead, but we insist that generation expressions be immutable
so it won't really work.  (Of course, you could still lie about the
mutability of the expression, but I can't recommend that.  Per Henry
Spencer's well-known dictum, "If you lie to the compiler, it will get its
revenge".  He was speaking of C compilers, I suppose, but the principle
applies to database query optimizers too.)

            regards, tom lane



Re: Getting "could not read block" error when creating an index on a function.

От
Demitri Muna
Дата:
Thank you for the responses! I was going to go with a materialized view, but then realized that since the dataset is
staticit’s really no different from just creating a new table and indexing that. The suggestions provide useful advice
forthe future though. 

Cheers,
Demitri

> On Dec 30, 2020, at 3:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
>> Am Wed, Dec 30, 2020 at 02:37:59PM -0500 schrieb Demitri Muna:
>>> I want to index the results of these repeated, unchanging calculations to speed up other queries. Which mechanism
wouldbe best to do this? Create additional columns? Create another table? 
>
>> A materialized view ?
>
> Yeah, or you might be able to do something with a before-insert-or-update
> trigger that computes whatever desired value you want and fills it into a
> derived column.  Indexing that column then gives the same results as
> indexing the derived expression; but it sidesteps the semantic problems
> because the time of computation of the expression is well-defined, even
> if it's not immutable.
>
> You might try to avoid a handwritten trigger by defining a generated
> column instead, but we insist that generation expressions be immutable
> so it won't really work.  (Of course, you could still lie about the
> mutability of the expression, but I can't recommend that.  Per Henry
> Spencer's well-known dictum, "If you lie to the compiler, it will get its
> revenge".  He was speaking of C compilers, I suppose, but the principle
> applies to database query optimizers too.)
>
>             regards, tom lane