Обсуждение: [PATCH] Add hint for misspelled relations

Поиск

Список

Период

Сортировка

[PATCH] Add hint for misspelled relations

От

Steve Chavez

Дата:

02 декабря 2025 г., 05:46:18

Hello hackers,

Currently misspelled columns offer a hint but not misspelled relations.

This patch enables that, the result is like:

-- having this table

create table clients (id int);
-- we misspell the table name
select * from cliants;
ERROR: relation "cliants" does not exist
LINE 1: select * from cliants;
HINT: Perhaps you meant to reference the table "public.clients".

The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is qualified, it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function similar to how the column matching is done.

If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If that fails, then the lexicographic order is used to break the tie.

One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is capped by MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated. Personally I've seen a catalog that contains 125K tables, with mostly auto generated names. For these cases I don't think the hint helps that much anyway, so it seemed fine to bail here.

The changes are split into two commits, one refactoring some reusable functions for easier review and another one implementing the relation hint.

Any feedback is welcomed.

Best regards,

Steve Chavez

[1]:

Вложения

Re: [PATCH] Add hint for misspelled relations

От

Kirill Reshke

Дата:

02 декабря 2025 г., 06:32:44

On Tue, 2 Dec 2025 at 07:46, Steve Chavez <steve@supabase.io> wrote:
>
> Hello hackers,
>
> Currently misspelled columns offer a hint but not misspelled relations.
>
> This patch enables that, the result is like:
>
> -- having this table
> create table clients (id int);
> -- we misspell the table name
> select * from cliants;
> ERROR:  relation "cliants" does not exist
> LINE 1: select * from cliants;
> HINT:  Perhaps you meant to reference the table "public.clients".
>
> The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is
qualified,it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function
similarto how the column matching is done. 
>
> If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If
thatfails, then the lexicographic order is used to break the tie. 
>
> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is
cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is
appreciated.Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these
casesI don't think the hint helps that much anyway, so it seemed fine to bail here. 
>
> The changes are split into two commits, one refactoring some reusable functions for easier review and another one
implementingthe relation hint. 
>
> Any feedback is welcomed.
>
> Best regards,
> Steve Chavez
>
> [1]:

Hi! I did not reviewed this patch closely, but I stopped a this:

> + reldistance = varstr_levenshtein_less_equal(target->relname, strlen(target->relname),
> + candidate, candidatelen,
> + 1, 1, 1,
> + Min(fuzzystate->distance + 1, MAX_FUZZY_DISTANCE + 1),
> + true);
> +
> + /* The above can return MAX_FUZZY_DISTANCE + 1 results, skip these */
> + if (reldistance > MAX_FUZZY_DISTANCE)
> + continue;

Why do we even do this? Can't we just pass fuzzystate->distance to
varstr_levenshtein_less_equal? It is initialized in outer func to
MAX_FUZZY_DISTANCE + 1

--
Best regards,
Kirill Reshke

Re: [PATCH] Add hint for misspelled relations

От

Daniel Gustafsson

Дата:

02 декабря 2025 г., 12:27:41

> On 2 Dec 2025, at 03:46, Steve Chavez <steve@supabase.io> wrote:

> Currently misspelled columns offer a hint but not misspelled relations.

The tab-completion in psql is one tool offered to avoid misspellings which
reduce the need.

> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is
cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is
appreciated.Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these
casesI don't think the hint helps that much anyway, so it seemed fine to bail here. 

What makes hints for columns appealing is that it's a pretty contained problem
across data we've already accumulated, relations are quite different as they
require a catalog lookup making it a lot less appealing.  The number of
relations can easily become quite large, especially when large partitioning
hierarchies are involved, and while capping prevents large scans there is no
guarantee that the MAX_REL_HINT_CANDIDATES set contains the most likely
entries.  Worst case is that it never contains the a close enough match and
we'll spend a scan + hash + calculation of MAX_REL_HINT_CANDIDATES for each
typo.

Are there ways you can pare down the scan, perhaps a scankey on relispartition
and only consider base relations?

Hinting on relations can also give hints for relations the user does not have
permissions on which further reduce the useability.

--
Daniel Gustafsson

Re: [PATCH] Add hint for misspelled relations

От

Steve Chavez

Дата:

10 декабря 2025 г., 09:10:28

Hi Kirill and Daniel, many thanks for your feedback.

> Why do we even do this? Can't we just pass fuzzystate->distance to

varstr_levenshtein_less_equal? It is initialized in outer func to
MAX_FUZZY_DISTANCE + 1

This was an oversight. I've corrected it on the new attached patch.

> Worst case is that it never contains the a close enough match and

we'll spend a scan + hash + calculation of MAX_REL_HINT_CANDIDATES for each
typo.

> Are there ways you can pare down the scan, perhaps a scankey on relispartition

and only consider base relations?

I've improved the code (new patch attached) and the hash is no longer necessary.

Now the scan omits partitions as you suggested plus they're now restricted to each schema on the search_path.

It was also possible to do more scans for each relevant relkind (RELKIND_RELATION, RELKIND_VIEW, etc) but I wasn't sure if that was more expensive than doing the filtering on the loop as it's currently done.

It also made the code a bit more complicated with more nested loops, so I've left it as is.

Perhaps we should limit the amount of schema scans too in case the search_path has too many schemas?

> Hinting on relations can also give hints for relations the user does not have

permissions on which further reduce the useability.

I've tested the current column hint and it also doesn't consider privileges:

```
create role new nosuperuser;
grant usage on schema public to new;
create table subitems (id int, name text);
grant select (id) on subitems to new;

begin;
set local role to new;
select nam from subitems;
2025-12-09 18:16:38.017 -05 [542091] ERROR: column "nam" does not exist at character 8
2025-12-09 18:16:38.017 -05 [542091] HINT: Perhaps you meant to reference the column "subitems.name".
2025-12-09 18:16:38.017 -05 [542091] STATEMENT: select nam from subitems ;
ERROR: column "nam" does not exist
LINE 1: select nam from subitems ;
HINT: Perhaps you meant to reference the column "subitems.name".

```

So perhaps that can be a separate enhancement for both hints?

> The tab-completion in psql is one tool offered to avoid misspellings which

reduce the need.

True, but it's still useful to offer a hint for other clients.

What do you think?

Best regards,

Steve Chavez

On Tue, 2 Dec 2025 at 04:27, Daniel Gustafsson <daniel@yesql.se> wrote:

> On 2 Dec 2025, at 03:46, Steve Chavez <steve@supabase.io> wrote:

> Currently misspelled columns offer a hint but not misspelled relations.

The tab-completion in psql is one tool offered to avoid misspellings which
reduce the need.

> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is capped by MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated. Personally I've seen a catalog that contains 125K tables, with mostly auto generated names. For these cases I don't think the hint helps that much anyway, so it seemed fine to bail here.

What makes hints for columns appealing is that it's a pretty contained problem
across data we've already accumulated, relations are quite different as they
require a catalog lookup making it a lot less appealing. The number of
relations can easily become quite large, especially when large partitioning
hierarchies are involved, and while capping prevents large scans there is no
guarantee that the MAX_REL_HINT_CANDIDATES set contains the most likely
entries. Worst case is that it never contains the a close enough match and
we'll spend a scan + hash + calculation of MAX_REL_HINT_CANDIDATES for each
typo.

Are there ways you can pare down the scan, perhaps a scankey on relispartition
and only consider base relations?

Hinting on relations can also give hints for relations the user does not have
permissions on which further reduce the useability.

--
Daniel Gustafsson

Вложения

v2-0001-add-a-hint-for-a-missing-relation.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [PATCH] Add hint for misspelled relations

[PATCH] Add hint for misspelled relations

Вложения

Re: [PATCH] Add hint for misspelled relations

Re: [PATCH] Add hint for misspelled relations

Re: [PATCH] Add hint for misspelled relations

Вложения