Обсуждение: [PATCH] Add hint for misspelled relations

Поиск
Список
Период
Сортировка

[PATCH] Add hint for misspelled relations

От
Steve Chavez
Дата:
Hello hackers,

Currently misspelled columns offer a hint but not misspelled relations.

This patch enables that, the result is like:

-- having this table
create table clients (id int); 
-- we misspell the table name 
select * from cliants;
ERROR:  relation "cliants" does not exist
LINE 1: select * from cliants;
HINT:  Perhaps you meant to reference the table "public.clients".

The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is qualified, it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function similar to how the column matching is done. 

If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If that fails, then the lexicographic order is used to break the tie.

One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is capped by MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated. Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these cases I don't think the hint helps that much anyway, so it seemed fine to bail here.

The changes are split into two commits, one refactoring some reusable functions for easier review and another one implementing the relation hint.

Any feedback is welcomed.

Best regards,
Steve Chavez

[1]:
Вложения

Re: [PATCH] Add hint for misspelled relations

От
Kirill Reshke
Дата:
On Tue, 2 Dec 2025 at 07:46, Steve Chavez <steve@supabase.io> wrote:
>
> Hello hackers,
>
> Currently misspelled columns offer a hint but not misspelled relations.
>
> This patch enables that, the result is like:
>
> -- having this table
> create table clients (id int);
> -- we misspell the table name
> select * from cliants;
> ERROR:  relation "cliants" does not exist
> LINE 1: select * from cliants;
> HINT:  Perhaps you meant to reference the table "public.clients".
>
> The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is
qualified,it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function
similarto how the column matching is done. 
>
> If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If
thatfails, then the lexicographic order is used to break the tie. 
>
> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is
cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is
appreciated.Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these
casesI don't think the hint helps that much anyway, so it seemed fine to bail here. 
>
> The changes are split into two commits, one refactoring some reusable functions for easier review and another one
implementingthe relation hint. 
>
> Any feedback is welcomed.
>
> Best regards,
> Steve Chavez
>
> [1]:

Hi! I did not reviewed this patch closely, but I stopped a this:

> + reldistance = varstr_levenshtein_less_equal(target->relname, strlen(target->relname),
> + candidate, candidatelen,
> + 1, 1, 1,
> + Min(fuzzystate->distance + 1, MAX_FUZZY_DISTANCE + 1),
> + true);
> +
> + /* The above can return MAX_FUZZY_DISTANCE + 1 results, skip these */
> + if (reldistance > MAX_FUZZY_DISTANCE)
> + continue;

Why do we even do this? Can't we just pass fuzzystate->distance to
varstr_levenshtein_less_equal? It is initialized in outer func to
MAX_FUZZY_DISTANCE + 1

--
Best regards,
Kirill Reshke



Re: [PATCH] Add hint for misspelled relations

От
Daniel Gustafsson
Дата:
> On 2 Dec 2025, at 03:46, Steve Chavez <steve@supabase.io> wrote:

> Currently misspelled columns offer a hint but not misspelled relations.

The tab-completion in psql is one tool offered to avoid misspellings which
reduce the need.

> One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is
cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is
appreciated.Personally I've seen  a catalog that contains 125K tables, with mostly auto generated names. For these
casesI don't think the hint helps that much anyway, so it seemed fine to bail here. 

What makes hints for columns appealing is that it's a pretty contained problem
across data we've already accumulated, relations are quite different as they
require a catalog lookup making it a lot less appealing.  The number of
relations can easily become quite large, especially when large partitioning
hierarchies are involved, and while capping prevents large scans there is no
guarantee that the MAX_REL_HINT_CANDIDATES set contains the most likely
entries.  Worst case is that it never contains the a close enough match and
we'll spend a scan + hash + calculation of MAX_REL_HINT_CANDIDATES for each
typo.

Are there ways you can pare down the scan, perhaps a scankey on relispartition
and only consider base relations?

Hinting on relations can also give hints for relations the user does not have
permissions on which further reduce the useability.

--
Daniel Gustafsson