Обсуждение: [PATCH] Add hint for misspelled relations
Hello hackers,
Currently misspelled columns offer a hint but not misspelled relations.
This patch enables that, the result is like:
-- having this table
-- having this table
create table clients (id int);
-- we misspell the table name
select * from cliants;
ERROR: relation "cliants" does not exist
LINE 1: select * from cliants;
HINT: Perhaps you meant to reference the table "public.clients".
The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is qualified, it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function similar to how the column matching is done.
If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If that fails, then the lexicographic order is used to break the tie.
-- we misspell the table name
select * from cliants;
ERROR: relation "cliants" does not exist
LINE 1: select * from cliants;
HINT: Perhaps you meant to reference the table "public.clients".
The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is qualified, it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function similar to how the column matching is done.
If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If that fails, then the lexicographic order is used to break the tie.
One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is capped by MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated. Personally I've seen a catalog that contains 125K tables, with mostly auto generated names. For these cases I don't think the hint helps that much anyway, so it seemed fine to bail here.
The changes are split into two commits, one refactoring some reusable functions for easier review and another one implementing the relation hint.
Any feedback is welcomed.
Best regards,
Any feedback is welcomed.
Best regards,
Steve Chavez
[1]:
Вложения
On Tue, 2 Dec 2025 at 07:46, Steve Chavez <steve@supabase.io> wrote: > > Hello hackers, > > Currently misspelled columns offer a hint but not misspelled relations. > > This patch enables that, the result is like: > > -- having this table > create table clients (id int); > -- we misspell the table name > select * from cliants; > ERROR: relation "cliants" does not exist > LINE 1: select * from cliants; > HINT: Perhaps you meant to reference the table "public.clients". > > The possible matches are searched in pg_class for the schemas present in search_path (or if the relation is qualified,it only searches matches in that schema). The logic reuses the `varstr_levenshtein_less_equal` function similarto how the column matching is done. > > If there's a tie in the fuzzy match, it's solved by preferring the schema that appears first on the search path. If thatfails, then the lexicographic order is used to break the tie. > > One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated.Personally I've seen a catalog that contains 125K tables, with mostly auto generated names. For these casesI don't think the hint helps that much anyway, so it seemed fine to bail here. > > The changes are split into two commits, one refactoring some reusable functions for easier review and another one implementingthe relation hint. > > Any feedback is welcomed. > > Best regards, > Steve Chavez > > [1]: Hi! I did not reviewed this patch closely, but I stopped a this: > + reldistance = varstr_levenshtein_less_equal(target->relname, strlen(target->relname), > + candidate, candidatelen, > + 1, 1, 1, > + Min(fuzzystate->distance + 1, MAX_FUZZY_DISTANCE + 1), > + true); > + > + /* The above can return MAX_FUZZY_DISTANCE + 1 results, skip these */ > + if (reldistance > MAX_FUZZY_DISTANCE) > + continue; Why do we even do this? Can't we just pass fuzzystate->distance to varstr_levenshtein_less_equal? It is initialized in outer func to MAX_FUZZY_DISTANCE + 1 -- Best regards, Kirill Reshke
> On 2 Dec 2025, at 03:46, Steve Chavez <steve@supabase.io> wrote: > Currently misspelled columns offer a hint but not misspelled relations. The tab-completion in psql is one tool offered to avoid misspellings which reduce the need. > One problem is that scanning all pg_class entries can get expensive on big catalogs, so the number of searches is cappedby MAX_REL_HINT_CANDIDATES. I've set this to 4096 arbitrarily, any guidance on what would be a good number is appreciated.Personally I've seen a catalog that contains 125K tables, with mostly auto generated names. For these casesI don't think the hint helps that much anyway, so it seemed fine to bail here. What makes hints for columns appealing is that it's a pretty contained problem across data we've already accumulated, relations are quite different as they require a catalog lookup making it a lot less appealing. The number of relations can easily become quite large, especially when large partitioning hierarchies are involved, and while capping prevents large scans there is no guarantee that the MAX_REL_HINT_CANDIDATES set contains the most likely entries. Worst case is that it never contains the a close enough match and we'll spend a scan + hash + calculation of MAX_REL_HINT_CANDIDATES for each typo. Are there ways you can pare down the scan, perhaps a scankey on relispartition and only consider base relations? Hinting on relations can also give hints for relations the user does not have permissions on which further reduce the useability. -- Daniel Gustafsson