Обсуждение: Selecting all variations of job title in a list
Companies can have slightly different titles for the same job; for example (using abbreviations). 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,', 'Plant Mgr.' I want to select all people table rows that contain these varieties. I know the 'like' operator uses '%' as a wildcard, but is not accepted in an 'in' list. Is there a way to use a multicharacter wildcard in an 'in' list? TIA, Rich
On Tue, Nov 25, 2025 at 11:33 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Companies can have slightly different titles for the same job; for example
(using abbreviations). 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,',
'Plant Mgr.'
I want to select all people table rows that contain these varieties. I know
the 'like' operator uses '%' as a wildcard, but is not accepted in an 'in'
list.
Is there a way to use a multicharacter wildcard in an 'in' list?
Maybe regex_match() with a bunch of OR clauses.
In bash, I'd do something like:
grep -E ' ^Asst Gen Mgr.*|^Env Mgr.*|^Gen Mgr.*|^Mgr.*|^Plant Mgr..*' foo.txt
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 11/25/25 08:33, Rich Shepard wrote: > Companies can have slightly different titles for the same job; for example > (using abbreviations). 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,', > 'Plant Mgr.' > > I want to select all people table rows that contain these varieties. I know > the 'like' operator uses '%' as a wildcard, but is not accepted in an 'in' > list. Something like?: select 'Env mgr.' ilike ANY(ARRAY['Asst Gen Mgr.', 'Env Mgr.', 'Gen Mgr.', 'Mgr.','Plant Mgr.']); t > > Is there a way to use a multicharacter wildcard in an 'in' list? > > TIA, > > Rich > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 25 Nov 2025, Ron Johnson wrote: > Maybe regex_match() with a bunch of OR clauses. > > In bash, I'd do something like: > grep -E ' ^Asst Gen Mgr.*|^Env Mgr.*|^Gen Mgr.*|^Mgr.*|^Plant Mgr..*' > foo.txt Ron, I've not used regex in postgres before, only in emacs and small languages. So I'll learn how to do this. I was trying to avoid multiple OR clauses, but that script will not be run often so it should not matter. Thanks, Rich
On Tue, 25 Nov 2025, Adrian Klaver wrote: > Something like?: > > select 'Env mgr.' ilike ANY(ARRAY['Asst Gen Mgr.', 'Env Mgr.', 'Gen Mgr.', > 'Mgr.','Plant Mgr.']); Adrian, Hadn't thought of this. Your, and Ron's, answers taught me that the answer to my question is 'no, there is no shortcut.' :-) Thanks, Rich
On Tue, Nov 25, 2025 at 2:05 PM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 25 Nov 2025, Ron Johnson wrote:
> Maybe regex_match() with a bunch of OR clauses.
>
> In bash, I'd do something like:
> grep -E ' ^Asst Gen Mgr.*|^Env Mgr.*|^Gen Mgr.*|^Mgr.*|^Plant Mgr..*'
> foo.txt
Ron,
I've not used regex in postgres before, only in emacs and small languages.
So I'll learn how to do this. I was trying to avoid multiple OR clauses, but
that script will not be run often so it should not matter.
Failed clarity on my part. The "OR clauses" are within the regex string.
Note that The Relational Way of doing this is for everyone to have title_code in their "person" record, and the "title" table will have, in addition to the title_code PK column, the title_description, title_abbreviation ( 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,', 'Plant Mgr.') and title_group columns. All those manager titles would be in the same group. You'd then join "person" to "title" and filter where title_group="mumble".
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tue, 25 Nov 2025, Ron Johnson wrote: > Failed clarity on my part. The "OR clauses" are within the regex string. Ron, That's okay. > Note that The Relational Way of doing this is for everyone to have > title_code in their "person" record, and the "title" table will have, in > addition to the title_code PK column, the title_description, > title_abbreviation ( 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,', > 'Plant Mgr.') and title_group columns. All those manager titles would be > in the same group. You'd then join "person" to "title" and filter where > title_group="mumble". That's impractical. The title table would be hundreds of rows long, most having only 1 or 2 references. Companies don't have standard titles for everyone. Rich
On 11/25/25 11:08, Rich Shepard wrote: > On Tue, 25 Nov 2025, Adrian Klaver wrote: > >> Something like?: >> >> select 'Env mgr.' ilike ANY(ARRAY['Asst Gen Mgr.', 'Env Mgr.', 'Gen >> Mgr.', 'Mgr.','Plant Mgr.']); > > Adrian, > > Hadn't thought of this. > > Your, and Ron's, answers taught me that the answer to my question is 'no, > there is no shortcut.' :-) You will need to be clearer about what you want. If it is to fetch titles with Mgr in the name then it would be simple: select * from people_table where title ilike '%mgr%'; If it is something more selective you will need to provide a more detailed example. > > Thanks, > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 25 Nov 2025, Adrian Klaver wrote: > You will need to be clearer about what you want. > > If it is to fetch titles with Mgr in the name then it would be simple: > > select * from people_table where title ilike '%mgr%'; > > If it is something more selective you will need to provide a more detailed > example. Adrian, I thought my example was clear: there are modifiers to job titles such as 'Manager,' 'Vice President,' 'Engineer.' That's why I asked if the `like/ilike' modifier would work in a list passed to the `in' command. Rich
On 11/25/25 12:53, Rich Shepard wrote: > On Tue, 25 Nov 2025, Adrian Klaver wrote: > >> You will need to be clearer about what you want. >> >> If it is to fetch titles with Mgr in the name then it would be simple: >> >> select * from people_table where title ilike '%mgr%'; >> >> If it is something more selective you will need to provide a more >> detailed example. > > Adrian, > > I thought my example was clear: there are modifiers to job titles such as > 'Manager,' 'Vice President,' 'Engineer.' That's why I asked if the > `like/ilike' modifier would work in a list passed to the `in' command. So what are you searching on, the title, the modifier or both? > > Rich > > -- Adrian Klaver adrian.klaver@aklaver.com
On Tuesday, November 25, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Companies can have slightly different titles for the same job; for example
(using abbreviations). 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,',
'Plant Mgr.'
Suggest encoding that knowledge in a table then using normal joins and filters to query against whatever you;ve decided is the canonical spelling of the relevant position.
David J.
On Tue, 25 Nov 2025 08:33:07 -0800 (PST) Rich Shepard <rshepard@appl-ecosys.com> wrote: > Companies can have slightly different titles for the same job; for example > (using abbreviations). 'Asst Gen Mgr.', 'Env Mgr,', 'Gen Mgr,'. 'Mgr,', > 'Plant Mgr.' > > I want to select all people table rows that contain these varieties. I know > the 'like' operator uses '%' as a wildcard, but is not accepted in an 'in' > list. > > Is there a way to use a multicharacter wildcard in an 'in' list? > Hi Rich, As Ron wrote, a regexp would do it, I guess? SELECT * FROM people WHERE title ~ 'Asst Gen Mgr|Env Mgr|Gen Mgr|Mgr|Plant Mgr'; Although in your example, you would get the same result with just: SELECT * FROM people WHERE title ~ 'Mgr'; https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP -- Bien à vous, Vincent Veyron https://marica.fr/ Logiciel de suivi des contentieux juridiques, des sinistres d'assurance et des contrats
On Wed, 26 Nov 2025, Vincent Veyron wrote: > As Ron wrote, a regexp would do it, I guess? > > SELECT * FROM people WHERE title ~ 'Asst Gen Mgr|Env Mgr|Gen Mgr|Mgr|Plant Mgr'; > > Although in your example, you would get the same result with just: > > SELECT * FROM people WHERE title ~ 'Mgr'; > > https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP Vincent, Learning postgres regexp is at the top of my list. Thanks, Rich
On Wed, 26 Nov 2025 07:10:16 -0800 (PST) Rich Shepard <rshepard@appl-ecosys.com> wrote: > > Learning postgres regexp is at the top of my list. > https://perldoc.perl.org/perlre Read the first few pages and you'll be good to go. -- Bien à vous, Vincent Veyron https://marica.fr/ Logiciel de suivi des contentieux juridiques, des sinistres d'assurance et des contrats
On Wednesday, November 26, 2025, Vincent Veyron <vv.lists@wanadoo.fr> wrote:
On Wed, 26 Nov 2025 07:10:16 -0800 (PST)
Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> Learning postgres regexp is at the top of my list.
>
https://perldoc.perl.org/perlre
Read the first few pages and you'll be good to go.
I was using this tool a while back when I was doing heavy regex work.
Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl.
But I’d still say regexp is not the best solution here - unless you encapsulate the logic in a function. I suspect you’ll want to use this logic in more than just a single query and with a literal regexp you have to rely on manual synchronization. Note, you could combine the lookup table with regexes. Though beware of ensure you don’t produce duplicate matches if you go that route.
David J.
On Wed, 26 Nov 2025, David G. Johnston wrote: > I was using this tool a while back when I was doing heavy regex work. > > https://www.regexbuddy.com/ > > Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl. > > But I’d still say regexp is not the best solution here - unless you > encapsulate the logic in a function. I suspect you’ll want to use this > logic in more than just a single query and with a literal regexp you have > to rely on manual synchronization. Note, you could combine the lookup > table with regexes. Though beware of ensure you don’t produce duplicate > matches if you go that route. David, Thanks, Rich
I am not very skilled at Postgresql specifically, but when I was doing SQL in another environment I would just do
select distinct (or unique) jobtitle
usually getting a count of how many times each title occurred. Then I would create a mapping to standardize the the job titles.
Bryan Sayer
Retired Demographer/Statistician
In a world in which you can be anything, be kind
Retired Demographer/Statistician
In a world in which you can be anything, be kind
On 11/26/2025 11:10 AM, Rich Shepard wrote:
On Wed, 26 Nov 2025, David G. Johnston wrote:I was using this tool a while back when I was doing heavy regex work.
https://www.regexbuddy.com/
Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl.
But I’d still say regexp is not the best solution here - unless you
encapsulate the logic in a function. I suspect you’ll want to use this
logic in more than just a single query and with a literal regexp you have
to rely on manual synchronization. Note, you could combine the lookup
table with regexes. Though beware of ensure you don’t produce duplicate
matches if you go that route.
David,
Thanks,
Rich
Many years ago I used a weak precedence engine to categorize this form of job title script to a job code.
The regexp did not work because we had asst to gen mgr
The wpe tokenized the words then relaxed token patterns with other token codes with cumulative context
The final token reduction resulted in a final code or if in my example we had a modifier to role code (asst to xxxx role)
The entire engine was created in Oracle but would be easy to implement in pgsql. Back then neural nets were only just appearing in finance and llms were non existent.
Old 1960's tech saved the day
On Wed, Nov 26, 2025, 17:02 Bryan Sayer <blslists@gmail.com> wrote:
I am not very skilled at Postgresql specifically, but when I was doing SQL in another environment I would just do
select distinct (or unique) jobtitle
usually getting a count of how many times each title occurred. Then I would create a mapping to standardize the the job titles.
Bryan Sayer
Retired Demographer/Statistician
In a world in which you can be anything, be kindOn 11/26/2025 11:10 AM, Rich Shepard wrote:On Wed, 26 Nov 2025, David G. Johnston wrote:I was using this tool a while back when I was doing heavy regex work.
https://www.regexbuddy.com/
Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl.
But I’d still say regexp is not the best solution here - unless you
encapsulate the logic in a function. I suspect you’ll want to use this
logic in more than just a single query and with a literal regexp you have
to rely on manual synchronization. Note, you could combine the lookup
table with regexes. Though beware of ensure you don’t produce duplicate
matches if you go that route.
David,
Thanks,
Rich
Rich Shepard <rshepard@appl-ecosys.com> wrote:
> Learning postgres regexp is at the top of my list.
It's well worth knowing a few regex tricks and they're surprisingly
easy to remember. I find these the most useful for ad hoc queries.
'|' for or as mentioned
'()' if you want to check part of an expression eg '(abc|xyz)pqr'
'^' to restrict it to the beginning
'$' to restrict it to the end
Here's an example with your list.
with x(t) as (values ('Asst Gen Mgr.'), ('Env Mgr,'), ('Gen Mgr.'),
('Mgr,'),('Plant Mgr.'))
select * from x where t ~ '(Asst Gen |Gen |Env |Plant |)Mgr(.|,)'
Here's a slightly fancier nested one, just for illustration.
with x(t) as (values ('Asst Gen Mgr.'), ('Env Mgr,'), ('Gen Mgr.'),
('Mgr,'),('Plant Mgr.'))
select * from x where t ~ '^((Asst |)Gen |Env |Plant |)Mgr(.|,)$'
I use regex in my tests and it's practically instant on a few thousand rows.
Thanks, Bernice