I originally thought an enum was a good comparison for this feature, but I'm no longer sure that it is. A text-based ordering would be desired rather than the label index.
A better comparison may be a two-column lookup table:
-- create CREATE TABLE cities (id bigserial primary key, name text) CREATE UNIQUE INDEX ON cities (name); CREATE TABLE users (city_id bigint);
-- insert BEGIN; INSERT INTO cities (name) VALUES ('Chicago') ON CONFLICT (name) DO NOTHING RETURNING id; INSERT INTO users (city_id) VALUES (<city id returned from earlier>); COMMIT;
-- select SELECT * FROM users FROM users INNER JOIN cities ONcities.id= users.city_id WHEREname= 'Chicago';
Ideally, the lookup table could be maintained by Postgres to make reads and writes easier.
-- create CREATE TABLE users (city text DEDUPED);
-- insert INSERT INTO users (city) VALUES ('Chicago');
-- query SELECT * FROM users WHERE city = 'Chicago';
I'm not really sure the best place to store this lookup table.
> On Feb 12, 2018, at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Kane <andrew@chartkick.com> writes: >> Thanks everyone for the feedback. The current enum implementation requires >> you to create a new type and add labels outside a transaction prior to an >> insert. > > Right ... > >> Since enums have a fixed number of labels, this type of feature may be >> better off as a property you could add to text columns (as Thomas >> mentions). This would avoid issues with hitting the max number of labels. > > ... but you're not saying how you'd avoid the need for prior commit of the > labels. The sticking point for enums is that once a value has gotten into > a btree index, we can't ever lose the ability to compare that value to > others, or the index will be broken. So inserting an uncommitted value > into user tables has to be prevented. > > Maybe there's a way to assign the labels so that they can be compared > without reference to any outside data, but it's not clear to me how > that would work.
When I implemented this, I wrote the comparators to work on the Oid for the value, not the string representation. That works fine. If you want to sort the data on the stringified version, cast to text first. That works well enough for me, since I'm typically not interested in what sort order is used, as long as it is deterministic and works for indexing, group by, and so forth.